
"Production-ready" gets thrown around a lot, but what does it actually mean? It's not a single thing; it's a constellation of properties that together make an application safe to run in front of real users, at real scale, for real money. Security that holds up under attack. Infrastructure that stays up when things go wrong. A deployment process that doesn't require a developer to manually push files and hope. Monitoring that tells you about problems before your users do.

The journey from working prototype to production-grade system is well-understood. It has predictable stages, predictable effort, and predictable outcomes when you know what you're doing. Here's how it works.

What "Production-Ready" Actually Means

Before diving into the steps, it's worth defining the destination. A production-ready application has:

  • Security: no exposed secrets, hardened authentication and authorisation, input validation, dependency hygiene
  • Reliability: the system stays up; failures are isolated and recoverable; there are backups
  • Observability: you know what's happening inside the system at all times, without having to ask users
  • Deployability: code can be tested, reviewed, deployed, and rolled back in a controlled, repeatable way
  • Scalability: the system handles growth without requiring emergency intervention
  • Compliance: where applicable, meets the legal and regulatory requirements of its domain and jurisdiction

None of these are binary — each exists on a spectrum, and what's "good enough" depends on the application. A low-stakes internal tool has different requirements than a healthcare platform or a fintech service. The steps below are universal; the depth of implementation should be calibrated to risk.

Fig 1: The Six Stages of Production Hardening (review → security → infra → CI/CD → monitor → ship)

The Six Stages

1. Code Review & Architectural Assessment
Typical duration: 1–3 days

Before changing anything, understand what you have. A thorough review of a vibe-coded application covers the data model and its integrity constraints, authentication and authorisation flows, API design and input handling, dependency versions and known vulnerabilities, and the overall architecture, including whether it will scale, where the single points of failure are, and what the technical debt looks like.

This stage produces a prioritised list of issues: critical security findings that block launch, important hardening tasks to complete before significant user growth, and lower-priority improvements that can wait. It gives everyone a shared, honest picture of where the application actually stands.

Common finding: API keys in frontend code, committed to version control, or in .env files that were accidentally pushed. Rotate these immediately — they should be considered compromised.
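
After rotation, the replacement secrets belong in the environment, not in code. A minimal sketch of fail-fast secret loading, so a missing secret fails the deploy at startup instead of surfacing mid-request (the variable names are illustrative):

```python
import os

def require_secret(name: str) -> str:
    """Read a secret from the environment, failing fast if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required secret: {name} "
                           "(set it in the environment, not in code)")
    return value

# At startup, resolve every secret up front. DATABASE_URL and
# STRIPE_API_KEY here are illustrative names, not a prescribed list.
```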

2. Security Hardening
Typical duration: 3–10 days, depending on findings

Address the findings from Stage 1, working from highest to lowest severity. Critical security work typically includes:

  • Moving all secrets to a proper secret management system (environment variables, AWS Secrets Manager, Vault)
  • Auditing and fixing authorisation checks to ensure every API endpoint verifies not just that the user is authenticated, but that they're authorised to access the specific resource
  • Adding rate limiting to authentication endpoints and other sensitive operations
  • Replacing any string-concatenated SQL queries with parameterised queries
  • Adding proper input validation and output encoding
  • Implementing security headers (CSP, HSTS, X-Frame-Options)
  • Setting up dependency vulnerability scanning
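
On the rate-limiting point, the underlying mechanism can be as simple as a per-client token bucket. A minimal in-process sketch (a real deployment would typically key buckets by user or IP and back them with shared storage such as Redis):

```python
import time

class TokenBucket:
    """Allow up to `capacity` requests, refilled at `rate` tokens/second."""
    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per client guarding a login endpoint:
login_bucket = TokenBucket(capacity=5, rate=0.1)  # 5 attempts, then ~1 per 10s
```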
Common mistake: Treating authentication and authorisation as the same problem. Authentication (are you who you say you are?) is usually implemented correctly. Authorisation (are you allowed to access this specific thing?) is where the holes are most often found.
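
To make the distinction concrete, here is a sketch of a resource handler that performs the authorisation check (ownership) after authentication has already identified the user, using a parameterised query throughout. The schema and names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, "
             "owner_id INTEGER, body TEXT)")
conn.execute("INSERT INTO documents VALUES (1, 42, 'quarterly report')")

def get_document(user_id: int, doc_id: int) -> str:
    # Parameterised query: user input is never concatenated into SQL.
    row = conn.execute(
        "SELECT owner_id, body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise LookupError("not found")
    owner_id, body = row
    # Authorisation: being logged in is not enough; this user must own this row.
    if owner_id != user_id:
        raise PermissionError("forbidden")
    return body
```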

3. Infrastructure Setup
Typical duration: 2–5 days

Move from "running on a server somewhere" to a properly designed, infrastructure-as-code-managed environment. This means separate staging and production environments, automated backups with tested restore procedures, appropriate compute and database sizing, CDN configuration for static assets, and DNS and SSL certificate management.

For most vibe-coded applications, the right infrastructure is simpler than people assume: a managed application platform (Railway, Render, Fly.io, or AWS ECS), a managed database (RDS, PlanetScale, Supabase), and a CDN. The goal is managed services that reduce operational burden, not a bespoke Kubernetes cluster that requires a dedicated DevOps engineer to maintain.

Everything should be defined as code (Terraform, Pulumi, or CloudFormation) so that infrastructure is reproducible, auditable, and can be rebuilt from scratch if necessary.

Common mistake: Building production infrastructure before having a tested backup and restore process. The infrastructure that matters most is the one you use when something goes catastrophically wrong.
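
The "tested restore" part can itself be automated. A minimal sketch using SQLite's online backup API, where verification means the restored copy answers a real query; a production setup would do the equivalent against its database engine's dump/restore tooling:

```python
import sqlite3

def backup_and_verify(source: sqlite3.Connection) -> bool:
    """Back up `source` into a fresh database and verify the copy is usable."""
    restored = sqlite3.connect(":memory:")
    source.backup(restored)  # online backup into the new database
    # Verification: the restored copy must answer a real query, not just exist.
    count = restored.execute("SELECT count(*) FROM users").fetchone()[0]
    original = source.execute("SELECT count(*) FROM users").fetchone()[0]
    return count == original

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
db.executemany("INSERT INTO users (email) VALUES (?)",
               [("a@example.com",), ("b@example.com",)])
```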

4. CI/CD Pipeline
Typical duration: 2–4 days

Replace manual deployments with an automated pipeline that runs on every code change. A minimal, effective CI/CD pipeline includes: running automated tests on every pull request, blocking merges when tests fail, automated deployment to staging on merge to the main branch, a manual promotion gate before production deployment, and automated rollback capability if a deployment fails health checks.
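
The rollback step can be expressed as a small gate in deploy tooling. A sketch of the control flow, where `deploy`, `health_check`, and `rollback` stand in for calls to whatever platform API you actually use:

```python
import time
from typing import Callable

def deploy_with_rollback(
    deploy: Callable[[], None],
    health_check: Callable[[], bool],
    rollback: Callable[[], None],
    attempts: int = 5,
    interval: float = 1.0,
) -> bool:
    """Deploy, then poll the health check; roll back if it never passes."""
    deploy()
    for _ in range(attempts):
        if health_check():
            return True  # deployment is healthy; keep it
        time.sleep(interval)
    rollback()  # health never stabilised; restore the previous version
    return False
```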

GitHub Actions, GitLab CI, and CircleCI are all solid choices. The specific tool matters less than the discipline: nothing goes to production without passing the pipeline. No exceptions for "just a small change".

This is also the stage to establish a branching strategy, a pull request review process, and a deployment approval workflow. These aren't bureaucracy for its own sake; they're the mechanisms that keep accidental changes from reaching production and ensure that at least one other set of eyes has seen every change.

5. Monitoring & Alerting
Typical duration: 1–3 days

You cannot fix what you cannot see. Production monitoring should cover uptime checks (external monitoring that simulates a user accessing your service every minute), error tracking (Sentry, Datadog, or similar, capturing every exception with full context), performance monitoring (p50/p95/p99 latency for critical endpoints), and business metrics (user signups, key actions, revenue events, or whatever matters to the health of the product).
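
On the percentiles: p95 and p99 come from sorted samples, not averages, which is why they expose tail latency a mean would hide. A minimal sketch using the nearest-rank method (the sample values are illustrative):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the value at or above p% of sorted samples."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]

latencies_ms = [12, 15, 11, 300, 14, 13, 900, 16, 12, 14]  # illustrative
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
# A healthy median can hide a terrible tail: here p50 is 14ms, p99 is 900ms.
```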

Alerting should be calibrated carefully. Too many alerts and they get ignored; too few and real problems slip through. Start with: page on downtime (immediately), alert on elevated error rates (within minutes), and warn on performance degradation. Review and tune over the first few weeks in production.

Common gap: Monitoring that tells you the server is up but not whether the application is working correctly. A server can be running and accepting requests while serving error pages to 100% of users. Monitor what users experience, not just what the server reports.
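
A check that reflects user experience inspects the response body as well as the status code. A sketch of the classification logic, where the "Welcome" marker is an assumption about what a healthy page of your app contains:

```python
def is_healthy(status: int, body: str) -> bool:
    """Judge health the way a user would: right status AND expected content."""
    # A 200 serving an error page is still an outage from the user's side.
    return (status == 200
            and "Welcome" in body
            and "Internal Server Error" not in body)

assert is_healthy(200, "<h1>Welcome back</h1>")
assert not is_healthy(200, "Internal Server Error")  # up, but serving errors
assert not is_healthy(503, "Welcome")                # status still matters
```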

6. Documentation & Compliance
Typical duration: 2–5 days (ongoing)

Documentation serves two audiences: the engineering team (who needs to understand and maintain the system) and compliance/legal (who needs evidence of controls). Both are important, and the work overlaps more than people expect.

Technical documentation should cover: architecture diagrams, deployment and rollback procedures, incident response runbooks, and on-call responsibilities. For applications handling personal data, compliance documentation includes data processing records (GDPR Article 30), a privacy policy and terms of service, a data breach response procedure, and if applicable, a Data Protection Impact Assessment.

For teams pursuing SOC 2 or ISO 27001, this stage is the beginning of a longer programme — but starting with the documentation discipline early makes certification significantly less painful when you get there.

When to DIY vs When to Get Help

The honest answer is: it depends on whether your team has done it before. Each of these stages is learnable, and if you have the time and the interest, doing it yourself builds genuine capability.

The cases where getting specialist help makes clear financial sense: when you have a time-to-market constraint that doesn't allow for the learning curve, when compliance requirements mean mistakes have regulatory consequences, when the security risk surface is high (user financial data, healthcare data, enterprise customers), or when your team's time is more valuable spent on product than on infrastructure.

A well-run production hardening engagement takes two to four weeks and leaves you with a system that's genuinely production-grade, documented, and understood by your team. That's usually faster and cheaper than the alternative — which is discovering the gaps one at a time, under pressure, after they've caused real problems.