
Intermediate

Microservices vs Monolith — Honestly

When microservices help, when they hurt, and the engineering and organizational realities behind the choice.

10 min read · 2026-02-21 · Ironclad Academy

#architecture #microservices #engineering-org #conway's-law


The internet is full of "microservices vs monolith" arguments. Most are religious. The honest answer is that either can work, either can fail, and the choice mostly depends on your engineering org's size and discipline, not the technology.

What's a monolith?

One codebase, one deployable, one process. All features live together. They share a database, an authentication context, and a deployment pipeline.

Examples: most early-stage startups, GitHub for many years, Shopify (still), Basecamp (proudly).

flowchart TD
    LB[Load Balancer] --> M1[Monolith]
    LB --> M2[Monolith]
    LB --> M3[Monolith]
    M1 --> DB[(Shared DB)]
    M2 --> DB
    M3 --> DB
    style M1 fill:#ff6b1a,color:#0a0a0f
    style DB fill:#15803d,color:#fff

What's a microservice?

A bounded context, owned by a small team, deployed independently, with its own data store and API. Different services talk over the network.

Examples: Netflix, Uber, Amazon (since Bezos's 2002 API mandate), modern Spotify.

flowchart TD
    LB[API Gateway] --> US[Users]
    LB --> PS[Posts]
    LB --> NS[Notifications]
    LB --> SS[Search]
    LB --> RS[Recs]
    US --> UDB[(Users DB)]
    PS --> PDB[(Posts DB)]
    NS --> NDB[(Notif DB)]
    SS --> SDB[(Search Index)]
    style LB fill:#ff6b1a,color:#0a0a0f

What microservices buy you — and what they cost

The benefits aren't technical performance or cleaner code. They're organizational.

Independent deploys are the biggest one. With 100 engineers on a monolith, a production deploy becomes a calendar event requiring coordination across every team. With 100 engineers spread across 30 services, each team can ship on its own schedule, as often as hourly if its pipeline supports it. The blast radius matters too: a bug in the recommendation service can't take down checkout if the two are properly isolated. And when your search cluster needs 100 boxes while your auth service needs 3, independent scaling lets you provision each appropriately rather than forcing the most resource-hungry profile onto every replica.

The underlying reason this works is Conway's Law — Melvin Conway's 1968 observation that organizations produce systems that mirror their communication structure. If you have 10 teams, you'll end up with 10 subsystems whether you plan it or not. Microservices make that fact explicit and deliberate: align service boundaries with team boundaries so the system structure reinforces the org structure rather than fighting it. When service and team are misaligned, you get the distributed monolith — the code is split, but the coordination is not.

Now, the costs. Every in-process function call that used to take tens of nanoseconds becomes a network round-trip taking 0.5–5ms, roughly 10,000× to 100,000× slower. Failure modes multiply too: timeouts, partial failures, retries with exponential backoff, cascading slowdowns. A user request that passes through 10 services in sequence spends 10 network hops, or about 5–50ms, of its latency budget on the network alone.
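
The arithmetic behind those figures is easy to check. The sketch below uses the article's own numbers (about 50 ns per in-process call, 0.5–5 ms per network hop) and assumes, for illustration, that all hops happen one after another; calls made in parallel would cost less wall-clock time.

```python
# Back-of-the-envelope latency arithmetic using the article's figures.
IN_PROCESS_CALL_S = 50e-9        # ~50 ns per in-process call
NETWORK_HOP_S = (0.5e-3, 5e-3)   # ~0.5 to 5 ms per network round-trip


def slowdown_range(in_process_s: float, hop_range_s: tuple[float, float]) -> tuple[float, float]:
    """How many times slower one network hop is than one in-process call."""
    low, high = hop_range_s
    return (low / in_process_s, high / in_process_s)


def sequential_budget_ms(hops: int, hop_range_s: tuple[float, float]) -> tuple[float, float]:
    """Milliseconds spent on the network if every hop happens in sequence."""
    low, high = hop_range_s
    return (hops * low * 1e3, hops * high * 1e3)


if __name__ == "__main__":
    slow_lo, slow_hi = slowdown_range(IN_PROCESS_CALL_S, NETWORK_HOP_S)
    print(f"slowdown per hop: {slow_lo:,.0f}x to {slow_hi:,.0f}x")
    budget_lo, budget_hi = sequential_budget_ms(10, NETWORK_HOP_S)
    print(f"10 sequential hops: {budget_lo:.0f} ms to {budget_hi:.0f} ms")
```

With these inputs the slowdown per hop works out to roughly 10,000× at the fast end and 100,000× at the slow end, and ten sequential hops consume about 5 to 50 ms.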

flowchart LR
    subgraph "Monolith: one process"
        A[Handler] -->|"~50ns"| B[Auth module]
        B -->|"~50ns"| C[Billing module]
        C -->|"~50ns"| D[Notif module]
    end
    subgraph "Microservices: network calls"
        E[Handler] -->|"~0.5–5ms"| F[Auth service]
        F -->|"~0.5–5ms"| G[Billing service]
        G -->|"~0.5–5ms"| H[Notif service]
    end
    style A fill:#0e7490,color:#fff
    style E fill:#ff6b1a,color:#0a0a0f
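
Every one of those network arrows also needs failure handling that an in-process call never did. As one illustration of "retries with exponential backoff", here is a minimal sketch. `TransientError` and `call_with_backoff` are hypothetical names invented for this example, and the jitter strategy (a random delay up to the current cap) is one common choice rather than the only one.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error standing in for a timeout or a 503 from a peer service."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 0.1,
    max_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # The delay cap doubles each attempt: base, 2*base, 4*base, up to max_delay_s.
            cap = min(max_delay_s, base_delay_s * (2 ** attempt))
            sleep(random.uniform(0, cap))  # jitter spreads out synchronized retries
    raise AssertionError("unreachable")
```

Retrying is only safe when the operation is idempotent, and every retry adds load to a service that may already be struggling, which is one way the cascading slowdowns mentioned above get started.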

Data integrity is the other sharp edge. In a monolith, "update the user and record the audit log" is a single BEGIN TRANSACTION ... COMMIT. Across services, you need sagas, two-phase commit, or best-effort reconciliation — none of which is free. Each service owns its own tables, so there are no foreign keys across services, no joins, and no single consistent snapshot of the world. That's manageable once you understand it, but teams that migrate without internalizing it often end up with subtle consistency bugs that are painful to diagnose.
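
To show what replaces the single transaction, here is a minimal orchestrated-saga sketch: each step pairs a forward action with a compensating action, and a failure triggers the compensations of the already-completed steps in reverse order. The names (`SagaStep`, `run_saga`) and the in-memory "services" are assumptions for illustration only. A production saga would also persist its progress and retry failed compensations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]       # forward operation in one service
    compensate: Callable[[], None]   # undo operation for that same service


def run_saga(steps: list[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse.

    Returns True if every step succeeded, False if the saga was rolled back.
    """
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            for done in reversed(completed):
                done.compensate()  # a real system must handle failures here too
            return False
    return True


if __name__ == "__main__":
    users = {"u1": "old@example.com"}  # stand-in for the users service's data

    def update_email() -> None:
        users["u1"] = "new@example.com"

    def restore_email() -> None:
        users["u1"] = "old@example.com"

    def write_audit() -> None:
        raise RuntimeError("audit service unavailable")  # simulated partial failure

    succeeded = run_saga([
        SagaStep("update-user", update_email, restore_email),
        SagaStep("record-audit", write_audit, lambda: None),
    ])
    print(succeeded, users["u1"])
```

Note what the sketch does not give you: between the user update and its compensation, other readers can observe the new email address. That window of visible intermediate state is exactly what a database transaction would have hidden.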

Operationally: 30 services × 3 environments × deployment pipelines × monitoring × distributed tracing × on-call rotations = a substantial tooling investment before you even write business logic. You also need service-to-service authentication (mTLS, service tokens, or a service mesh), and a versioning and deprecation strategy so Service A bumping an API doesn't silently break Service B in production.

When monolith is the right answer

Under 25 engineers all working in similar domains, the benefits of microservices don't come close to outweighing their operational and coordination cost. If data consistency is critical (banking, healthcare, any domain where you really want ACID guarantees spanning multiple entities), a monolith with a single transactional database is dramatically simpler than sagas and eventual consistency. And if you're pre-product-market-fit, you don't yet know where the seams are. Cutting early produces wrong cuts: you'll spend more time re-aligning boundaries than you saved from independence.

The modular monolith deserves special mention here. Enforce service-like boundaries via package structure and code ownership conventions, but deploy as one unit. You get most of the bounded-context benefits — clear ownership, isolated data access paths, legible API contracts — without any of the network costs. When the day comes to extract a service, the module is already waiting with clean interfaces.
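
Conventions hold up better when a build step checks them. As a sketch of how a boundary could be enforced, the function below scans a source file's imports and reports any that reach into another domain package's internals. The convention it assumes (each package exposes a public `api` submodule and everything else is private) and the package names in the usage note are hypothetical. It only inspects import statements, so attribute access after a plain `import billing` goes unchecked.

```python
import ast


def boundary_violations(
    source: str,
    own_package: str,
    domain_packages: set[str],
    public_submodule: str = "api",
) -> list[str]:
    """Return imports that reach into another domain package's internals.

    Assumed convention: code outside package `p` may import `p` or `p.api`
    (and names under `p.api`), but nothing else under `p`.
    """
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Treat "from a.b import c" as a reference to "a.b.c".
            targets = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # not an import, or a relative import inside the own package
        for target in targets:
            parts = target.split(".")
            foreign = parts[0] in domain_packages and parts[0] != own_package
            if foreign and len(parts) > 1 and parts[1] != public_submodule:
                found.append(target)
    return sorted(found)
```

Run against a file in a hypothetical `orders` package, `from billing.api import charge` passes while `from billing.models import Invoice` is reported. Wiring a check like this into CI is what turns "code ownership conventions" into something a reviewer does not have to police by hand.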

When microservices are the right answer

The threshold is roughly 100 engineers across multiple product surfaces with multiple teams regularly colliding in the codebase. At that scale, the coordination overhead of a shared deploy dominates. Other signals: heterogeneous scaling needs where image processing dwarfs auth in CPU cost; genuinely different release cadences across teams; and different reliability targets, where payments demands 99.99% uptime while recommendation is fine at 99.9%.
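
The gap between those two reliability targets is larger than the extra "9" suggests. The sketch below converts an availability target into a yearly downtime budget and shows what happens when a request needs two components at once. The independence assumption in `serial_availability` is a simplification; real outages are often correlated.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Yearly downtime budget implied by an availability target such as 0.9999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(*availabilities: float) -> float:
    """Availability of a request that needs every listed dependency to be up.

    Assumes failures are independent, which real outages often are not.
    """
    result = 1.0
    for value in availabilities:
        result *= value
    return result


if __name__ == "__main__":
    print(f"99.99%: {allowed_downtime_minutes(0.9999):.1f} min/year")
    print(f"99.9%:  {allowed_downtime_minutes(0.999):.1f} min/year")
    print(f"both required: {serial_availability(0.9999, 0.999):.4%}")
```

A 99.99% target allows about 53 minutes of downtime per year, while 99.9% allows about 8.8 hours. If payments had to call recommendations synchronously, the combined availability would fall below 99.9%, which is why components with different targets benefit from being isolated from each other.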

The middle path

Most successful companies land somewhere in between. The two most common shapes:

Modular monolith with extracted services. Start monolithic. Extract a service when it has a meaningfully different scaling profile, when a separate team now owns it and the API contract has stabilized, or when it has distinct reliability requirements. The result is one substantial monolith plus 5–10 surrounding services — each extracted for a specific reason, not because "microservices are good."

Service-oriented, not micro. Coarse-grained services aligned with teams. Not 100 microservices; 10–15 services sized to match your org. Amazon's original decomposition looked like this before the tooling matured enough to manage finer granularity.

The distributed monolith (the anti-pattern)

This is the failure mode worth knowing by name. Many services, but they're all deployed together, share a database, share a build, and require coordinated changes. It's the worst of both worlds — you've paid the operational cost of microservices without gaining any of the independence.

You can recognize it by four symptoms: shared database (Service A and B both write to the same users table, so adding a column requires coordinating both teams); synchronized deploys (releasing Service A requires deploying Service B in the same window because of an unversioned shared contract); temporal coupling (A calls B synchronously, so when B slows down, A slows down, and when B goes down, A goes down — no isolation at all); and chatty inter-service calls (A calls B twelve times to render a single page, so network overhead swamps any organizational benefit).
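
The chatty-call symptom is the easiest of the four to demonstrate. In the sketch below, `FakeProfileService` is a hypothetical in-memory stand-in for a remote service that simply counts simulated round-trips, so the two rendering functions can be compared without any real network.

```python
class FakeProfileService:
    """In-memory stand-in for a remote service; counts simulated round-trips."""

    def __init__(self, profiles: dict[int, str]) -> None:
        self._profiles = profiles
        self.round_trips = 0

    def get(self, user_id: int) -> str:
        self.round_trips += 1  # one network call per user
        return self._profiles[user_id]

    def get_many(self, user_ids: list[int]) -> dict[int, str]:
        self.round_trips += 1  # one network call for the whole batch
        return {uid: self._profiles[uid] for uid in user_ids}


def render_chatty(service: FakeProfileService, user_ids: list[int]) -> list[str]:
    """Fetch each profile separately: N users cost N round-trips."""
    return [service.get(uid) for uid in user_ids]


def render_batched(service: FakeProfileService, user_ids: list[int]) -> list[str]:
    """Fetch all profiles in one request: N users cost one round-trip."""
    profiles = service.get_many(user_ids)
    return [profiles[uid] for uid in user_ids]
```

For a page showing twelve users, the chatty version makes twelve round-trips and the batched version makes one. A batch endpoint treats the symptom; if A needs B's data this often, it is also worth asking whether the boundary between them is in the right place.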

The usual cause is premature decomposition: someone split the codebase before understanding the boundaries, and the cuts went across natural seams rather than along them. The fix is to merge the services back, stabilize the boundaries inside a modular monolith, then re-extract along clean seams.

How to extract a service from a monolith

This is the Strangler Fig pattern, named by Martin Fowler: grow the new service around the old code until the old code can be removed. The safe playbook has five steps, and each is reversible.

flowchart TD
    S1["Step 1: Create a module inside the monolith\n(same DB, module only touches its own tables)"] --> S2["Step 2: Monolith calls the module's API in-process"]
    S2 --> S3["Step 3: Deploy module as a separate service\n(same DB, calls go over the network)"]
    S3 --> S4["Step 4: Move tables to the new service's own DB"]
    S4 --> S5["Step 5: Independently scaled and deployed"]
    style S1 fill:#0e7490,color:#fff
    style S3 fill:#ffaa00,color:#0a0a0f
    style S5 fill:#15803d,color:#fff
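
Steps 1 through 3 in the diagram can be sketched in code as one contract with two interchangeable implementations. Everything named here (`NotificationsAPI`, the JSON payload shape, the injected `transport` function) is a hypothetical illustration, not a prescribed design. The transport is passed in so the sketch stays runnable without a real network.

```python
import json
from typing import Callable, Optional, Protocol


class NotificationsAPI(Protocol):
    """The contract callers depend on, regardless of where the code runs."""

    def send(self, user_id: int, message: str) -> bool: ...


class InProcessNotifications:
    """Steps 1-2: the module lives inside the monolith and is called directly."""

    def __init__(self) -> None:
        self.sent: list[tuple[int, str]] = []

    def send(self, user_id: int, message: str) -> bool:
        self.sent.append((user_id, message))
        return True


class RemoteNotifications:
    """Step 3: same contract, but the call crosses the network.

    `transport` is a hypothetical function that sends a JSON request body to
    the extracted service and returns the JSON response body as a string.
    """

    def __init__(self, transport: Callable[[str], str]) -> None:
        self._transport = transport

    def send(self, user_id: int, message: str) -> bool:
        reply = self._transport(json.dumps({"user_id": user_id, "message": message}))
        return bool(json.loads(reply).get("ok", False))


def make_notifications(
    use_remote: bool,
    transport: Optional[Callable[[str], str]] = None,
) -> NotificationsAPI:
    """Flag-controlled switch; flipping the flag back is the rollback path."""
    if use_remote:
        if transport is None:
            raise ValueError("remote mode needs a transport")
        return RemoteNotifications(transport)
    return InProcessNotifications()
```

Because callers only see `NotificationsAPI`, moving from step 2 to step 3 is a configuration change rather than a code change at every call site, and that is what keeps the step reversible.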

The key discipline: define the API clearly before doing anything else — gRPC or HTTP, documented contract. Then move one step at a time. Steps 1 through 3 keep data in the shared database, so you can roll back easily. Step 4 is the hard one because moving tables requires a data migration and usually a dual-write period. Don't try to do all five steps in one quarter.
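
The dual-write period in step 4 can be pictured with a small sketch. Plain dictionaries stand in for the shared database and the new service's database, and the class and method names are assumptions made for this example. Real dual writes are not atomic across two databases, so a production version needs reconciliation for writes that land in one store but not the other.

```python
class DualWriteRepository:
    """Writes to both stores during a migration; reads from one of them."""

    def __init__(self, old_store: dict, new_store: dict, read_from_new: bool = False) -> None:
        self.old_store = old_store
        self.new_store = new_store
        self.read_from_new = read_from_new  # flip to True at cutover, back to roll back

    def put(self, key, value) -> None:
        self.old_store[key] = value  # the shared DB stays the source of truth
        self.new_store[key] = value  # shadow write to the new service's DB

    def get(self, key):
        store = self.new_store if self.read_from_new else self.old_store
        return store[key]

    def backfill(self) -> int:
        """Copy rows written before dual-write began; returns the number copied."""
        missing = {k: v for k, v in self.old_store.items() if k not in self.new_store}
        self.new_store.update(missing)
        return len(missing)

    def mismatches(self) -> list:
        """Keys whose values differ between stores; should be empty before cutover."""
        keys = set(self.old_store) | set(self.new_store)
        return sorted(k for k in keys if self.old_store.get(k) != self.new_store.get(k))
```

The sequence this sketch assumes is: enable dual writes, backfill the older rows, wait until `mismatches()` stays empty, then switch reads to the new store. Keeping the old store written until after cutover is what preserves the option to switch back.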

How real big-tech companies do it

Amazon has operated as microservices since Bezos's "API mandate" in 2002 — every capability exposed as a network API, no shared memory or direct DB access between teams. The result is hundreds to thousands of services.

Netflix decomposed after a major database outage in August 2008 forced a migration to AWS and a full architecture rethink. The migration began in 2009 and took roughly seven years to complete; Netflix now runs 1,000+ services and is the canonical origin of much open-source microservices tooling (Hystrix, Eureka, Zuul).

Google uses microservices internally via gRPC (the open-source successor to their internal Stubby RPC) and Borg (the predecessor to Kubernetes), but their services tend to be coarser-grained than Amazon's.

Meta maintains a monorepo containing most company code. The core Facebook product runs as a large application written in Hack, Meta's PHP-derived language. Much of what looks like "services" at Meta is shared libraries within the monorepo rather than independently-deployed microservices.

Shopify runs a large, famously productive Rails monolith, modularized internally and surrounded by services extracted for specific needs. It is a well-known example of what Basecamp's David Heinemeier Hansson called the "majestic monolith."

Stripe started as a Ruby monolith and evolved into a service-oriented architecture where business flows like "charge a card" traverse a graph of services connected via an event bus — while the public API surface is deliberately unified to feel like a single product.

There is no universal right answer. Every one of these companies picked something that fit their org and their problem at the time they made the decision.

A few persistent myths

"Microservices scale better." They don't, inherently. A horizontally-scaled monolith handles billions of requests per day routinely — Shopify's is a famous example. Scaling is about capacity planning and caching, not architecture label.

"Microservices are more reliable." If anything, they introduce more failure modes: network partitions, partial failures, version skew between services. Reliability comes from engineering investment, not from how you package the code.

"Microservices let teams move faster." Sometimes, eventually. In the early days of a migration they often slow teams down because the tooling and conventions aren't yet in place. Speed comes after the org and platform are mature.

"You should start with microservices." Almost never. Premature decomposition is far harder to fix than a well-structured monolith. You don't know where the right boundaries are until you've built and shipped something real. Start as a monolith, extract when you must.

Things to discuss in an interview

"What's the trade-off between microservices and a monolith?" → answer with org size, deployment independence, latency per hop, and data integrity constraints.
"When would you extract a service?" → different scaling profile, separate team ownership, stable API contract.
"How would you migrate from monolith to microservices?" → Strangler Fig / 5-step extraction; each step reversible.
"What's a distributed monolith?" → the anti-pattern; recognize the four symptoms (shared DB, synchronized deploys, temporal coupling, chatty calls) and explain why it's worse than either pure approach.
"Why is architecture an org decision?" → Conway's Law: systems mirror the communication structure of the org that builds them.

Things you should now be able to answer

Why are microservices an organizational choice, not just technical?
What does Conway's Law predict about your system architecture?
What's a modular monolith and when is it preferable?
What is the actual latency cost of a service-to-service call vs an in-process function call (order of magnitude)?
What are the four symptoms of a distributed monolith, and why is it the worst of both worlds?
Why does early decomposition often go wrong?
What's the Strangler Fig pattern, and why is it the safest way to extract a service?

Frequently asked questions

▸What is a distributed monolith and why is it considered the worst of both worlds?

A distributed monolith is many services that are still deployed together, share a database, share a build, and require coordinated changes. You recognize it by four symptoms: shared database across services, synchronized deploys, temporal coupling (when B slows, A slows), and chatty inter-service calls. It carries the full operational cost of microservices while delivering none of the deployment independence.

▸What is the latency cost of a microservices network call compared to an in-process function call?

An in-process call takes roughly 50 nanoseconds; a network round-trip between services takes 0.5 to 5 milliseconds, approximately 10,000 to 100,000 times slower per hop. A user request that passes through 10 services in sequence therefore spends 10 network hops, about 5 to 50 milliseconds, of its latency budget on the network alone.

▸When should you choose a monolith over microservices?

Under 25 engineers working in similar domains, the benefits of microservices do not outweigh their operational and coordination cost. A monolith is also the right answer when strong data consistency is required (banking, healthcare), or when you are pre-product-market-fit and do not yet know where the correct service boundaries are.

▸What is Conway's Law and why does it matter for architecture decisions?

Conway's Law is Melvin Conway's 1968 observation that organizations produce systems that mirror their communication structure. Microservices make this explicit by aligning service boundaries with team boundaries; when service and team boundaries are misaligned, the result is the distributed monolith — the code is split but the coordination is not.

▸What is the Strangler Fig pattern and how does service extraction work safely?

The Strangler Fig pattern, named by Martin Fowler, grows a new service around existing monolith code in five reversible steps: create a module inside the monolith using the shared database, have the monolith call that module in-process, deploy the module as a separate service still on the shared database, migrate the tables to the new service's own database, then scale and deploy independently. Steps 1 through 3 keep data in the shared database so rollback is straightforward; step 4, the data migration, is the hard one and should not be rushed into the same quarter as the earlier steps.
