Refactoring Legacy Code: The Strangler Pattern vs. Big Bang Rewrite

Published: 05/2026 | Reading Time: 4 minutes | Category: Engineering Practice

---

Most teams that inherit a legacy codebase eventually hit a moment where someone says "we should just rewrite this." Sometimes that is the right call. More often it is a reaction to pain rather than a strategy for fixing it.

The three real options are incremental refactoring, the strangler pattern, and a big bang rewrite. Each one fits a different situation, and choosing wrong is expensive in ways that are hard to recover from.

Incremental Refactoring: Low Risk, Slow Progress

Incremental refactoring means touching code as you go. You are working in a module, you clean it up, you move on. The Boy Scout Rule applied systematically: leave it better than you found it.

This works well when the architecture is sound and the problems are localized. It does not work when the architecture itself is the problem. If you are incrementally refactoring a system where every layer is tangled into every other layer, you are rearranging furniture in a house with a bad foundation. The work feels productive. The structural problem does not move.

The other limitation is timeline. Incremental refactoring does not give you a moment where you can say "that problem is solved." It is continuous improvement, not remediation.

The Strangler Pattern: Controlled Replacement

The strangler pattern is the most frequently useful approach for systems that are working but painful. The core idea: you do not touch the legacy system until you have built a replacement for a specific piece of it. Then you route traffic to the new piece. Then you delete the old piece. You repeat until the legacy system is gone.

The process has four practical steps.

First, identify seams. A seam is a place where you can draw a boundary between one piece of functionality and another without tearing everything apart. API endpoints are natural seams. Background jobs are natural seams. A monolith where every function calls every other function has no natural seams, which is information worth knowing before you start.
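
As a concrete illustration, here is a minimal sketch of what a seam can look like in code, using a hypothetical invoicing module: callers depend on a small interface, and the legacy behavior sits behind it as one implementation. The names are illustrative, not taken from any particular system.

    from abc import ABC, abstractmethod

    # Hypothetical invoicing seam: callers depend on this interface, never on a
    # concrete implementation, so the legacy and replacement versions can be
    # swapped behind it later.
    class InvoiceCalculator(ABC):
        @abstractmethod
        def total(self, order_id: str) -> int:
            """Return the invoice total in cents."""

    class LegacyInvoiceCalculator(InvoiceCalculator):
        """Thin adapter over the existing code path, which stays unchanged."""

        def total(self, order_id: str) -> int:
            # In the real system this would delegate to the legacy module;
            # the hard-coded lines stand in for that call here.
            legacy_lines = [{"cents": 1250}, {"cents": 499}]
            return sum(line["cents"] for line in legacy_lines)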

Second, write characterization tests before you touch anything. A characterization test does not verify that the code does what it should do. It verifies what the code actually does right now, including the bugs. This is your safety net. If your new implementation produces different output than the old one, the test tells you.
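
A minimal sketch of a characterization test, assuming a hypothetical legacy shipping-cost function: the expected values are recorded from the running system rather than derived from any spec, quirks included.

    import unittest

    def legacy_shipping_cost(weight_kg: float, express: bool) -> int:
        # Stand-in for the real legacy function; the off-by-one below is the
        # kind of quirk a characterization test deliberately preserves.
        cost = int(weight_kg * 100)
        if express:
            cost = cost * 2 + 1  # oddity that production now depends on
        return cost

    class TestShippingCharacterization(unittest.TestCase):
        def test_current_behavior_is_pinned(self):
            # Expected values were captured from the running system,
            # not computed from the requirements.
            self.assertEqual(legacy_shipping_cost(2.0, express=False), 200)
            self.assertEqual(legacy_shipping_cost(2.0, express=True), 401)

    if __name__ == "__main__":
        unittest.main()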

Third, build the replacement in isolation. Do not modify the old code to make the new code easier to write. The new code should work regardless of what the old code does.
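
Continuing the hypothetical invoicing example, the replacement might look like this: it satisfies the same contract as the legacy implementation but takes its dependencies through the constructor, so it never needs to reach into the old code.

    # Hypothetical clean-room replacement: same contract (total(order_id) -> cents),
    # no imports from and no calls into the legacy module.
    class NewInvoiceCalculator:
        def __init__(self, order_store):
            # order_store is anything exposing lines(order_id) -> list of line dicts;
            # injecting it keeps the new code testable without the legacy data layer.
            self._orders = order_store

        def total(self, order_id: str) -> int:
            return sum(line["cents"] for line in self._orders.lines(order_id))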

Fourth, route and validate in parallel. Run both implementations on real traffic. Compare outputs. When confidence is high enough, cut over fully. Feature flags make this manageable. They also give you a rollback path if something unexpected shows up in production.
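
A sketch of what parallel routing can look like, with hypothetical flag names and calculator objects: the legacy result is always computed, the new result is compared in the shadow, and the cutover flag decides which one is returned.

    import logging

    logger = logging.getLogger("strangler.cutover")

    # Hypothetical flag names; legacy_calc and new_calc both expose total(order_id).
    def invoice_total(order_id, legacy_calc, new_calc, flags):
        legacy_result = legacy_calc.total(order_id)
        new_result = None
        if flags.get("invoice_shadow_compare") or flags.get("invoice_use_new"):
            try:
                new_result = new_calc.total(order_id)
            except Exception:
                # A failure in the new path must never break the request.
                logger.exception("new implementation failed for %s", order_id)
        if new_result is not None and new_result != legacy_result:
            logger.warning("divergence on %s: legacy=%s new=%s",
                           order_id, legacy_result, new_result)
        # Cut over only when the flag is on and the new path actually produced a value.
        if flags.get("invoice_use_new") and new_result is not None:
            return new_result
        return legacy_result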

The strangler pattern works because it never puts your system in a broken intermediate state. You are always running something that works. The tradeoff is that it takes longer than a rewrite and requires more discipline to execute without shortcuts.

Big Bang Rewrite: When It Is Actually Justified

There are situations where a strangler approach is not viable. Technology obsolescence is the clearest case. If the language runtime is end-of-life, if the framework has no upgrade path, if key dependencies have been abandoned, you cannot incrementally replace a system that cannot be safely run while you build around it.

The other case is architectural impossibility. Some systems are so deeply coupled that there are no seams to work from. Building a strangler wrapper around a system like that means building a second version of the entire system anyway. At that point the fiction of incremental replacement does not reduce risk. It just extends the timeline.

When a rewrite is genuinely the right call, the risk reduction comes from the same tools: feature flags for progressive rollout, parallel running to catch behavioral differences, and a concrete rollback plan before you flip the switch. The rewrite itself does not eliminate risk. The operational approach around it does.
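
One way to make the rollout progressive and the rollback cheap is to bucket traffic on a stable key, with the percentage coming from configuration. A rough sketch, with illustrative names; the flag or config source would be whatever your system already uses.

    import hashlib

    # Hypothetical rollout check: a stable hash of the user id keeps each user
    # on the same side of the cutover as the percentage ramps up.
    def use_new_system(user_id: str, rollout_percent: int) -> bool:
        digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
        bucket = int(digest[:8], 16) % 100
        return bucket < rollout_percent

    # Ramping from 1 to 5 to 50 to 100 is a config change per step;
    # setting the percentage back to 0 is the rollback plan.
    print(use_new_system("user-1234", rollout_percent=5))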

One thing consistently separates successful rewrites from failed ones: taking the time to understand why the old system works the way it does before building the new one. The quirks and workarounds in legacy code are usually there for a reason. Sometimes the reason is bad decisions made years ago. Sometimes the reason is a business rule that nobody documented. Assuming it is the former and then finding out it was the latter is a painful way to learn the difference.

Measuring Whether It Was Worth It

Refactoring ROI is hard to measure cleanly because the costs are visible and the benefits are not. Deployment frequency, mean time to recovery, and the time developers spend reading code versus writing code are reasonable proxies. None of them are perfect.

What tends to be more reliable is tracking the specific problems that motivated the refactoring work. If the goal was to reduce the time required to add a new integration, measure that before and after. If the goal was to reduce the blast radius of a given component failure, look at how outages have behaved. Vague goals produce vague results.

Before committing to a refactoring approach, write down what you are trying to achieve and how you will know whether it worked. That is worth the thirty minutes it takes.

---

Tags: #LegacyCode #Refactoring #SoftwareArchitecture #TechnicalDebt #EngineeringPractice
