How I use Claude Code to refactor a legacy codebase — a complete workflow

Every developer has that codebase. The one inherited from a previous team, or written three years ago when you "didn't know better." The one where touching one thing breaks five others.

I've been using Claude Code to systematically refactor legacy codebases, and I want to share the exact workflow that's saved me hours of painful manual archaeology.

The problem with legacy refactoring

Legacy refactoring sessions are long. Really long. You're context-switching between:

  • Understanding what the code actually does (vs what the docs say)
  • Writing characterization tests to lock in current behavior
  • Identifying the strangler fig entry points
  • Actually making the changes
  • Verifying nothing broke

These sessions can easily run 4-6 hours. If you're paying by the token, that adds up fast. If you hit a rate limit mid-session, you lose all your context right when the AI has finally understood the messy domain model.

My workflow: the 5-phase approach

Phase 1: Archaeology (20-30 min)

Start by giving Claude Code a map of the disaster:

Read through src/legacy/ and tell me:
1. What does this module actually do? (not what the README says)
2. What are the data flows — what comes in, what goes out?
3. What are the most dangerous parts to touch?
4. What are the hidden dependencies I might miss?

This produces a "reality map" — often revealing that the README is 2 years out of date.
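The "hidden dependencies" answer can be cross-checked mechanically. Here is a minimal sketch, assuming the code uses CommonJS `require` calls or static ES `import` statements. The helper name `listDependencies` and the sample source are made up for illustration, and a regex scan like this misses dynamic requires, so treat it as a sanity check on the reality map rather than a replacement for it.

```javascript
// Hypothetical helper: list module specifiers referenced by a source string.
// Regex-based, so it misses dynamic cases such as require(someVariable).
function listDependencies(source) {
  const patterns = [
    // const x = require('x')
    /require\(\s*['"]([^'"]+)['"]\s*\)/g,
    // import x from 'x'  and  import 'x'
    /import\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]/g,
  ];
  const found = new Set();
  for (const pattern of patterns) {
    for (const match of source.matchAll(pattern)) {
      found.add(match[1]);
    }
  }
  return [...found].sort();
}

// Made-up sample standing in for a file under src/legacy/.
const sample = `
const db = require('../db/connection');
const moment = require('moment');
import { sendMail } from './mailer';
import './polyfills';
`;

console.log(listDependencies(sample));
```

Anything this list contains that Claude's summary did not mention is worth asking about before moving on.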

Phase 2: Characterization tests (45-60 min)

Before touching a line of production code:

Now write characterization tests for the UserProcessor class.
Don't test what it SHOULD do — test what it ACTUALLY does right now.
Include the weird edge cases. Include the behavior that looks like bugs 
but might be intentional. I need these tests to catch regressions.

Characterization tests are your safety net. They capture current behavior exactly — even the bugs — so you know if you accidentally change something.

Example output:

// Assumes `processor` (a UserProcessor instance) and `testUser` are created in the suite's setup.
describe('UserProcessor (characterization)', () => {
  it('returns null for users with no email (not an error, just null)', async () => {
    const result = await processor.process({ id: 1, email: null });
    expect(result).toBeNull(); // Yes, this is weird. Don't change it.
  });

  it('adds 24 hours to timestamps in UTC but displays in Pacific', async () => {
    // This is almost certainly a bug but 3 other systems depend on it
    const result = await processor.process(testUser);
    expect(result.displayTime).toBe('2024-01-02T08:00:00'); // Not UTC
  });
});
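
The same idea can be sketched without a test framework: record what the code does today, including thrown errors, and compare later runs against that record. In this sketch `legacyProcess` is a made-up stand-in for the real class and `characterize` is a hypothetical helper, not part of any library.

```javascript
// Made-up stand-in for legacy code with quirky behavior we must preserve.
function legacyProcess(user) {
  if (user.email === null) return null;        // quirk: null, not an error
  if (user.id < 0) throw new Error('bad id');  // quirk: only negative ids throw
  return { id: user.id, email: user.email.trim().toLowerCase() };
}

// Hypothetical helper: run fn over each input and record the outcome,
// whether the call returned a value or threw.
function characterize(fn, inputs) {
  return inputs.map((input) => {
    try {
      return { input, returned: fn(input) };
    } catch (err) {
      return { input, threw: err.message };
    }
  });
}

const inputs = [
  { id: 1, email: null },
  { id: 2, email: '  Ada@Example.COM ' },
  { id: -1, email: 'x@example.com' },
];

// Record once against the untouched code and keep the result as the baseline.
const baseline = JSON.stringify(characterize(legacyProcess, inputs));

// After each refactor step, re-run and compare against the baseline.
const current = JSON.stringify(characterize(legacyProcess, inputs));
console.log(current === baseline ? 'behavior unchanged' : 'behavior changed');
```

In a real project the baseline would be written to a file and committed, so that the comparison runs against the behavior from before the refactor started.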

Phase 3: Strangler fig planning (15 min)

Based on the characterization tests, design a strangler fig refactor.
Give me:
1. The new interface this module should expose
2. A migration path that lets old and new code coexist
3. The order of operations — what to refactor first, what last
4. Which tests need to change vs which are permanent safety nets

The strangler fig pattern means you never have a "big bang" refactor. Old code continues working while new code grows around it.
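
One way to picture the coexistence is a small facade that callers go through while each route is switched over individually. This is a sketch under assumptions, not the plan Claude will necessarily produce: `legacyFormatName`, `newFormatName`, and `createFacade` are all hypothetical names.

```javascript
// Made-up legacy implementation that stays in place during the migration.
function legacyFormatName(user) {
  return user.last + ', ' + user.first;
}

// New implementation growing alongside it; it must match legacy output for now.
function newFormatName(user) {
  return `${user.last}, ${user.first}`;
}

// Hypothetical facade: callers use this single entry point. Each route can be
// switched to the new code independently and switched back if needed.
function createFacade(routes) {
  return {
    formatName(user) {
      return routes.formatName === 'new'
        ? newFormatName(user)
        : legacyFormatName(user);
    },
    migrate(name) { routes[name] = 'new'; },
    rollback(name) { routes[name] = 'legacy'; },
  };
}

const facade = createFacade({ formatName: 'legacy' });
const user = { first: 'Grace', last: 'Hopper' };

const before = facade.formatName(user); // served by the legacy code
facade.migrate('formatName');
const after = facade.formatName(user);  // served by the new code

console.log(before === after); // callers should see no difference
```

The rollback method is the point of the design: if a characterization test fails after a route is migrated, that one route goes back to the legacy code while the rest of the migration stays in place.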

Phase 4: Incremental refactoring (the bulk of the work)

Now you execute the plan — but incrementally:

Refactor step 1: Extract the email validation logic into EmailValidator class.
Keep the old behavior exactly. All characterization tests must still pass.
Add new unit tests for the extracted class.

Run tests after every step. Never refactor more than one thing at a time.

npm test -- --watch src/legacy/
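
A single extraction step might look like the sketch below. The validation rule is invented for illustration (the real `UserProcessor` logic is not shown in this post), and `legacyRegister` and `refactoredRegister` are hypothetical names. The part that carries over is the parity check at the end.

```javascript
// Made-up "before" state: validation buried inside a larger function.
function legacyRegister(email) {
  if (typeof email !== 'string' || email.indexOf('@') < 1) {
    return { ok: false };
  }
  return { ok: true, email: email };
}

// Step 1: the same check moved into its own class, with the logic unchanged.
// It still accepts odd input such as "a@", because that is current behavior.
class EmailValidator {
  isValid(email) {
    return typeof email === 'string' && email.indexOf('@') >= 1;
  }
}

// The refactored function delegates to the extracted class.
function refactoredRegister(email, validator = new EmailValidator()) {
  if (!validator.isValid(email)) {
    return { ok: false };
  }
  return { ok: true, email: email };
}

// Parity check: old and new must agree on every sample, including the odd ones.
const samples = ['a@b.co', 'a@', '@b.co', 'no-at-sign', '', null, 42];
const mismatches = samples.filter(
  (s) => JSON.stringify(legacyRegister(s)) !== JSON.stringify(refactoredRegister(s))
);
console.log(mismatches.length === 0 ? 'parity holds' : mismatches);
```

Fixing the "a@" case would be a behavior change, so it belongs in a separate step with its own decision, not in the extraction.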

Phase 5: Verification and documentation

Now that we've completed the refactor:
1. Update the README to reflect what the module actually does now
2. Document the intentional quirks we preserved
3. Flag the technical debt we chose NOT to fix and why
4. Write a migration guide for anyone else touching this code

The rate limit problem

Here's the painful reality: a full legacy refactor session will likely exhaust your AI's context or hit rate limits. Right when Claude has internalized the messy domain model and understands why that weird timestamp behavior exists, the session ends.

I switched to SimplyLouie for exactly this reason. Flat ✌️2/month with no token counting, no rate limits per session. Legacy refactoring requires long, uninterrupted conversations — you can't afford to lose your AI's understanding of the codebase halfway through.

The results

Using this workflow on a 15,000-line Node.js codebase:

  • Week 1: Characterization tests written, 0 regressions
  • Week 2: Core business logic extracted into clean modules
  • Week 3: Old code removed, test coverage went from 12% to 67%
  • Total time: ~18 hours of Claude Code sessions vs. an estimated 60+ hours of manual work

The characterization tests alone saved us from 3 production incidents during the refactor.

Prompts to save

Here's my legacy refactoring prompt kit:

Archaeology prompt:

Read [file/directory] and give me a reality map: what it does, data flows, 
dangers, and hidden dependencies. Ignore the documentation — tell me what 
the code actually does.

Characterization test prompt:

Write characterization tests for [class/module]. Capture actual behavior, 
not intended behavior. Include weird edge cases. Mark anything that looks 
like a bug but might be intentional.

Strangler fig prompt:

Design a strangler fig refactor for [module]. New interface, migration path, 
order of operations, which characterization tests are permanent safety nets.

Step prompt:

Execute refactor step [N]: [specific thing]. Preserve exact behavior. 
All characterization tests must pass. Add unit tests for new code only.

The legacy codebase is survivable. The characterization tests give you courage to change things. The strangler fig gives you a path. And Claude Code gives you a pair programmer who can hold the whole mess in context.

Using SimplyLouie for flat-rate Claude access — ✌️2/month, no rate limits. Long sessions for long refactors.

Source: dev.to
