Migrating Between API Versions and Providers

Last reviewed on 4 May 2026.

Migrating an integration is rarely a single switch. The patterns that work share three properties: they're incremental, they're reversible, and they keep both sides running until you're confident in the new one. This page covers the patterns and the failure modes.

The two kinds of migration

Two scenarios that get conflated but need different approaches:

  • Version migration within the same provider. The API you depend on is releasing v2; v1 will be deprecated. The semantics are mostly compatible; the wire format and some endpoint shapes change.
  • Provider migration. You're switching from one vendor's API to another's. Same problem domain, different abstractions. Mappings between the two are partial; some features don't translate.

Version migration is mechanical: read the changelog, update the wire format, ship. Provider migration is architectural: it touches your domain model, your operational tooling, often your contracts. The patterns below work for both, but the second case takes orders of magnitude longer.

Before the migration: understand what you actually use

The first concrete step is an audit: which endpoints does your integration actually call, with what parameters, returning what fields you actually use? The audit answers questions the migration depends on:

  • Which endpoints have a clean equivalent in the target?
  • Which endpoints don't, and how will you handle the gap?
  • Which response fields does your code actually read? (Often a fraction of what's returned.)
  • Which calls are read-only vs which mutate state? Mutations need more care during migration.

Tools that help: trace logs from a representative production window, the type definitions in your client wrapper, search across the codebase for the field names. The audit usually surfaces dependencies the team didn't know about — endpoints that look unused but get called by a quarterly report, fields that look unused but are surfaced in error messages.
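The tallying step can be sketched in a few lines. This is a minimal illustration, not a real tracing integration: the record shape (`method`, `path`, `fields_read`) is hypothetical, so adapt it to whatever your trace logs actually emit.

```python
# Hypothetical audit sketch: tally endpoint usage, fields actually read,
# and which calls mutate state, from a sample of trace-log records.
from collections import Counter

def audit(trace_records):
    endpoint_calls = Counter()
    fields_read = Counter()
    mutations = Counter()
    for rec in trace_records:
        endpoint = f'{rec["method"]} {rec["path"]}'
        endpoint_calls[endpoint] += 1
        fields_read.update(rec.get("fields_read", []))
        # Mutations need more care during migration, so count them separately.
        if rec["method"] in {"POST", "PUT", "PATCH", "DELETE"}:
            mutations[endpoint] += 1
    return {"endpoints": endpoint_calls, "fields": fields_read,
            "mutations": mutations}

traces = [
    {"method": "GET", "path": "/users", "fields_read": ["id", "name"]},
    {"method": "GET", "path": "/users", "fields_read": ["id"]},
    {"method": "POST", "path": "/orders", "fields_read": ["id"]},
]
report = audit(traces)
```

Even a crude tally like this makes the gap analysis concrete: endpoints with zero calls in a representative window are candidates for dropping rather than migrating.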

The strangler pattern

The default pattern for any non-trivial migration: incrementally route traffic from the old to the new, endpoint by endpoint, while both run in parallel. Named after the strangler fig — a plant whose roots grow down around a host tree until the tree dies and the fig stands on its own.

The mechanics:

  1. Build an abstraction layer in your code that the rest of the application calls. Initially, all calls go to the old API.
  2. Implement one endpoint against the new API. Add a flag (per-endpoint, per-customer, per-feature) that routes that endpoint's calls to the new implementation.
  3. Roll the flag out gradually — internal users first, then a small percentage, then larger. Watch metrics and error rates.
  4. When confidence is high, default the endpoint to the new implementation; remove the old code path.
  5. Repeat for the next endpoint.

The pattern's value is that you can stop at any point. If the new API has a serious problem on endpoint #4, your endpoints #1-3 are already migrated and stable; you don't have to roll back everything.
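The abstraction layer in steps 1–2 can be sketched as below. `OldClient` and `NewClient` are hypothetical stand-ins for real API clients; the point is that the rest of the application only ever calls `UserAPI`, and the flag flip is instant and reversible.

```python
# Strangler-pattern sketch: a per-endpoint flag routes each call to the old
# or new implementation. OldClient/NewClient are hypothetical stand-ins.
class OldClient:
    def get_user(self, user_id):
        return {"source": "old", "id": user_id}

class NewClient:
    def get_user(self, user_id):
        return {"source": "new", "id": user_id}

class UserAPI:
    def __init__(self, flags):
        self.flags = flags              # e.g. {"get_user": True} -> new API
        self.old, self.new = OldClient(), NewClient()

    def get_user(self, user_id):
        client = self.new if self.flags.get("get_user") else self.old
        return client.get_user(user_id)

api = UserAPI(flags={"get_user": False})
assert api.get_user(7)["source"] == "old"
api.flags["get_user"] = True            # flip the flag: reversible cutover
assert api.get_user(7)["source"] == "new"
```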

Dual writes and read-shadowing

For mutations, the strangler pattern alone leaves an awkward gap: while you're routing some traffic to the new API and some to the old, two sources of truth diverge. Two patterns address this:

Dual writes

Every mutation goes to both the old and the new API. Reads come from one (typically the old until you cut over). This keeps both systems consistent during the migration. The cost is operational: every write doubles in cost and now has two failure modes. The defense is to make one side authoritative and the other best-effort — log discrepancies but don't fail the request because the secondary write failed.
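A minimal sketch of that defense, assuming the old API stays authoritative. The function names are illustrative; in practice both sides would be real client calls.

```python
# Dual-write sketch: the old API is authoritative, the new API is
# best-effort. A secondary failure is logged but never fails the request.
import logging

log = logging.getLogger("migration")

def dual_write(old_api, new_api, payload):
    result = old_api(payload)           # authoritative: exceptions propagate
    try:
        shadow = new_api(payload)       # best-effort secondary write
        if shadow != result:
            log.warning("dual-write discrepancy: %r vs %r", result, shadow)
    except Exception:
        log.exception("secondary write failed; request still succeeds")
    return result

ok = lambda p: {"id": p["id"], "status": "created"}
def broken(p):
    raise ConnectionError("new API down")

# The caller still gets a successful result even when the new side is down.
assert dual_write(ok, broken, {"id": 1}) == {"id": 1, "status": "created"}
```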

Read-shadowing

Every read also goes to both APIs. The response from the authoritative API is returned to the caller; the response from the new API is compared and any differences are logged. This finds bugs — fields that don't match, errors the new API returns where the old one didn't — without exposing them to users.

Shadowing is most valuable for read-heavy integrations because it surfaces parity issues at production volume without production risk. The cost is per-request overhead and double the call volume to the upstream APIs; rate-limit the shadow side or sample to keep it tractable.
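Shadowing with sampling can be sketched as follows. The 10% sample rate is illustrative, not a recommendation; pick one that keeps the extra upstream volume inside your rate limits.

```python
# Read-shadowing sketch: return the authoritative response, compare a
# sampled fraction of requests against the new API, log any differences.
import logging, random

log = logging.getLogger("shadow")

def shadowed_read(old_api, new_api, request, sample_rate=0.10):
    authoritative = old_api(request)
    if random.random() < sample_rate:   # sample to cap shadow call volume
        try:
            candidate = new_api(request)
            if candidate != authoritative:
                log.warning("parity mismatch for %r: %r != %r",
                            request, authoritative, candidate)
        except Exception:
            log.exception("shadow read failed for %r", request)
    return authoritative                # caller never sees the shadow side
```

Note that a shadow failure, like a secondary-write failure, is logged and swallowed: the whole point is to find parity bugs without exposing them to users.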

Mapping differences

For provider migrations, the mapping between old and new is rarely 1:1. Common patterns:

  • Field rename. The old API returned created, the new one returns created_at. Trivial — handle in the client wrapper.
  • Field split. The old API returned name as a single field, the new one returns first_name + last_name. The wrapper joins them on read; on write, the application has to provide both.
  • Different enum values. The old API returned "active" / "inactive", the new one returns "enabled" / "disabled" / "suspended". The wrapper translates, with a default for the value that doesn't have a clean equivalent.
  • Different pagination. The old API was offset-paginated; the new one is cursor-paginated. The application's iteration logic needs to change, not just the wrapper.
  • Different error shapes. The old API returned errors as {"error": "msg"}; the new one uses RFC 7807 problem details. Normalize at the wrapper layer to a consistent internal shape.
  • Missing functionality. The old API supports an operation the new one doesn't. Either build it on top of the new API's primitives, change your domain model, or accept the gap as a known limitation.

The pattern: the wrapper layer is responsible for normalizing both APIs to a single internal shape. The application code never sees the raw responses — it sees the normalized form. This is the core of why thin client wrappers are worth building before you need them.
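A sketch of that normalization layer, reusing the field names and enum values from the examples above (which are themselves illustrative):

```python
# Normalization sketch: both providers' raw responses collapse to one
# internal shape, so application code never sees provider-specific fields.
OLD_TO_INTERNAL_STATUS = {"active": "enabled", "inactive": "disabled"}

def normalize_old(raw):
    first, _, last = raw["name"].partition(" ")
    return {
        "created_at": raw["created"],               # field rename
        "first_name": first,                        # field split
        "last_name": last,
        "status": OLD_TO_INTERNAL_STATUS[raw["status"]],
    }

def normalize_new(raw):
    return {
        "created_at": raw["created_at"],
        "first_name": raw["first_name"],
        "last_name": raw["last_name"],
        "status": raw["status"],    # already matches the internal enum
    }

old = normalize_old({"created": "2026-01-01", "name": "Ada Lovelace",
                     "status": "active"})
new = normalize_new({"created_at": "2026-01-01", "first_name": "Ada",
                     "last_name": "Lovelace", "status": "enabled"})
assert old == new    # both providers yield the same internal shape
```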

Traffic shifting

The mechanics of moving traffic from old to new:

Per-feature flags

The simplest pattern: a boolean flag per endpoint or per feature, controlling which API the call goes to. Easy to roll out and roll back. Doesn't let you do percentage-based or per-user rollouts without additional logic.

Percentage rollouts

For each call, a hash function (typically of the user ID) determines whether this call goes to the new or old API. Setting the rollout to 10% means 10% of users see the new API consistently; 50% means half of users; 100% means the migration is done. The hash on user ID rather than per-call ensures a given user has a stable experience.
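The bucketing can be sketched as below. Using a keyed cryptographic digest rather than a language-builtin hash keeps buckets stable across processes and restarts; the `"migration-x"` key is a hypothetical name for this particular rollout.

```python
# Percentage-rollout sketch: hash the user ID into a stable bucket in
# [0, 100) so a given user always lands on the same side at a given
# percentage.
import hashlib

def use_new_api(user_id: str, rollout_percent: int) -> bool:
    digest = hashlib.sha256(f"migration-x:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent

assert use_new_api("user-42", 100) is True    # fully rolled out
assert use_new_api("user-42", 0) is False     # fully rolled back
# The same user gets the same answer on every call at a fixed percentage:
assert use_new_api("user-42", 50) == use_new_api("user-42", 50)
```

A useful property of this scheme: users in the new-API bucket at 10% stay in it at 20%, so raising the percentage never flips anyone back to the old side.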

Canary by customer

Start with internal users, then friendly customers, then larger ones. Provider migrations especially benefit from this — different customers exercise different parts of the API surface, and bugs concentrated in some endpoints get found by the customers who use them.

Geographic rollout

For APIs that vary by region (different latency, different data residency, sometimes different bugs), roll out region by region. Smaller regions first, primary regions last.

Watching the migration

The metrics that matter during a migration:

  • Error rate per side. If the new API is erroring more than the old one, the rollout pauses or rolls back.
  • Latency per side. A significantly slower new side can be acceptable temporarily, but it needs to be tracked. If the new API is 5× slower, the migration is going to need optimization work.
  • Functional parity (from shadowing). Discrepancies between old and new responses; rates of "the new API returned data that's missing fields we expected".
  • Customer-reported issues. Even with shadowing, some bugs only surface as user complaints. Tag and prioritize them by which side of the migration the user was on.

The rollout decision rule: only advance the percentage if the new side's error rate, latency, and parity are within acceptable bounds. The bounds need to be set explicitly before the rollout, not negotiated under pressure mid-migration.
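One way to make that rule explicit is to encode the bounds in code rather than leaving them to judgment under pressure. The thresholds below are illustrative only; the point is that they are fixed before the rollout.

```python
# Decision-rule sketch with pre-agreed bounds. Thresholds are illustrative.
def rollout_decision(new_side, old_side,
                     max_error_ratio=1.5, max_latency_ratio=2.0,
                     min_parity=0.999):
    if new_side["error_rate"] > old_side["error_rate"] * max_error_ratio:
        return "rollback"             # new side is clearly worse
    if new_side["p99_ms"] > old_side["p99_ms"] * max_latency_ratio:
        return "hold"                 # tolerable, but don't advance yet
    if new_side["parity"] < min_parity:
        return "hold"                 # shadowing still finds discrepancies
    return "advance"

old = {"error_rate": 0.002, "p99_ms": 120, "parity": 1.0}
assert rollout_decision({"error_rate": 0.002, "p99_ms": 130,
                         "parity": 0.9995}, old) == "advance"
assert rollout_decision({"error_rate": 0.010, "p99_ms": 130,
                         "parity": 0.9995}, old) == "rollback"
```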

The cutover

At some point the new API is at 100% and the old API is unused. Two more steps before the migration is really done:

  1. Stop calling the old API. Remove the dual writes and the shadowing. If you want a safety net, keep the old code path behind a dormant flag for one more release cycle, but commit to removing it after that.
  2. Decommission the old credentials and connections. Cancel the old API contract, rotate any shared secrets, remove the old hostnames from your DNS allow-lists. Otherwise the old credentials live forever as a forgotten attack surface.

The migration is not done when the new API serves 100% of traffic. It's done when the old API is no longer accessible from your system.

Rollback strategy

You will sometimes need to roll back. The plan needs to exist before the rollout, not be invented during the incident.

Two questions to answer:

  • How fast can you reverse the traffic shift? If the rollback is "deploy a new version", that's minutes. If the rollback is "wait for the percentage flag to take effect across the fleet", that's seconds. Faster is better — you'll want it during incidents.
  • What happens to in-flight state? If the new API created records, do they exist on the old side? If you dual-wrote, yes; otherwise, no. The rollback might leave data on the new side that the old side doesn't know about. Have a plan: backfill, accept the gap, or block rollback after a certain point.

Document the rollback procedure as part of the migration plan. Test it at low percentages early in the rollout, when the cost of rolling back is small.

API deprecation timelines

If you're on the receiving end of a vendor's deprecation announcement, a widely used convention is 12 months between announcement and removal. Many vendors give less; some give more. Plan accordingly.

If you're announcing a deprecation, the same 12-month window is reasonable for stable customers. Two practical conventions that help:

  • The Sunset response header (RFC 8594) lets clients programmatically detect impending removal.
  • A monthly email reminder to all integrators using the deprecated API is more effective than a single announcement at deprecation time.
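Detecting the Sunset header on the client side can be sketched as follows. RFC 8594 specifies that the header carries an HTTP-date; the example date is made up.

```python
# Sketch: detect an impending removal via the Sunset header (RFC 8594).
# The header value is an HTTP-date after which the resource may be removed.
from email.utils import parsedate_to_datetime

def sunset_date(headers: dict):
    value = headers.get("Sunset")
    return parsedate_to_datetime(value) if value else None

headers = {"Sunset": "Sat, 01 May 2027 00:00:00 GMT"}
deadline = sunset_date(headers)
assert deadline is not None and deadline.year == 2027
```

Logging or alerting when this value first appears in a response turns the vendor's announcement into something your monitoring sees, not just something a mailing list saw.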

Common mistakes

  • Big-bang cutover. Migrating everything at once works for trivial integrations and fails for everything else. Start with one endpoint at low percentage.
  • No shadowing. Shipping a new implementation without comparing it against the old one means production users find the bugs. Shadow first.
  • Migrating the wrapper without migrating the assumptions. If the new API has fundamentally different semantics (different consistency model, different error patterns), the wrapper alone can't paper over it. The application has to change too.
  • Ignoring rate limits during dual-write. Doubling your call volume against an API can hit rate limits you didn't have before.
  • Forgetting the old API exists after cutover. If the old API stays accessible to your code, eventually some forgotten code path calls it. Decommission completely.
  • No metrics on the new side. "It seems to be working" is not a basis for rolling forward.
  • Treating the migration as a code change, not a project. Migrations have phases, success criteria, an owner, and a communication plan. They benefit from being run like projects.

Where to go next

For the broader integration patterns the migration sits inside, see the integration guide. For the API versioning conventions on the producer side, see REST API Design. For thinking about backwards compatibility from the API designer's perspective, see API design best practices.