6. Infrastructure exit strategy


Twenty-four hours after the Shock, the mood briefly shifts. Against all odds, the IT team has made real progress. A regional provider was identified. Accounts were opened. Networks recreated. Containers pulled. One by one, core services start responding again. It is slower than before, rougher, missing pieces - but alive. For the first time since the Shock, there is visible momentum.

That lasts until some of the most critical workloads refuse to start. The binaries had been compiled years ago against a specific CPU architecture. In the primary cloud, that detail was invisible. The new provider offers only a different processor family. A few services can be recompiled. Others cannot. Certain legacy components depend on libraries that no one has built in years. The engineers realize the uncomfortable truth: what looked portable was only portable inside its original environment.
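
This kind of failure is detectable long before a crisis. As a minimal sketch (the service inventory and architecture labels are hypothetical), each workload can declare the CPU architectures its binaries support, and a preflight check can report what would and would not start on a candidate host:

```python
import platform

# Hypothetical inventory: each service declares the CPU architectures
# its binaries were built for. In practice this would come from image
# manifests or build metadata, not a hand-maintained dict.
SERVICES = {
    "billing-api": {"x86_64"},
    "auth-service": {"x86_64", "aarch64"},
    "legacy-batch": {"x86_64"},  # no source available, cannot be recompiled
}

def portability_report(target_arch: str) -> dict:
    """Return which services can and cannot start on the target architecture."""
    report = {"runnable": [], "blocked": []}
    for name, arches in SERVICES.items():
        key = "runnable" if target_arch in arches else "blocked"
        report[key].append(name)
    return report

if __name__ == "__main__":
    # Check against the machine we are running on right now.
    print(portability_report(platform.machine()))
```

Running a check like this against each candidate provider's processor family turns "what looked portable" into a concrete list of blockers.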

Objective

Too many organizations frame “exit” as something you do after a crisis: when a provider fails, when sanctions hit, when legal access is revoked, when costs explode. That framing is backwards: you don’t design emergency exits when the fire starts - you build them during construction and run fire drills. An exit strategy is not a response mechanism; it is a design principle.

The objective is simple and unapologetic: you must be able to redeploy elsewhere, even if it’s slow. It does not need to be “seamless” or “automated”; it just needs to be possible, within a time frame of your own choosing.

Solutions

Cold restart beats live multi-cloud

Let’s clear one myth immediately: live multi-cloud is not the goal. Running active workloads across multiple hyperscalers sounds impressive on slides, but it rarely survives contact with operations. Consider a typical SaaS platform attempting live multi-cloud:

  • Every service must be compatible with multiple provider-specific networking models.
  • Identity, permissions, and secrets must be synchronized across clouds.
  • Managed services must be avoided or replaced with custom equivalents.
  • Debugging becomes exponentially harder, because failures are no longer local.

The result is a system that is expensive to run, difficult to evolve, and understood by fewer and fewer people over time. 

Maintaining a fully live secondary environment is expensive and usually unnecessary. A cold standby is enough - and far more realistic. A cold standby environment typically includes:

  • An active account with an alternative provider.
  • Validated access paths (VPN, credentials).
  • Infrastructure code that has been applied at least once.
  • Deployment pipelines that have successfully run.
  • Data restoration procedures that have been tested.

A cold standby does not necessarily include live workloads or mirrored traffic. If the need arises, you will already be in a strong position: an environment that has the right data but responds slowly. You will have some breathing room to learn how to run this infrastructure under load.

As a practice run, you can spin up your cold environment once per quarter, deploy the MSS, validate that it starts, and tear it down again. This is not redundancy; it is muscle memory.
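
The quarterly drill can be driven by a small harness that runs each step, measures how long it takes, and stops at the first failure (later steps depend on earlier ones). The step names below are placeholders; in a real drill each callable would shell out to your provisioning and deployment tooling:

```python
import time
from typing import Callable

def run_drill(steps: list[tuple[str, Callable[[], bool]]]) -> list[dict]:
    """Execute each drill step, recording duration and outcome.
    Stops at the first failure, because later steps depend on earlier ones."""
    results = []
    for name, step in steps:
        start = time.monotonic()
        ok = step()
        results.append({"step": name, "ok": ok,
                        "seconds": round(time.monotonic() - start, 2)})
        if not ok:
            break
    return results

# Illustrative drill; replace each lambda with a call into real tooling.
drill = [
    ("provision network", lambda: True),
    ("apply infrastructure code", lambda: True),
    ("deploy MSS", lambda: True),
    ("smoke-test endpoints", lambda: True),
    ("tear down", lambda: True),
]

if __name__ == "__main__":
    for result in run_drill(drill):
        print(result)
```

The recorded durations are exactly the "measured or rehearsed" numbers an exit playbook needs; the drill produces them as a by-product.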

An exit strategy doesn’t require that you run everywhere. It requires that you can run elsewhere. The right mental model is not “failover”, it is cold restart: the ability to stand up your MSS from scratch, on a different infrastructure, within a bounded and acceptable timeframe. The first time you deploy elsewhere must not be during an actual crisis.

Portable by design, not abstracted to death

The core principle of an exit strategy is portability by design. This does not mean inventing your own platform or chasing perfect abstraction. It means making deliberate choices that keep redeployment feasible. 

Start with what can be containerized. If a workload can run in a container, it should. Containers are not about fashion or orchestration frameworks; they are about reducing assumptions. A container packages runtime, dependencies, and configuration in a form that can move across environments with minimal friction. They can run on a hyperscaler, on a regional cloud, or on bare metal.

This does not mean standardizing on Kubernetes everywhere. Orchestration is a means, not an objective. What matters is that workloads are not inseparably tied to proprietary runtimes. If a service cannot run outside its original provider, it is not deployable - it is hosted.
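
One concrete way to check whether a workload is deployable rather than merely hosted is to verify which platforms its container image was actually published for. The sketch below parses an OCI image index, the kind of JSON returned by `docker manifest inspect`; the sample document and digests are illustrative:

```python
import json

# Illustrative OCI image index: a multi-architecture image declares one
# manifest entry per (os, architecture) pair it was built for.
SAMPLE_INDEX = json.dumps({
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:bbb", "platform": {"os": "linux", "architecture": "arm64"}},
    ],
})

def published_architectures(index_json: str) -> set[str]:
    """Return the set of CPU architectures an image index declares."""
    index = json.loads(index_json)
    return {m["platform"]["architecture"]
            for m in index.get("manifests", [])
            if "platform" in m}

if __name__ == "__main__":
    print(published_architectures(SAMPLE_INDEX))
```

An image that only lists one architecture is a portability assumption waiting to fail, exactly like the binaries in the opening story.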

Next, embrace Infrastructure-as-Code. It is often justified as a way to move faster or reduce configuration drift, but it has a decisive side benefit: by making infrastructure reproducible, it makes it portable.

An exit-oriented use of Infrastructure-as-Code means that networks are defined as code, not diagrams; that firewall and routing rules are explicit; and that storage layouts are intentional and documented.
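
A minimal sketch of what "networks defined as code" buys you: when the layout lives in neutral data, rendering it for a second provider is mechanical rather than guesswork. The data model and the pseudo-DSL emitted here are hypothetical, standing in for real Infrastructure-as-Code tooling:

```python
# Provider-neutral network definition: the source of truth is plain data,
# not a diagram or a console screenshot. All names and CIDRs are illustrative.
NETWORK = {
    "cidr": "10.0.0.0/16",
    "subnets": {"app": "10.0.1.0/24", "db": "10.0.2.0/24"},
    "firewall": [{"from": "app", "to": "db", "port": 5432}],
}

def render(provider: str, net: dict) -> list[str]:
    """Emit provisioning statements for a given provider (hypothetical DSL)."""
    lines = [f"{provider}: create network {net['cidr']}"]
    for name, cidr in net["subnets"].items():
        lines.append(f"{provider}: create subnet {name} {cidr}")
    for rule in net["firewall"]:
        lines.append(f"{provider}: allow {rule['from']} -> {rule['to']} "
                     f"port {rule['port']}")
    return lines

if __name__ == "__main__":
    # The same definition, rendered for two different targets.
    print("\n".join(render("primary-cloud", NETWORK)))
    print("\n".join(render("regional-cloud", NETWORK)))
```

Real tools (Terraform, Pulumi, and the like) do this translation with far more fidelity; the point is that the layout itself never lives only in one provider's console.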

Ask yourself: if you lost access to your primary cloud tomorrow, could you recreate your network, compute, storage, and access controls elsewhere without guesswork? If your answer relies on screenshots, wiki pages or “we’ll figure it out”, then you don’t have infrastructure: you have a liability. 

Rethinking “elsewhere”: bare metal and regional providers

In an exit scenario, “elsewhere” may mean a European regional cloud, a sovereign hosting provider with strict jurisdictional guarantees, a managed bare-metal platform, or even manually provisioned servers. These options offer fewer managed services, slower provisioning, and less abstraction. In purely operational terms, they are a step backward compared to hyperscalers.

And that is precisely why they matter: bare metal and regional providers strip away convenience. They expose what your system actually depends on, rather than what a platform quietly takes care of for you. 

You cannot discover your alternative provider on the day you need it. Each regional cloud or bare-metal host has its own realities: networking models, provisioning delays, storage semantics, security primitives, operational tooling. An exit strategy only exists if those constraints are known in advance. Being specific (naming one or two concrete target providers) forces architecture to confront reality.

Managed services vs. cost: an unexpected trade-off

When teams attempt to deploy their MSS outside a hyperscaler, they often discover that many managed services they assumed were “core” are merely convenient. 

Managed services are not always cheaper than bare metal. They are often cheaper in time, not in money. They allow teams to iterate quickly, experiment freely, and scale without friction. That is exactly why they are attractive during growth phases, when iteration speed matters more than survivability. But an exit scenario is different. The MSS is, by definition, narrower and more stable. The focus shifts from rapid iteration to controlled execution.

When redeploying on simpler infrastructure, organizations often take the time to refactor the heaviest or most expensive components: databases that no longer need autoscaling at peak capacity, background processing systems that can be throttled or batched, and so on. The result is sometimes surprising: a leaner infrastructure, running on fewer resources, at a lower cost than the original managed setup. What began as a contingency plan becomes an optimization exercise.

The exit playbook: from theory to execution

All of this comes together in a concrete deliverable: the exit playbook. Writing this playbook is uncomfortable, and that’s the point. It forces alignment between architecture, operations, and leadership around what is actually survivable.

Deliverables: a tested playbook that contains, as a minimum:
- Target alternative providers: not a long list, only one or two realistic options that support the minimal services you need to operate. Named, vetted, with accounts created.
- Deployment steps: from account creation to MSS availability.
- Time estimates: not optimistic guesses, but measured or rehearsed durations. This is where the seven-day target becomes real - or collapses.
- Known blockers: proprietary services that cannot be replicated, data volumes that take too long to move, skills gaps, legal constraints.
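
The time-estimate deliverable can be kept honest with a small check that sums rehearsed step durations and compares them against the recovery target. The playbook contents below are illustrative, not prescribed by the text; only the seven-day target comes from the playbook itself:

```python
from datetime import timedelta

# Hypothetical playbook mirroring the minimum deliverables above.
# Durations should come from rehearsals, never from optimistic guesses.
PLAYBOOK = {
    "providers": ["regional-cloud-a", "bare-metal-b"],
    "steps": [
        ("open accounts and validate access", timedelta(hours=4)),
        ("apply infrastructure code", timedelta(hours=8)),
        ("restore data", timedelta(hours=36)),
        ("deploy and smoke-test MSS", timedelta(hours=12)),
    ],
    "blockers": ["managed queue has no self-hosted equivalent"],
}

def within_target(playbook: dict, target: timedelta = timedelta(days=7)) -> bool:
    """Sum rehearsed step durations and compare against the recovery target."""
    total = sum((d for _, d in playbook["steps"]), timedelta())
    return total <= target

if __name__ == "__main__":
    print("within seven-day target:", within_target(PLAYBOOK))
```

A playbook that fails this check, or whose durations are guesses, is exactly where the target "collapses", and that is worth knowing before the crisis.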

Conclusion

An infrastructure exit strategy is not a vote of no confidence in your current provider. It is a declaration of independence. You don’t need to run elsewhere today; you need the ability to run elsewhere. That ability changes how calmly your organization behaves when assumptions break.
