1. Communication when everything is down

The Amazon Web Services outage in October 2025 took down Slack, Zoom and Snapchat. In a widespread disruption, your first problem is not customer communication. It is internal coordination. When access to cloud providers, SaaS tools, and online services is partially or fully disrupted, most SaaS companies discover that their teams cannot even talk to each other, let alone fix systems or communicate externally.

Failure mode

In large-scale outages, the following often happen simultaneously:

  • Slack, Teams, or internal chat tools are unreachable,
  • Corporate email is unavailable,
  • Identity providers become unreachable,
  • Monitoring and alerting systems stop sending notifications,
  • Documentation and runbooks are inaccessible.

The result is not just confusion. It is organizational paralysis. Employees don’t know what the current situation is, who is taking action, or how to report progress. This is a failure of communication design, not engineering.

Objectives

Your internal communication setup must enable four things:

  • Reachability: Key people must be able to reach each other quickly.
  • Authority clarity: Everyone must know who decides what.
  • Situation awareness: Everyone must share an understanding of what is broken, what is unknown, and what is being worked on.
  • Action coordination: Tasks must be assigned, tracked, and acknowledged—even informally.

Solution: the Crisis Communication Stack

Layer 1: Emergency Reachability (Non-Negotiable)

Purpose: Reach decision-makers and operators when everything else fails.

The most fundamental requirement in a crisis is reachability. Key decision-makers and operators must be able to contact one another quickly, without depending on a single online service. This often means falling back to technologies that feel mundane but are resilient: phone calls, SMS, or pre-established phone trees. 

For this to work, contact information must exist in a form that is accessible without the internet or corporate systems. Large disruptions can provide cover for cyberattacks, so known phone numbers also give key stakeholders a first way to prove their identity.

Key people must be able to reach each other within minutes, even under degraded conditions. If that is not possible, the organization is effectively offline. A phone tree should be set up to ensure everyone in the organization can be reached.
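A phone tree can be written down as a simple mapping from each caller to the people they are responsible for calling. The sketch below is one possible way to generate such a mapping, not a prescribed method: the function names, the `fanout` parameter, and the assumption that names are unique and ordered by priority are all illustrative.

```python
from collections import deque


def build_phone_tree(people, fanout=3):
    """Assign callers so that nobody has to make more than `fanout` calls.

    `people` is a list of unique names ordered by priority (assumption);
    the first entry starts the tree. Returns {caller: [people they call]}.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    tree = {name: [] for name in people}
    callers = deque(people[:1])  # people already informed who can still call others
    for person in people[1:]:
        while len(tree[callers[0]]) >= fanout:
            callers.popleft()  # this caller has reached their call limit
        tree[callers[0]].append(person)
        callers.append(person)  # once reached, this person can call others
    return tree


def call_rounds(tree, root):
    """Return how many successive rounds of calls are needed to reach everyone."""
    rounds, frontier = 0, [root]
    while True:
        frontier = [callee for caller in frontier for callee in tree[caller]]
        if not frontier:
            return rounds
        rounds += 1
```

For example, `build_phone_tree(["ceo", "cto", "coo", "sre1", "sre2", "support"], fanout=2)` has the first person call two people, who in turn call the rest. The resulting mapping is small enough to print next to the offline contact list.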

Deliverables:
- Printed or offline contact list with phone numbers, also pre-loaded on the phones of key stakeholders so that callers can be authenticated

Layer 2: Crisis Coordination Channel

Purpose: Maintain a shared, minimal communication channel for coordination.

Once basic reachability is established, the next priority is creating a shared space for coordination. In normal operations, teams may be spread across many channels and tools. In a crisis, this fragmentation becomes a liability. Selecting and setting up such a channel should not be improvised in the moment. 

The goal is not to recreate full collaboration capabilities, but to maintain a single, minimal channel through which information can flow. This channel should be independent of the primary SaaS stack and as lightweight as possible, so that it continues to function under poor network conditions. It should be known in advance, tested periodically, and reserved explicitly for crisis coordination.

Secure, end-to-end encrypted messaging apps (e.g. built on Matrix or XMPP) can be a good place to start:

  • 🇨🇭 Threema: partially open source, minimal metadata stored, strong Swiss data protection.
  • 🇬🇧 Element: open source, end-to-end encrypted, federated.

You can further harden your preparedness by setting up a resilient mesh messaging app in case the internet itself becomes unavailable. These apps work without internet or cellular networks, using Bluetooth, Wi-Fi Direct, or mesh networking. There are unfortunately not many EU-based solutions, but some are open source.

Make sure to have those apps installed and to keep them updated (even if you don’t set them up right away).

If your top management team cannot reach each other within minutes, you are not operational.

Deliverables:
- Pre-installed, pre-configured and tested privacy-oriented messaging apps for key stakeholders
- Pre-installed mesh messaging apps, kept up to date
- Groups created and periodically tested

Layer 3: Situation Updates

Purpose: Maintain a single source of truth for the internal situation.

In the early stages of a disruption, information is incomplete and often contradictory. Waiting for perfect clarity before communicating is a common mistake, and a costly one. In the absence of updates, people assume the worst or duplicate work. “No update” is worse than “no progress”.

A more effective approach is to provide regular, structured updates, even if those updates contain little new information. A simple statement of what is known, what remains unclear, and what actions are currently underway by whom is often enough to maintain alignment and reduce anxiety.

These updates should come from a single, clearly identified source. This does not mean that information flows only in one direction, but it does mean that there is one voice responsible for synthesizing and broadcasting the current state. Without this, teams quickly lose track of what matters most.

💡 Situation Update template

Characteristics: short, structured updates, posted at a regular cadence by a single owner.

Typical update format:
- What we know
- What we don’t know
- What we’re doing next
- When the next update will come

Deliverables:
- Team trained on crisis communication (short, structured updates, posted by a single owner, agreed update frequency)
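
The situation update format can also be captured as a small data structure so that every update has the same shape, whoever writes it. This is a minimal sketch under stated assumptions: the class name, field names, header line, and the 30-minute default cadence are hypothetical and should be replaced by whatever your team agrees on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SituationUpdate:
    owner: str                 # the single person responsible for updates
    posted_at: datetime
    known: list[str]
    unknown: list[str]
    next_actions: list[str]
    cadence: timedelta = timedelta(minutes=30)  # assumed default, agree on your own

    def render(self) -> str:
        """Return a short plain-text message suitable for SMS or a chat channel."""
        def section(title, items):
            lines = [f"- {item}" for item in items] or ["- (nothing to report)"]
            return "\n".join([f"{title}:"] + lines)

        next_update = self.posted_at + self.cadence
        return "\n".join([
            f"UPDATE {self.posted_at:%H:%M} from {self.owner}",
            section("What we know", self.known),
            section("What we don't know", self.unknown),
            section("What we're doing next", self.next_actions),
            f"Next update: {next_update:%H:%M}",
        ])
```

An empty section is still rendered, with "(nothing to report)", so an update that contains little new information is still sent on schedule rather than skipped.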

Anticipating knowledge fragmentation

In a SaaS company, critical knowledge is distributed across people. Some understand how the infrastructure is deployed and recovered. Others know how DNS is configured and which registrar controls the domain. Someone manages customer data, billing access, or the CRM. Credentials may live in a password manager, a cloud IAM system, or a corporate identity provider that is itself unavailable during an outage. Under normal conditions, this fragmentation is manageable. Communication tools and shared systems allow teams to coordinate implicitly. When those tools fail, the organization discovers that it does not lack knowledge; it lacks access to knowledge.

This is why internal communication and knowledge preparedness are inseparable. In a crisis, the organization must be able to quickly answer basic questions: who knows how to do this, who is allowed to do it, and how can they be reached? If those answers are unclear, decision-making slows and coordination breaks down.

Prepared SaaS companies assume that no single person has a complete view of the system and that some people may be temporarily unreachable. They compensate by making critical knowledge explicit, role-based, and accessible under degraded conditions. The goal is not to eliminate specialization, but to ensure that specialization does not block recovery.

Deliverables:
- Critical Knowledge Map: A short document identifying key operational domains (infrastructure, DNS, identity, customer data, billing, communication), the primary and secondary owners for each, and how they can be reached during a crisis.
- Offline Knowledge Access policy: A defined method for accessing essential information when SaaS tools are down: printed summaries, encrypted local copies, or offline documentation bundles.
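
A Critical Knowledge Map is easier to keep honest if it can be checked mechanically before a crisis. The sketch below shows one possible check, assuming the map is stored as a list of records and the offline contact list as a dictionary keyed by name. `DomainOwners`, `find_gaps`, and `REQUIRED_DOMAINS` are illustrative names, not part of any existing tool.

```python
from dataclasses import dataclass

# Domains named in the Critical Knowledge Map deliverable; adjust to your organization.
REQUIRED_DOMAINS = (
    "infrastructure", "DNS", "identity", "customer data", "billing", "communication",
)


@dataclass(frozen=True)
class DomainOwners:
    domain: str
    primary: str
    secondary: str


def find_gaps(knowledge_map, offline_contacts, required_domains=REQUIRED_DOMAINS):
    """Return a list of human-readable problems found in the knowledge map."""
    problems = []
    for entry in knowledge_map:
        if entry.primary == entry.secondary:
            problems.append(
                f"{entry.domain}: primary and secondary owner are the same person"
            )
        # dict.fromkeys removes duplicates while keeping a stable order
        for person in dict.fromkeys((entry.primary, entry.secondary)):
            if person not in offline_contacts:
                problems.append(
                    f"{entry.domain}: {person} is missing from the offline contact list"
                )
    covered = {entry.domain for entry in knowledge_map}
    for domain in required_domains:
        if domain not in covered:
            problems.append(f"{domain}: no owners defined")
    return problems
```

Running such a check whenever the map or the contact list changes surfaces the most common gaps early: a domain with a single owner, an owner who cannot be reached offline, or a domain nobody owns.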

What not to do during a crisis

One of the most damaging instincts during a crisis is the urge to improve the communication setup while the crisis is unfolding. Creating new channels, introducing new tools, or changing processes mid-incident increases cognitive load at exactly the wrong moment.

In degraded conditions, simplicity is a feature. Fewer channels, fewer speakers, and shorter messages make coordination easier, not harder. The goal is not to optimize communication, but to keep it functional.

Transitioning to external communication

Only once internal coordination is restored should customer communication begin. Customer communication requires clarity, consistency, and confidence, none of which are possible if the team itself is still confused.

When usual communication channels are unavailable, companies also lose an important signal: authenticity. During large-scale incidents, external audiences need a trusted way to distinguish official updates from rumors or impersonation. You should therefore maintain a minimal presence on independent or distributed social networks, such as Bluesky or Mastodon. These accounts should be created in advance, referenced in official documentation, and used periodically so they are recognized as legitimate. 

Restoring internal coordination first does not mean waiting for full recovery. It means waiting until the organization has a coherent understanding of the situation and a plan for the next steps. Internal communication remains the priority until that threshold is crossed.

Deliverables:
- Emergency contacts for key suppliers, customers and stakeholders
- Off-cloud status page
- Accounts created on distributed social networks

Conclusion

In a widespread disruption, internal communication is the first system you must recover. Without it, technical expertise, preparation, and resources remain unused. With it, even severely degraded systems can be brought back under control. 

Digital preparedness is not only about infrastructure and data. It is about preserving the human ability to coordinate under stress.