We test Corax against real-world infrastructure failures across every vendor, platform, and scenario. Browse the results below.
After disaster recovery is activated, DNS records are updated to point to the DR site, but propagation takes much longer than expected because the high TTL values were never reduced before failover. Many clients keep hitting the dead primary site for hours after the DNS change.
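A pre-failover TTL audit catches this class of problem before cutover. Below is a minimal sketch, assuming the dnspython library and hypothetical record names, that flags any record whose TTL is still too high for a fast failover:

```python
# Pre-failover TTL audit (sketch; assumes the dnspython package and example names).
# Records whose TTL exceeds the cutover budget keep clients pinned to the dead
# primary for that long after the DNS change.
import dns.resolver

MAX_TTL_SECONDS = 300  # example cutover budget: 5 minutes

def audit_ttls(names, record_type="A"):
    offenders = []
    for name in names:
        answer = dns.resolver.resolve(name, record_type)
        ttl = answer.rrset.ttl  # TTL as served to resolvers
        if ttl > MAX_TTL_SECONDS:
            offenders.append((name, ttl))
    return offenders

if __name__ == "__main__":
    # Hypothetical records involved in the failover.
    for name, ttl in audit_ttls(["app.example.com", "api.example.com"]):
        print(f"{name}: TTL {ttl}s exceeds the {MAX_TTL_SECONDS}s failover budget")
```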
The cross-region database replication between the primary (us-east-1) and disaster recovery (eu-west-1) regions has fallen 4 hours behind. A sustained increase in write volume, combined with network throttling between regions, is widening the gap every hour.
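A lag probe run against the replica surfaces this before the gap becomes unrecoverable. The sketch below assumes PostgreSQL streaming replication, the psycopg2 driver, and an example DSN:

```python
# Replica lag probe (sketch; assumes PostgreSQL streaming replication and psycopg2).
# Run against the DR replica; alerts when replay lag exceeds the allowed window.
import psycopg2

MAX_LAG_SECONDS = 900  # example threshold: 15 minutes

def replica_lag_seconds(dsn):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))"
        )
        (lag,) = cur.fetchone()
        return lag or 0.0  # NULL on a primary; treat as no lag

if __name__ == "__main__":
    lag = replica_lag_seconds("host=dr-replica.example.com dbname=app user=monitor")
    if lag > MAX_LAG_SECONDS:
        print(f"Replica is {lag / 3600:.1f} hours behind; RPO at risk")
```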
A monitoring check reveals that the most recent successful backup of the production database is 72 hours old, far exceeding the 1-hour RPO SLA. The backup job has been failing silently due to a full backup repository, and the replication to the offsite location has also been paused.
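A freshness check against the backup repository turns this silent failure into an alert. This is an illustrative sketch only; the repository path, file pattern, and 1-hour RPO threshold are assumptions:

```python
# Backup freshness check (sketch; path, file pattern, and RPO are assumptions).
# A silently failing job or a full repository shows up as a stale newest file.
from pathlib import Path
import time

RPO_SECONDS = 3600  # 1-hour RPO from the SLA
REPO = Path("/backups/prod-db")  # hypothetical backup repository

def newest_backup_age_seconds(repo: Path) -> float:
    files = list(repo.glob("*.bak"))
    if not files:
        return float("inf")  # no backups at all
    newest = max(f.stat().st_mtime for f in files)
    return time.time() - newest

if __name__ == "__main__":
    age = newest_backup_age_seconds(REPO)
    if age > RPO_SECONDS:
        print(f"Newest backup is {age / 3600:.0f}h old; exceeds the 1h RPO")
```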
During an actual disaster recovery activation, the recovery process is taking far longer than the 4-hour RTO SLA. After 6 hours, only 3 of 12 critical services are operational. The recovery runbook is outdated, automation scripts are failing, and key personnel are unreachable.
A scheduled disaster recovery failover test reveals that the DR site cannot bring up critical services. The DR database has been silently failing replication for 2 weeks, the application servers have outdated configurations, and the DR network routing tables are stale.
During a utility power outage, the automatic transfer switch (ATS) fails to engage the backup generator, leaving the entire facility on UPS battery power with only 15 minutes of runtime remaining.
The CRAC unit humidifier malfunctions, causing server room humidity to drop to 15% RH. Low humidity increases the risk of electrostatic discharge, which can damage sensitive electronic components and cause intermittent hardware failures.
A rack PDU is running at 95% capacity after additional equipment was installed without proper power planning, and the overload alarm is triggering. Any additional load will trip the breaker.
A power distribution unit circuit breaker trips in the primary server rack, cutting power to 6 servers, including the domain controller, file server, and monitoring system. The UPS provided no protection because the tripped breaker sits downstream of it.
A construction crew accidentally cuts the dark fiber connecting two office buildings, severing all network connectivity between the primary data center and the disaster recovery site, including storage replication.
During a NOC shift handoff, a critical alert for a client's ransomware detection is missed. The outgoing shift marked it as acknowledged but did not brief the incoming shift. The ransomware spreads for 4 additional hours before discovery.
The PSA ticketing system enters an auto-escalation loop: a ticket is escalated, which triggers a workflow that reassigns it, which triggers another escalation, and so on indefinitely. The ticket generates 500+ email notifications and exhausts the email sending quota.
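One way to break a cycle like this is a loop guard in the escalation workflow. The sketch below is purely illustrative; the ticket identifiers and limits are hypothetical:

```python
# Escalation loop guard (illustrative sketch; ticket IDs and limits are hypothetical).
# Refuses to auto-escalate a ticket that has already bounced too many times in a
# short window, breaking the escalate -> reassign -> escalate cycle.
import time
from collections import defaultdict, deque

MAX_ESCALATIONS = 3
WINDOW_SECONDS = 600  # 10 minutes

_escalation_history = defaultdict(deque)  # ticket_id -> recent escalation times

def should_escalate(ticket_id: str) -> bool:
    now = time.time()
    history = _escalation_history[ticket_id]
    # Drop escalations that fall outside the sliding window.
    while history and now - history[0] > WINDOW_SECONDS:
        history.popleft()
    if len(history) >= MAX_ESCALATIONS:
        return False  # suspected loop; hand off to a human instead
    history.append(now)
    return True
```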
The IT documentation platform (IT Glue/Hudu) becomes unreachable during a major client outage. Technicians cannot access network diagrams, credential vaults, or runbook procedures needed to resolve the issue. The documentation system is hosted on the same infrastructure experiencing the outage.
The RMM platform generates a false alarm storm after a monitoring agent update pushes incorrect threshold values. 2,000+ alerts fire simultaneously across all managed clients, overwhelming the NOC and masking real issues.
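A sanity check on threshold values before an agent update fans out can stop this at the source. The metric names and bounds below are assumptions, not any RMM vendor's schema:

```python
# Pre-push threshold sanity check (illustrative sketch; metrics and bounds are assumptions).
# Rejects obviously broken monitoring thresholds before an agent update pushes
# them to every managed endpoint.
SANE_BOUNDS = {
    "cpu_percent": (50, 100),      # alerting below 50% CPU is almost always noise
    "disk_free_percent": (1, 25),  # alerting above 25% free space floods the NOC
    "memory_percent": (50, 100),
}

def validate_thresholds(thresholds: dict) -> list[str]:
    problems = []
    for metric, value in thresholds.items():
        low, high = SANE_BOUNDS.get(metric, (None, None))
        if low is not None and not (low <= value <= high):
            problems.append(f"{metric}={value} outside sane range {low}-{high}")
    return problems

if __name__ == "__main__":
    # A bad update like this one would fire on nearly every healthy endpoint.
    print(validate_thresholds({"cpu_percent": 5, "disk_free_percent": 90}))
```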
After a DNS migration, SPF, DKIM, and DMARC records are not properly recreated. Outbound emails are rejected by major providers (Gmail, Microsoft) due to authentication failures, and the company's email reputation score drops rapidly.
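A post-migration check that the three authentication records exist catches this before reputation damage sets in. A minimal sketch, assuming the dnspython library and an example domain and DKIM selector:

```python
# Post-migration email authentication check (sketch; assumes dnspython and an
# example domain/selector). Verifies that SPF, DKIM, and DMARC records exist.
import dns.resolver

def txt_records(name: str) -> list[str]:
    try:
        return [r.to_text().strip('"') for r in dns.resolver.resolve(name, "TXT")]
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []

def check_email_auth(domain: str, dkim_selector: str) -> list[str]:
    missing = []
    if not any(t.startswith("v=spf1") for t in txt_records(domain)):
        missing.append("SPF")
    if not txt_records(f"{dkim_selector}._domainkey.{domain}"):
        missing.append("DKIM")
    if not any(t.startswith("v=DMARC1") for t in txt_records(f"_dmarc.{domain}")):
        missing.append("DMARC")
    return missing

if __name__ == "__main__":
    print(check_email_auth("example.com", "selector1"))  # hypothetical values
```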
The Exchange journaling mailbox reaches its storage quota, causing journal reports to be NDR'd back to the sender. Email journaling stops functioning, creating a compliance gap for regulatory requirements (HIPAA, SEC Rule 17a-4).
A message size limit reduction on the Exchange transport rule causes a bounce storm. Automated systems sending reports with large attachments generate NDRs, which trigger auto-reply rules, creating a feedback loop of bounces and replies that overwhelms the mail system.
An internal SMTP relay is misconfigured as an open relay after a firewall change exposes it to the internet. Spammers discover and abuse it within hours, sending thousands of spam emails through the relay, causing the company's IP to be blacklisted.
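An external relay probe after any firewall change verifies the relay still refuses third-party mail. The relay hostname and addresses below are examples only:

```python
# External open-relay probe (sketch; the relay host and addresses are examples).
# From outside the network, attempts to relay mail between two external domains;
# a properly configured relay must refuse the recipient.
import smtplib

def is_open_relay(host: str, port: int = 25) -> bool:
    try:
        with smtplib.SMTP(host, port, timeout=10) as smtp:
            smtp.sendmail(
                "probe@external-a.example",
                ["victim@external-b.example"],
                "Subject: relay test\r\n\r\nThis should be rejected.",
            )
        return True  # relaying accepted: the host is an open relay
    except smtplib.SMTPRecipientsRefused:
        return False  # relaying refused, as expected
    except (smtplib.SMTPException, OSError):
        return False  # connection or protocol error; open relay not confirmed

if __name__ == "__main__":
    print(is_open_relay("relay.example.com"))
```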
Exchange Online Protection (EOP) begins quarantining legitimate business emails from a major client after a policy update. The mail flow disruption goes unnoticed for 6 hours until the client calls to complain about unanswered communications.
After enabling LDAP channel binding and signing enforcement on domain controllers (per Microsoft security advisory), multiple legacy applications that use simple LDAP binds break. Printers, scanners, and legacy ERP systems cannot authenticate against Active Directory.
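With signing enforced, simple binds over cleartext LDAP (port 389) are rejected while the same credentials still work over LDAPS (port 636), which is where legacy devices must be repointed. A minimal comparison sketch, assuming the ldap3 library and hypothetical hostnames and credentials:

```python
# Legacy bind remediation check (sketch; assumes the ldap3 library and example
# hostnames/credentials). Compares a simple bind over plain LDAP with the same
# bind over LDAPS after channel binding and signing are enforced.
from ldap3 import Server, Connection, SIMPLE

def can_bind(host: str, use_ssl: bool, user_dn: str, password: str) -> bool:
    server = Server(host, port=636 if use_ssl else 389, use_ssl=use_ssl)
    conn = Connection(server, user=user_dn, password=password, authentication=SIMPLE)
    ok = conn.bind()  # False when the DC rejects the cleartext simple bind
    conn.unbind()
    return ok

if __name__ == "__main__":
    dn = "CN=printer-svc,OU=Service Accounts,DC=example,DC=com"  # hypothetical
    print("plain LDAP bind:", can_bind("dc1.example.com", False, dn, "s3cret"))
    print("LDAPS bind:     ", can_bind("dc1.example.com", True, dn, "s3cret"))
```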
Every scenario is tested against Corax's Neural Engine in a production environment with AI-powered root cause analysis.
Tests run continuously as new infrastructure patterns are added.