We test Corax against real-world infrastructure failures across every vendor, platform, and scenario. Browse the results below.
An attacker exploits an unkeyed header vulnerability to poison the CDN cache, causing all users requesting a specific page to receive a response containing injected malicious JavaScript. The poisoned cache entry has a 24-hour TTL and is replicated across all edge locations.
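A minimal sketch of why an unkeyed header is dangerous: the cache key below is built from the path only, so a header that influences the response body never participates in cache lookup. The `X-Forwarded-Host` reflection and hostnames are illustrative, not any specific CDN's behavior.

```python
# Toy model of unkeyed-header cache poisoning (illustrative only).
cache = {}

def render_page(path, headers):
    # Origin reflects X-Forwarded-Host into a script URL -- the vulnerability.
    host = headers.get("X-Forwarded-Host", "cdn.example.com")
    return f'<script src="https://{host}/app.js"></script>'

def cdn_fetch(path, headers):
    key = path  # unkeyed: headers are ignored when building the cache key
    if key not in cache:
        cache[key] = render_page(path, headers)
    return cache[key]

# Attacker primes the cache with a malicious header value...
poisoned = cdn_fetch("/home", {"X-Forwarded-Host": "evil.attacker.net"})
# ...and every later visitor gets the poisoned entry, even with clean headers.
victim = cdn_fetch("/home", {})
```

The fix, in cache terms, is to include every response-influencing input in the key (or strip the header at the edge).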
A GCP Pub/Sub subscription's dead letter topic accumulates millions of unprocessable messages after a schema change breaks the consumer application. The dead letter topic has no subscriber, messages are piling up, and the original subscription's backlog is growing unchecked.
A GCP HTTP(S) load balancer marks all backend instances as unhealthy after a firewall rule change blocks the health check probe source IP range (35.191.0.0/16). The load balancer returns HTTP errors to all clients despite all backend VMs being fully operational.
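A guard for this class of failure can be sketched as a coverage check: Google's documented health check probe ranges (35.191.0.0/16 and 130.211.0.0/22) must each be fully covered by some firewall allow rule. The function below is a simplified sketch using the standard library, not the actual GCP API.

```python
# Sketch: verify a firewall allow-list still covers Google's documented
# health check probe source ranges.
import ipaddress

HEALTH_CHECK_RANGES = [
    ipaddress.ip_network("35.191.0.0/16"),
    ipaddress.ip_network("130.211.0.0/22"),
]

def probes_allowed(allowed_cidrs):
    """True only if every probe range is fully covered by some allow rule."""
    allowed = [ipaddress.ip_network(c) for c in allowed_cidrs]
    return all(any(r.subnet_of(a) for a in allowed) for r in HEALTH_CHECK_RANGES)

before = probes_allowed(["0.0.0.0/0"])                       # permissive rule: OK
after  = probes_allowed(["130.211.0.0/22", "10.0.0.0/8"])    # 35.191.0.0/16 dropped
```

Running a check like this in CI against proposed firewall changes would catch the bad rule before it ships.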
An Azure Front Door configuration update introduces a routing rule error that sends 40% of production traffic to a staging backend pool. Users intermittently see staging data mixed with production data, causing data integrity concerns and customer confusion.
All Azure DevOps self-hosted build agents are stuck on hung builds, preventing any new CI/CD pipelines from running. The agent pool shows 0 available agents. Development velocity drops to zero as no code can be built, tested, or deployed.
A misconfigured Route 53 health check threshold causes all three regional endpoints to be marked unhealthy simultaneously during a brief network blip. Route 53 removes all records from DNS, causing a complete global outage even though all regions are actually healthy.
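The sensitivity to a brief blip comes down to the failure threshold. The toy model below assumes "unhealthy after N consecutive failed probes", similar in spirit to Route 53's FailureThreshold setting; the probe history is invented for illustration.

```python
# Toy health-check decision: unhealthy after N consecutive failed probes.
def is_unhealthy(probe_results, failure_threshold):
    """probe_results: booleans, most recent last; True = probe succeeded."""
    recent = probe_results[-failure_threshold:]
    return len(recent) == failure_threshold and not any(recent)

# A short network blip produces two failed probes on every regional endpoint.
blip = [True, True, True, False, False]

# Threshold 3 absorbs the blip; threshold 2 marks every endpoint -- and
# therefore every DNS record -- unhealthy at the same moment.
safe   = is_unhealthy(blip, failure_threshold=3)
strict = is_unhealthy(blip, failure_threshold=2)
```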
An ECS service cannot place new tasks because all container instances in the cluster have exhausted their CPU and memory reservations. The auto-scaling group is at maximum capacity, and deployments are stuck with the desired count never matching the running count.
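The subtle part of this failure is that aggregate capacity can look fine while placement still fails: a task needs its full CPU and memory reservation on a single instance. A simplified placement check (real ECS also considers ports, attributes, and placement strategies):

```python
# Reservation-based placement sketch; numbers are illustrative
# (CPU units / MiB free per container instance).
instances = [
    {"cpu_free": 128, "mem_free": 200},
    {"cpu_free": 64,  "mem_free": 900},
    {"cpu_free": 256, "mem_free": 100},
]

def can_place(task_cpu, task_mem, fleet):
    """A task fits only if one instance has BOTH enough CPU and memory."""
    return any(i["cpu_free"] >= task_cpu and i["mem_free"] >= task_mem
               for i in fleet)

# Aggregate capacity is 448 CPU / 1200 MiB, yet a 256-CPU, 512-MiB task
# fits nowhere -- so desired count never matches running count.
placeable = can_place(256, 512, instances)
```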
An Ansible playbook run against 200 production servers fails midway through execution due to a changed SSH host key on the jump host. 87 servers received the updated configuration while 113 did not, creating a split-brain configuration state across the fleet.
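Recovering from a split-brain run starts with identifying which hosts actually received the change. A sketch of that triage, with a hypothetical inventory and version fact standing in for real gathered facts:

```python
# Hypothetical fleet state after the partial run: hosts web001..web087
# got the new config ("v2"); web088..web200 still run the old one ("v1").
fleet = {f"web{i:03d}": ("v2" if i <= 87 else "v1") for i in range(1, 201)}

updated = sorted(h for h, v in fleet.items() if v == "v2")
stale   = sorted(h for h, v in fleet.items() if v == "v1")

# "stale" becomes the --limit list for a targeted re-run of the playbook,
# rather than replaying against all 200 hosts.
```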
A CloudFormation stack update to production fails during resource creation, triggering an automatic rollback that itself gets stuck in UPDATE_ROLLBACK_FAILED state due to a manually modified resource outside of CloudFormation control.
Two CI/CD pipelines triggered simultaneously attempt to run terraform apply against the same state file. The DynamoDB state lock prevents both from proceeding, but a stale lock from a crashed previous run is never released, blocking all infrastructure deployments for 4 hours.
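The failure mode falls out of acquire-only-if-absent lock semantics: a crashed run never deletes its entry, so every later acquire fails until an operator forces the lock open. The dictionary below is a toy stand-in for the lock table (Terraform's actual mechanism is a conditional DynamoDB write plus `terraform force-unlock`); names and timings are illustrative.

```python
# Toy model of state locking via conditional writes.
locks = {}  # lock_id -> {"owner": ..., "acquired_at": ...}

def acquire(lock_id, owner, now):
    if lock_id in locks:
        return False  # conditional put fails: someone (maybe a ghost) holds it
    locks[lock_id] = {"owner": owner, "acquired_at": now}
    return True

def force_unlock(lock_id, max_age_seconds, now):
    """Operator action: release a lock that has clearly gone stale."""
    held = locks.get(lock_id)
    if held and now - held["acquired_at"] > max_age_seconds:
        del locks[lock_id]
        return True
    return False

t0 = 0
acquire("prod/vpc.tfstate", "pipeline-run-17", now=t0)  # run crashes, never releases
blocked  = acquire("prod/vpc.tfstate", "pipeline-run-18", now=t0 + 4 * 3600)
released = force_unlock("prod/vpc.tfstate", max_age_seconds=3600, now=t0 + 4 * 3600)
retry    = acquire("prod/vpc.tfstate", "pipeline-run-18", now=t0 + 4 * 3600)
```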
A rogue DHCP server on the network begins handing out IP addresses that conflict with statically assigned servers and network equipment, causing widespread connectivity issues as ARP tables become poisoned.
The centralized syslog server is overwhelmed by a log storm from a network event, dropping 80% of incoming messages. Critical security and compliance logs are being lost during an active incident.
The NetFlow collector server runs out of disk space, causing it to stop ingesting flow data from all network devices. Network visibility is lost, and security analytics based on flow data become non-functional.
Both TACACS+ AAA servers become unreachable due to a VLAN misconfiguration, locking all network administrators out of switches, routers, and firewalls. Only console port access remains available.
A 48-port PoE+ switch reaches its PoE power budget after 12 new WiFi 6E APs are connected, causing the switch to cut power to lower-priority devices including phones and security cameras.
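The arithmetic behind the outage is simple budget accounting. The figures below are assumptions for illustration: a 740 W budget is common for 48-port PoE+ switches, and ~25.5 W is the 802.3at per-port maximum an AP may negotiate.

```python
# Back-of-the-envelope PoE budget check (all figures assumed).
POE_BUDGET_W = 740.0

existing_draw_w = 460.0   # phones + cameras already powered by the switch
new_aps = 12
ap_draw_w = 25.5          # 802.3at (PoE+) class 4 allocation per AP

total_draw_w = existing_draw_w + new_aps * ap_draw_w   # 460 + 306 = 766 W
over_budget = total_draw_w > POE_BUDGET_W
shortfall_w = total_draw_w - POE_BUDGET_W              # power the switch must shed
```

Once allocations exceed the budget, the switch sheds load by port priority, which is exactly how the phones and cameras lose power first.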
An LACP port channel between the core switch and server farm switch loses all member links after a switch firmware bug causes LACP PDU processing to fail, severing connectivity for 50 servers.
IGMP snooping is disabled on a distribution switch after a firmware upgrade, causing all multicast traffic (video surveillance, IPTV, software distribution) to flood to every port on the VLAN, saturating access links.
During a utility power outage, the automatic transfer switch (ATS) fails to engage the backup generator, leaving the entire facility on UPS battery power with only 15 minutes of runtime remaining.
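A rough runtime estimate shows how little margin a UPS string buys when the generator never picks up the load. The capacity and load figures are assumptions chosen to match the scenario; real discharge curves are nonlinear, so vendors' runtime tables should be trusted over linear math.

```python
# Linear UPS runtime estimate (assumed figures; real curves are nonlinear).
battery_capacity_wh = 20000.0   # usable energy across the UPS string
facility_load_w = 80000.0       # critical load carried after the ATS failure

runtime_minutes = battery_capacity_wh / facility_load_w * 60
```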
The CRAC unit humidifier malfunctions, causing server room humidity to drop to 15% RH. Low humidity creates static electricity risk that can damage sensitive electronic components and cause intermittent hardware failures.
A rack PDU is running at 95% capacity after additional equipment was installed without proper power planning, and the overload alarm is triggering. Any additional load will trip the breaker.
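A headroom check makes the risk concrete. The figures are assumptions for illustration: a 30 A single-phase PDU derated to 80% continuous load, per common electrical practice, running at 95% of that usable limit.

```python
# Simple PDU headroom check (all figures assumed).
pdu_rating_a = 30.0
continuous_limit_a = pdu_rating_a * 0.8            # 24 A usable at 80% derating
current_load_a = continuous_limit_a * 0.95          # 22.8 A: the 95% alarm state

headroom_a = continuous_limit_a - current_load_a    # ~1.2 A remaining
# Even one modest 1U server (~1.5 A) pushes past the continuous limit.
new_device_a = 1.5
will_overload = current_load_a + new_device_a > continuous_limit_a
```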
Every scenario is run against Corax's Neural Engine in a production environment, exercising its AI-powered root cause analysis.
Tests run continuously as new infrastructure patterns are added.