We test Corax against real-world infrastructure failures across every vendor, platform, and scenario. Browse the results below.
The primary Windows DHCP server crashes and the DHCP failover partner does not transition to 'Partner Down' state due to a misconfigured maximum client lead time (MCLT). Clients with expiring leases cannot renew and start losing connectivity.
The DHCP scope for the main user VLAN is 99% exhausted. New devices connecting to the network fail to obtain IP addresses, and users report 169.254.x.x APIPA addresses. The scope was sized for 200 devices, but 210 are now on the VLAN due to BYOD growth.
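To make the arithmetic in this scenario concrete, here is a minimal sketch of a scope utilization check. The function names and the 90% warning threshold are illustrative assumptions, not part of Corax or of Windows DHCP.

```python
WARN_THRESHOLD = 0.90  # hypothetical alerting threshold, chosen for illustration


def scope_utilization(pool_size: int, active_leases: int) -> float:
    """Return the fraction of the address pool currently leased (0.0 to 1.0)."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    # A scope cannot hand out more leases than it has addresses.
    return min(active_leases, pool_size) / pool_size


def unserved_devices(pool_size: int, devices: int) -> int:
    """Return how many devices cannot hold a lease if all are online at once."""
    return max(devices - pool_size, 0)


if __name__ == "__main__":
    pool, devices = 200, 210  # figures from the scenario
    used = scope_utilization(pool, devices)
    if used >= WARN_THRESHOLD:
        print(f"scope at {used:.0%}, {unserved_devices(pool, devices)} devices without a lease")
```

With the scenario's figures, up to 10 devices would be left without an address whenever every device is online at the same time, which is when APIPA addresses start to appear.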
An administrator changes the Hyper-V virtual switch binding from the production NIC team to a single disconnected NIC. All 25 VMs on that virtual switch lose network connectivity instantly. The admin is locked out of remote management.
A Hyper-V host in a 3-node failover cluster experiences a blue screen of death. Failover clustering attempts to restart its VMs on the surviving nodes (live migration is not possible from a crashed host), but 4 highly available VMs fail to come online due to anti-affinity rules and insufficient resources.
A DRS-triggered vMotion fails mid-migration due to a vMotion network MTU mismatch. The VM enters a stuck state: partially migrated, with the source host holding the memory pages and the destination unable to complete the switchover.
A VMFS 6 datastore becomes inaccessible due to a storage path failure (an All Paths Down, or APD, condition). All 22 VMs on the datastore freeze with I/O errors. vSphere triggers APD timeout handling after 140 seconds.
The vCenter Server Appliance (VCSA) becomes unresponsive due to database corruption in the embedded PostgreSQL instance. All management operations are impossible. VMs continue running but no changes, migrations, or monitoring can occur.
An ESXi 8.0 host experiences a Purple Screen of Death (PSOD) due to a faulty network driver. All 35 VMs on the host crash simultaneously. HA attempts to restart them on surviving hosts but resource contention causes slow recovery.
Meraki Enterprise licenses expire on a Saturday night. Dashboard access becomes read-only. Advanced features including Auto VPN, traffic shaping, and client analytics are disabled. APs continue broadcasting but without content filtering or group policies.
The Meraki VPN concentrator hub at the data center fails, breaking all Auto VPN tunnels in the mesh. Eight branch sites lose connectivity to central resources including file shares, ERP, and VoIP.
An upstream switch reboot causes 20 Meraki MR46 access points to lose their uplink simultaneously. APs lose Meraki Dashboard cloud connectivity and fall into local management mode. SSIDs remain broadcasting but no new clients can authenticate via RADIUS.
The primary Meraki MX450 appliance at a large campus fails due to a firmware crash. The warm spare MX450 assumes the primary role after a 45-second failover gap. All site-to-site VPN tunnels and client connections are disrupted during the transition.
A production server's 10G SFP+ transceiver is failing, causing CRC errors and packet drops on the network interface. Applications experience intermittent connectivity and retransmissions.
A vulnerability scanner on the network is using incorrect SNMP community strings, generating thousands of SNMP authentication failure traps from every managed device. The NMS is overwhelmed.
A ransomware attack is actively encrypting files on the primary file server. Hundreds of files are being renamed with a .encrypted extension. Multiple users report locked files. The attack originated from a phished employee workstation.
Both internal DNS servers fail simultaneously (primary: disk full, secondary: expired DNSSEC key). All internal name resolution fails. Every service that depends on DNS stops working.
A utility power outage exceeds the UPS battery runtime. The UPS runs out of battery and shuts down, and half the rack loses power. Servers, switches, and storage go offline simultaneously. The generator fails to start due to a dead battery.
A Kubernetes deployment enters CrashLoopBackOff because a ConfigMap was deleted by a cleanup script. The application cannot start without its configuration, and the restart backoff delay grows with each failed attempt, up to a five-minute cap.
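The growing backoff in this scenario can be sketched as follows. Kubernetes documents the kubelet's restart delay as exponential (10s, 20s, 40s, ...) and capped at 300 seconds; the function name `crashloop_delays` is a hypothetical helper for illustration, not a Kubernetes or Corax API.

```python
def crashloop_delays(restarts: int, base: int = 10, cap: int = 300) -> list[int]:
    """Return the approximate delay in seconds before each of the first `restarts` restarts.

    The delay doubles after every failed start and is clamped at `cap`.
    """
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


if __name__ == "__main__":
    # After a handful of failures the pod only retries every five minutes,
    # so restoring the ConfigMap may not take effect immediately.
    print(crashloop_delays(7))
```

Because the delay sits at the cap after a few failures, a restored ConfigMap may go unnoticed for several minutes unless the pod is deleted or the deployment is restarted.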
A dev team runs a massive data import job on a shared SAN, consuming all available IOPS. Production VMs on the same storage pool experience a 10x latency increase, causing application timeouts.
A RAID 5 array on a file server has one failed drive and a second drive showing SMART predictive failure warnings. The array is rebuilding, but if the second drive fails before rebuild completes, all data is lost.
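Why a second failure is fatal follows from how RAID 5 parity works. The sketch below is a simplified model of a single stripe, assuming three data blocks and one XOR parity block; real controllers rotate parity across drives and work on much larger stripes, and the helper name `xor_blocks` is illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus their parity block.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(data)

# One drive lost: the missing block is the XOR of the survivors and parity.
rebuilt = xor_blocks([data[0], data[2], parity])

# Two drives lost: the survivors only yield the XOR of the two missing
# blocks, which is not enough to recover either block on its own.
leftover = xor_blocks([data[0], parity])

if __name__ == "__main__":
    print(rebuilt == data[1])
    print(leftover == xor_blocks([data[1], data[2]]))
```

During the rebuild every surviving drive is read in full, which is why a drive already reporting SMART warnings is a serious risk at exactly this moment.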
Every scenario is tested against Corax's Neural Engine in a production environment with AI-powered root cause analysis.
Tests run continuously as new infrastructure patterns are added.