We test Corax against real-world infrastructure failures across every vendor, platform, and scenario. Browse the results below.
During a disaster recovery test, the database backup restore fails at 78% completion due to a corrupted backup chain. The backup verification job discovers that 3 incremental backups have invalid checksums, making the entire backup chain since the last full backup unrestorable. The production database has no valid point-in-time recovery option for the last 5 days.
A network partition splits a 3-node MariaDB Galera cluster into a 1-node partition and a 2-node partition. The isolated node enters non-primary state and rejects all queries. When the partition heals, the isolated node has divergent data that requires manual SST (State Snapshot Transfer) to resync, causing extended downtime.
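The reason the single node goes non-primary while the two-node side keeps serving can be shown with a simple majority check. This is a sketch that assumes every node carries equal weight; Galera also supports weighted quorum, which is not modeled here, and `has_quorum` is a hypothetical name.

```python
def has_quorum(partition_size: int, cluster_size: int) -> bool:
    """A partition stays primary only if it holds strictly more than half the nodes."""
    return 2 * partition_size > cluster_size


# The 3-node split from the scenario above:
isolated_side = has_quorum(1, 3)   # False: the lone node becomes non-primary
majority_side = has_quorum(2, 3)   # True: the two-node side keeps accepting queries
```

An even split, such as 2 of 4 nodes, fails the same check on both sides, which is one reason odd cluster sizes are commonly recommended.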
PostgreSQL autovacuum has been unable to keep up with a high-write workload, and the database is approaching transaction ID wraparound. The autovacuum_freeze_max_age threshold is reached, forcing aggressive anti-wraparound vacuums that consume all I/O. Database performance degrades severely as aggressive vacuum competes with production queries.
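The arithmetic behind this scenario can be sketched as follows. The example assumes the XID age comes from a query such as `SELECT age(datfrozenxid) FROM pg_database`, uses PostgreSQL's default `autovacuum_freeze_max_age` of 200 million, and treats roughly 2^31 transactions as the wraparound horizon; `xid_headroom` is a hypothetical helper name.

```python
XID_HORIZON = 2**31  # 32-bit XIDs are compared circularly, so about 2^31 are usable


def xid_headroom(xid_age: int, freeze_max_age: int = 200_000_000) -> dict:
    """Report whether anti-wraparound autovacuum is forced and how much headroom is left."""
    return {
        # Autovacuum is forced once the age exceeds autovacuum_freeze_max_age.
        "forced_autovacuum": xid_age > freeze_max_age,
        # Transactions that can still be assigned before the horizon is reached.
        "remaining_before_wraparound": max(XID_HORIZON - xid_age, 0),
    }
```

Tracking the remaining headroom over time shows whether the forced vacuums are gaining ground on the write workload or still falling behind it.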
An Oracle 19c production database runs out of space in the USERS tablespace after an overnight ETL job loads 3x the expected data volume. All INSERT and UPDATE operations fail with ORA-01653. The application returns errors on any write operation while reads continue to function.
SQL Server TempDB runs out of space due to a runaway query creating massive temp tables and sort operations. All concurrent queries requiring TempDB (sorts, hash joins, temp tables, version store) are blocked. The entire instance becomes effectively frozen.
A Redis Cluster node holding 5,461 hash slots crashes due to a memory corruption bug. The cluster marks the node as FAIL and attempts automatic failover to its replica. The replica promotion fails because the replica was behind on replication. Queries to affected hash slots return CLUSTERDOWN errors.
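The slot count in this scenario follows from how Redis Cluster maps keys to 16,384 hash slots: CRC16 of the key, modulo 16384, with optional `{...}` hash tags. The sketch below assumes Python's `binascii.crc_hqx` with an initial value of 0 as the CRC16 (XMODEM) implementation; `key_slot` is a hypothetical helper name, and the even three-way split is an assumption about the cluster layout.

```python
import binascii

TOTAL_SLOTS = 16384


def key_slot(key: bytes) -> int:
    """Compute the Redis Cluster hash slot for a key, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is hashed.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % TOTAL_SLOTS


# An even split across three primaries gives each about a third of the slots.
slots_per_primary = TOTAL_SLOTS // 3  # 5461
```

Any key whose slot belongs to the failed node is unreachable until a replica is promoted, while keys in other slots are unaffected at the hashing level.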
A network partition between MongoDB replica set members triggers repeated elections. The primary steps down, but with the network split no node can achieve majority quorum. Applications receive 'not master' errors and writes fail across all connected services.
MySQL master-replica replication breaks after a storage volume snapshot causes the binary log position to become invalid on the replica. The replica enters an error state with a GTID gap, and the replication lag grows unbounded. Applications relying on read replicas begin returning stale data while writes to the master succeed.
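A GTID gap of the kind described here can be spotted by parsing a GTID set string of the form `uuid:first-last:first-last`. This is an illustrative sketch for a single source UUID; `gtid_gaps` is a hypothetical helper, and the UUID in the usage line is only an example value.

```python
def gtid_gaps(gtid_set: str) -> list[tuple[int, int]]:
    """Return (first, last) transaction ranges missing between the listed intervals."""
    _uuid, *intervals = gtid_set.split(":")
    ranges = []
    for interval in intervals:
        first, _, last = interval.partition("-")
        ranges.append((int(first), int(last or first)))  # "7" means the single txn 7
    ranges.sort()
    gaps = []
    for (_, prev_last), (next_first, _) in zip(ranges, ranges[1:]):
        if next_first > prev_last + 1:
            gaps.append((prev_last + 1, next_first - 1))
    return gaps


missing = gtid_gaps("3E11FA47-71CA-11E1-9E33-C80AA9429562:1-100:105-200")
```

Here `missing` holds the range 101 to 104, the transactions the replica never applied.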
Every scenario is tested against Corax's Neural Engine in a production environment with AI-powered root cause analysis.
Tests run continuously as new infrastructure patterns are added.