Infrastructure Scenario Tests

We test Corax against real-world infrastructure failures across every vendor, platform, and scenario. Browse the results below.

276 Total Tests · 100.0% Pass Rate · 276 Passed · 0 Failed

Kubernetes PersistentVolumeClaim Stuck Pending

PASS

Multiple PersistentVolumeClaims in a Kubernetes cluster are stuck in Pending state after the cloud provider's storage provisioner hits its volume limit. New StatefulSet pods cannot start because they require persistent storage. The storage class provisioner logs show quota exceeded errors.

Cloud · Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Remote Hands · 21 correlated

Kubernetes Ingress Controller Crash — External Traffic Blocked

PASS

The NGINX Ingress Controller in a production Kubernetes cluster crashes after a malformed Ingress resource is applied. The controller enters a CrashLoopBackOff state. All external HTTP/HTTPS traffic to the cluster is blocked because no ingress controller pods are running to route traffic to backend services.

Cloud · Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 28 correlated

GCP Cloud Spanner Regional Outage

PASS

A Google Cloud Spanner regional instance experiences a zone-level failure in us-central1-a. The multi-zone configuration should provide automatic failover, but a configuration error in the instance's node count causes the remaining zones to be overloaded. Read and write latencies spike above SLA thresholds.

Cloud · Pattern: STORAGE_IO_LATENCY · Severity: CRITICAL · Confidence: 95% · Remote Hands · 42 correlated
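The failure mode here is a capacity-planning gap: the instance was not sized to survive the loss of a zone. A minimal sketch of the N+1 check in Python (all names and figures are illustrative planning inputs, not Spanner APIs):

```python
def survives_zone_loss(total_load_qps: float, zones: int,
                       nodes_per_zone: int, qps_per_node: float) -> bool:
    """Can the remaining zones absorb the full load after losing one zone?"""
    remaining_capacity = (zones - 1) * nodes_per_zone * qps_per_node
    return remaining_capacity >= total_load_qps

# A 3-zone instance sized for steady state, not for failover:
survives_zone_loss(10_000, zones=3, nodes_per_zone=1, qps_per_node=3_000)  # False
```

The misconfiguration in this scenario is exactly the second case: total node count covers normal load, but the per-zone share after a failure exceeds what the surviving zones can serve.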

Azure Cosmos DB Throttling — RU/s Budget Exhausted

PASS

Azure Cosmos DB begins aggressively throttling requests after a marketing campaign drives 10x normal traffic. The provisioned RU/s budget is exhausted and autoscale max is reached. Applications receive HTTP 429 (Too Many Requests) responses. Retry storms amplify the problem as clients retry throttled requests.

Cloud · Pattern: AZURE_CLOUD · Severity: CRITICAL · Confidence: 95% · Remote Hands · 42 correlated
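Retry storms like this are usually tamed with jittered exponential backoff, so that throttled clients spread out instead of retrying in lockstep. A minimal sketch in Python (the function names and the bare `429` return value are illustrative, not the azure-cosmos SDK, which ships its own retry policy):

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 0.1, cap: float = 10.0):
    """Yield 'full jitter' delays: uniform over an exponentially growing window."""
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_backoff(fn, max_retries: int = 5):
    """Retry fn() while it returns a simulated 429, backing off between tries."""
    for delay in backoff_delays(max_retries):
        result = fn()
        if result != 429:           # anything but a throttle response succeeds
            return result
        # time.sleep(delay) in real code; omitted so the sketch runs instantly
    raise RuntimeError("throttled after all retries")
```

The jitter is the important part: deterministic backoff keeps synchronized clients synchronized, which is what turns throttling into a storm.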

Azure Key Vault Unavailable — Secrets Rotation Blocked

PASS

Azure Key Vault becomes unreachable due to a misconfigured private endpoint and NSG rule change during a network security audit. All applications that fetch secrets, encryption keys, or certificates from Key Vault at startup or rotation time fail. Services that cache secrets continue working but cannot rotate credentials.

Cloud · Pattern: AZURE_CLOUD · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 42 correlated
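The services that survive this outage are the ones that cache secrets and fall back to the stale copy when the vault is unreachable. A minimal sketch of that pattern in Python (the class and its interface are hypothetical, not the Azure SDK; `fetch` stands in for any Key Vault call):

```python
import time

class SecretCache:
    """Serve a cached secret; fall back to the stale value if the vault is down.

    `fetch` is any callable that returns the secret or raises on failure.
    """
    def __init__(self, fetch, ttl: float = 300.0, clock=time.monotonic):
        self._fetch, self._ttl, self._clock = fetch, ttl, clock
        self._value, self._fetched_at = None, None

    def get(self):
        now = self._clock()
        fresh = self._fetched_at is not None and now - self._fetched_at < self._ttl
        if not fresh:
            try:
                self._value = self._fetch()
                self._fetched_at = now
            except Exception:
                if self._value is None:   # no stale copy to fall back on
                    raise                 # else: serve stale, as in the scenario
        return self._value
```

This is exactly the behavior described above: cached consumers keep working, but nothing can rotate until the vault is reachable again.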

AWS EC2 Instance Store Loss — Ephemeral Data Gone

PASS

An EC2 instance with instance store (ephemeral) volumes experiences a hardware failure on the underlying host. AWS stops and restarts the instance on new hardware, but all instance store data is lost. The application had been incorrectly storing session data and temporary processing files on instance store volumes instead of EBS.

Cloud · Pattern: AWS_CLOUD · Severity: CRITICAL · Confidence: 95% · Remote Hands · 20 correlated

AWS S3 Bucket Policy Lockout — Data Inaccessible

PASS

A misconfigured S3 bucket policy denies all access including the root account. The bucket contains 15TB of production assets (user uploads, documents, media). All applications that read from or write to the bucket receive AccessDenied errors. Even the AWS console shows access denied.

Cloud · Pattern: AWS_CLOUD · Severity: CRITICAL · Confidence: 95% · Remote Hands · 28 correlated
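The reason even root is locked out of the data is IAM's evaluation order: an explicit Deny overrides any Allow. A deliberately simplified sketch of that rule in Python (real IAM evaluation also involves principals, conditions, and wildcard matching — this only shows Deny-wins and implicit deny):

```python
def is_allowed(statements, action: str, resource: str) -> bool:
    """Explicit Deny beats any Allow; with neither, the default is implicit deny."""
    def matches(stmt):
        return action in stmt.get("Action", []) and resource in stmt.get("Resource", [])
    if any(s["Effect"] == "Deny" for s in statements if matches(s)):
        return False
    return any(s["Effect"] == "Allow" for s in statements if matches(s))
```

A policy that pairs a broad Allow with a broad Deny on the same actions therefore denies everyone, which is how a single misapplied statement makes 15TB of assets unreachable.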

Database Backup Restore Failure During DR Test

PASS

During a disaster recovery test, the database backup restore fails at 78% completion due to a corrupted backup chain. The backup verification job discovers that 3 incremental backups have invalid checksums, making the entire backup chain since the last full backup unrestorable. The production database has no valid point-in-time recovery option for the past 5 days.

Database · Pattern: DATABASE_EVENT · Severity: CRITICAL · Confidence: 95% · Remote Hands · 22 correlated
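Why do three bad incrementals invalidate the whole chain? Restoring incremental N requires every earlier link to be intact, so the latest restorable point is the backup just before the first failed checksum. A minimal sketch of that walk in Python (the tuple format is illustrative, not any backup tool's API):

```python
def latest_restorable(chain):
    """chain: ordered (name, checksum_ok) pairs, starting with the full backup.

    Returns the newest backup that can actually be restored, or None if even
    the full backup is bad.
    """
    restorable = None
    for name, ok in chain:
        if not ok:
            break            # every later incremental depends on this link
        restorable = name
    return restorable
```

This is why backup verification jobs matter: without them, the break in the chain is discovered during the restore, as it was here.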

MariaDB Galera Cluster Split-Brain

PASS

A network partition splits a 3-node MariaDB Galera cluster into a 1-node partition and a 2-node partition. The isolated node enters non-primary state and rejects all queries. When the partition heals, the isolated node has divergent data that requires manual SST (State Snapshot Transfer) to resync, causing extended downtime.

Database · Pattern: DATABASE_EVENT · Severity: CRITICAL · Confidence: 85% · Remote Hands · 27 correlated
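The non-primary behavior follows from Galera's quorum rule: a partition keeps serving only if it holds a strict majority of the cluster, which is why exactly half (or a lone node) goes read-only. A one-line sketch, assuming the default weight of 1 per node:

```python
def has_quorum(partition_nodes: int, total_nodes: int) -> bool:
    """A partition stays Primary only with a strict majority; half is not enough."""
    return 2 * partition_nodes > total_nodes
```

With 3 nodes, the 2-node side keeps quorum and the isolated node rejects queries, exactly as described; an even split of a 4-node cluster would leave both sides non-primary, which is one reason odd-sized clusters are preferred.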

PostgreSQL Vacuum Bloat — Transaction ID Wraparound Warning

PASS

PostgreSQL autovacuum has been unable to keep up with a high-write workload, and the database is approaching transaction ID wraparound. The autovacuum_freeze_max_age threshold is reached, forcing aggressive anti-wraparound vacuums that consume all I/O. Database performance degrades severely as aggressive vacuum competes with production queries.

Database · Pattern: DATABASE_EVENT · Severity: CRITICAL · Confidence: 95% · Remote Hands · 20 correlated

Oracle Tablespace Exhaustion — ORA-01653

PASS

An Oracle 19c production database runs out of space in the USERS tablespace after an overnight ETL job loads 3x the expected data volume. All INSERT and UPDATE operations fail with ORA-01653. The application returns errors on any write operation while reads continue to function.

Database · Pattern: DATABASE_EVENT · Severity: CRITICAL · Confidence: 95% · Remote Hands · 21 correlated

SQL Server TempDB Full — All Queries Blocked

PASS

SQL Server TempDB runs out of space due to a runaway query creating massive temp tables and sort operations. All concurrent queries requiring TempDB (sorts, hash joins, temp tables, version store) are blocked. The entire instance becomes effectively frozen.

Database · Pattern: DATABASE_EVENT · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 42 correlated

Redis Cluster Node Failure — Slot Coverage Lost

PASS

A Redis Cluster node holding 5,461 hash slots crashes due to a memory corruption bug. The cluster marks the node as FAIL and attempts automatic failover to its replica. The replica promotion fails because the replica was behind on replication. Queries to affected hash slots return CLUSTERDOWN errors.

Database · Pattern: DATABASE_EVENT · Severity: CRITICAL · Confidence: 85% · Remote Hands · 42 correlated
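The 5,461 figure comes from Redis Cluster's fixed keyspace of 16,384 hash slots (16384 / 3 nodes ≈ 5,461), with each key mapped to a slot via CRC16 mod 16384. A small Python sketch of the mapping, including the `{hash tag}` rule that lets related keys share a slot:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem variant), the checksum Redis Cluster uses for slots."""
    crc = 0
    for b in data:
        crc ^= b << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of 16384 slots; only the {tag} substring is hashed, if present."""
    s = key.find('{')
    if s != -1:
        e = key.find('}', s + 1)
        if e != -1 and e != s + 1:   # non-empty tag between first '{' and next '}'
            key = key[s + 1:e]
    return crc16(key.encode()) % 16384
```

When a node's slots have no live owner, any key hashing into that range gets the CLUSTERDOWN error described above, while keys in the other slots keep working.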

MongoDB Replica Set Election Storm

PASS

A network partition between MongoDB replica set members triggers repeated elections. The primary steps down, but no node can achieve majority quorum due to split-brain networking. Applications receive 'not master' errors and writes fail across all connected services.

Database · Pattern: DATABASE_EVENT · Severity: CRITICAL · Confidence: 85% · Remote Hands · 42 correlated

MySQL Replication Break — Binary Log Position Lost

PASS

MySQL master-replica replication breaks after a storage volume snapshot causes the binary log position to become invalid on the replica. The replica enters an error state with a GTID gap, and replication lag grows without bound. Applications relying on read replicas begin returning stale data while writes to the master succeed.

Database · Pattern: DATABASE_EVENT · Severity: CRITICAL · Confidence: 85% · Remote Hands · 27 correlated
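A GTID gap means the replica's executed set has holes, e.g. `source_uuid:1-5:7-9` is missing transaction 6, and replication cannot advance past the hole. Finding the gaps in one source's interval list is a simple scan, sketched here in Python (the tuple representation is illustrative; in MySQL you would compare `gtid_executed` sets):

```python
def gtid_gaps(intervals):
    """intervals: sorted (start, end) ranges of executed transactions for one
    source UUID, e.g. [(1, 5), (7, 9)] for 'uuid:1-5:7-9'.

    Returns the missing ranges that block the replica from advancing.
    """
    gaps = []
    for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
        if next_start > prev_end + 1:
            gaps.append((prev_end + 1, next_start - 1))
    return gaps
```

Recovery usually means re-fetching the missing transactions or rebuilding the replica; until then, the replica serves increasingly stale reads, as described above.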

Zoom Phone Failover Failure — PSTN Survivability Lost

PASS

Zoom Phone cloud service experiences an outage during a severe weather event that also knocks out the primary internet circuit. The Zoom Phone Survivability Gateway (local appliance) fails to activate because it was never properly configured after installation. The office has no phone service — cloud calling is down and local failover doesn't work.

VoIP · Pattern: TIMEOUT · Severity: CRITICAL · Confidence: 95% · Remote Hands · 21 correlated

Ring Group Misconfiguration — Inbound Calls Not Routing

PASS

After a PBX configuration change, all ring groups are pointing to non-existent extensions. Inbound PSTN calls ring once and go to a generic voicemail box instead of reaching the intended departments. Sales, support, and main line calls are all misrouted. The issue was caused by an extension renumbering project that didn't update ring group memberships.

VoIP · Pattern: VOIP_QUALITY · Severity: CRITICAL · Confidence: 85% · Remote Hands · 20 correlated

Call Recording Compliance Failure — Regulatory Risk

PASS

The call recording system stops capturing calls after the recording storage volume fills up. The VoIP system continues functioning but no calls are being recorded, putting the organization in violation of regulatory compliance requirements (HIPAA/PCI). The failure went undetected for 48 hours.

VoIP · Pattern: VOIP_QUALITY · Severity: CRITICAL · Confidence: 85% · Remote Hands · 23 correlated

Session Border Controller Failure — All SIP Trunks Down

PASS

The primary SBC (Session Border Controller) suffers a hardware failure, dropping all active calls and preventing new call setup. The standby SBC fails to take over because the HA license expired. All inbound and outbound PSTN calls are completely offline.

VoIP · Pattern: VOIP_QUALITY · Severity: CRITICAL · Confidence: 95% · Remote Hands · 34 correlated

Microsoft Teams Calling Outage — Enterprise Voice Down

PASS

Microsoft Teams Phone System experiences a regional outage affecting all Teams calling features. Users cannot make or receive PSTN calls through Teams. Direct Routing SBC shows the Teams backend as unreachable. Internal Teams chat and meetings work but all telephony features are offline.

VoIP · Pattern: AZURE_CLOUD · Severity: CRITICAL · Confidence: 85% · Remote Hands · 35 correlated

Every scenario is tested against Corax's Neural Engine in a production environment with AI-powered root cause analysis.

Tests run continuously as new infrastructure patterns are added.