Infrastructure Scenario Tests

We test Corax against real-world infrastructure failures across every vendor, platform, and scenario. Browse the results below.

276 Total Tests · 100.0% Pass Rate · 276 Passed · 0 Failed

DR Runbook Automation Failure — Scripts Cannot Execute Recovery

PASS

During a disaster recovery drill, the automated runbook scripts fail to execute because the orchestration server's credentials to the DR environment have expired, the API endpoints have changed, and 3 of 8 recovery scripts reference infrastructure that has been decommissioned.

Infrastructure · Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 95% · Remote Hands · 22 correlated

Warm Standby Database Out of Sync — Failover Would Cause Data Loss

PASS

The warm standby database at the DR site has fallen out of synchronous replication mode and has been running in async mode for 5 days without anyone noticing. The standby is now 2 hours behind the primary. If failover were triggered now, 2 hours of committed transactions would be lost.

Infrastructure · Pattern: DATABASE_EVENT · Severity: CRITICAL · Confidence: 95% · Remote Hands · 17 correlated
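Whether a failover is safe reduces to comparing replica lag against the RPO budget. A minimal sketch in Python (the 5-minute RPO figure is an assumption for illustration, not from the scenario):

```python
from datetime import timedelta

def failover_data_loss(replica_lag: timedelta, rpo: timedelta) -> timedelta:
    """How much committed data (measured as wall-clock time) would be lost
    beyond the agreed RPO if failover were triggered right now.
    A zero result means failover is within the RPO budget."""
    loss = replica_lag - rpo
    return max(loss, timedelta(0))

# Scenario figures: standby is 2 hours behind; assume a 5-minute RPO.
excess = failover_data_loss(timedelta(hours=2), timedelta(minutes=5))
print(excess)  # 1:55:00 of committed transactions beyond the RPO
```

The same check, run continuously against the standby, would have flagged the silent sync-to-async downgrade on day one instead of day five.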

Failover DNS Propagation Delay — Extended Outage

PASS

After activating disaster recovery, the DNS records are updated to point to the DR site, but propagation is taking much longer than expected due to high TTL values that were never reduced pre-failover. Many clients continue hitting the dead primary site for hours after the DNS change.

Infrastructure · Pattern: DNS_FAILURE · Severity: CRITICAL · Confidence: 95% · Remote Hands · 18 correlated
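The exposure here is bounded by the old TTL: any resolver that cached the record just before the cutover can keep serving the dead primary for the full TTL. A rough estimate, assuming resolvers honor TTLs (some cap or extend them):

```python
def worst_case_cutover_seconds(old_ttl: int, dns_change_delay: int = 0) -> int:
    """Upper bound on how long resolvers may keep pointing clients at the
    old (dead) primary after the failover DNS change: the time to push the
    change plus the full TTL that was in effect when it was cached."""
    return dns_change_delay + old_ttl

# With a 24-hour TTL that was never reduced pre-failover:
print(worst_case_cutover_seconds(86_400) / 3600)  # 24.0 hours
```

The standard mitigation is to lower the TTL to something small (for example 60 seconds) at least one old-TTL interval before any planned failover, so the cached long-TTL records have expired by cutover time.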

Cross-Region Replication Lag Critical — Data Consistency at Risk

PASS

The cross-region database replication between the primary (us-east-1) and disaster recovery (eu-west-1) regions has fallen behind by 4 hours. A sustained write increase combined with network throttling between regions is causing the replica to fall further behind, with the gap widening every hour.

Infrastructure · Pattern: DATABASE_EVENT · Severity: CRITICAL · Confidence: 85% · Remote Hands · 17 correlated
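If the widening rate is roughly linear, the time until the lag crosses an alerting or RPO threshold can be projected directly. A sketch (the 8-hour threshold and 0.5 h/h widening rate are assumed figures):

```python
def hours_until_lag_exceeds(current_lag_h: float, widening_h_per_h: float,
                            threshold_h: float) -> float:
    """Project when replication lag crosses a threshold, assuming the gap
    keeps widening at the observed linear rate."""
    if current_lag_h >= threshold_h:
        return 0.0          # already past the threshold
    if widening_h_per_h <= 0:
        return float("inf") # lag is stable or shrinking
    return (threshold_h - current_lag_h) / widening_h_per_h

# 4 h behind, widening ~0.5 h every hour, 8 h alerting threshold:
print(hours_until_lag_exceeds(4, 0.5, 8))  # 8.0
```

A projection like this turns "the gap is widening every hour" into a concrete deadline for throttling writes or provisioning more cross-region bandwidth.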

RPO Violation — Backup Gap Exceeds SLA

PASS

A monitoring check reveals that the most recent successful backup of the production database is 72 hours old, far exceeding the 1-hour RPO SLA. The backup job has been failing silently due to a full backup repository, and the replication to the offsite location has also been paused.

Infrastructure · Pattern: BACKUP_FAILURE · Severity: CRITICAL · Confidence: 95% · Remote Hands · 19 correlated
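A guard against silent backup failures is cheap to run from monitoring. A minimal sketch of the backup-age check, using the scenario's 72-hour gap and 1-hour RPO:

```python
from datetime import datetime, timedelta, timezone

def rpo_violation(last_backup, rpo, now=None):
    """Return how far the backup gap exceeds the RPO, or None if compliant."""
    now = now or datetime.now(timezone.utc)
    gap = now - last_backup
    return gap - rpo if gap > rpo else None

now = datetime(2024, 1, 4, tzinfo=timezone.utc)   # illustrative timestamps
last_success = now - timedelta(hours=72)
print(rpo_violation(last_success, timedelta(hours=1), now))  # 2 days, 23:00:00
```

The key design point is that the check measures the age of the last *successful* backup rather than whether the job ran, so a job that exits cleanly without producing a backup still trips the alert.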

RTO Violation — Recovery Taking Too Long

PASS

During an actual disaster recovery activation, the recovery process is taking far longer than the 4-hour RTO SLA. After 6 hours, only 3 of 12 critical services are operational. The recovery runbook is outdated, automation scripts are failing, and key personnel are unreachable.

Infrastructure · Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 85% · Remote Hands · 20 correlated

DR Site Failover Test Failure — Recovery Environment Not Functional

PASS

A scheduled disaster recovery failover test reveals that the DR site cannot bring up critical services. The DR database has been silently failing replication for 2 weeks, the application servers have outdated configurations, and the DR network routing tables are stale.

Infrastructure · Pattern: DATABASE_EVENT · Severity: CRITICAL · Confidence: 95% · Remote Hands · 29 correlated

Vulnerability Scan — Critical CVE Detected on Production

PASS

An automated vulnerability scan discovers a critical CVE with a CVSS score of 10.0 affecting the production web server software. The vulnerability allows unauthenticated remote code execution and has a known public exploit. The affected software is running on 15 production servers.

Security · Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 95% · Remote Hands · 18 correlated

Security Audit Failure — Weak Cipher Suites on Production

PASS

A quarterly security audit discovers that 23 production services are still offering deprecated cipher suites including RC4, DES, and 3DES. Several services also support TLS 1.0 and 1.1. This fails PCI DSS Requirement 4.1 and multiple CIS benchmarks.

Security · Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 85% · Remote Hands · 18 correlated
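A compliance check for this amounts to intersecting what an endpoint offers with a deny-list. A sketch, assuming the scanner reports normalized cipher and protocol names (naming varies between tools):

```python
# Deprecated algorithms and protocol versions called out by PCI DSS 4.1
# and common CIS benchmarks.
WEAK_CIPHERS = {"RC4", "DES", "3DES"}
WEAK_PROTOCOLS = {"SSLv3", "TLSv1", "TLSv1.1"}

def audit_endpoint(offered_ciphers, offered_protocols):
    """Return the non-compliant ciphers and protocols an endpoint offers."""
    bad_ciphers = sorted(WEAK_CIPHERS & set(offered_ciphers))
    bad_protocols = sorted(WEAK_PROTOCOLS & set(offered_protocols))
    return bad_ciphers, bad_protocols

print(audit_endpoint({"AES128-GCM", "RC4", "3DES"}, {"TLSv1.2", "TLSv1"}))
# (['3DES', 'RC4'], ['TLSv1'])
```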

Failed Penetration Test — Critical RCE Finding in Production

PASS

An external penetration test discovers a critical remote code execution vulnerability in the production API through an unsanitized file upload endpoint. The pentester demonstrates full shell access to the application server and lateral movement to the database server.

Security · Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 95% · Remote Hands · 18 correlated

CIS Benchmark Drift — Hardening Configuration Reverted

PASS

A weekly CIS benchmark compliance scan detects that 43 production servers have drifted from their hardened baseline. An unauthorized configuration management change reverted SSH hardening, disabled audit logging, and re-enabled insecure protocols across the production fleet.

Security · Pattern: SSL_ERROR · Severity: CRITICAL · Confidence: 90% · Remote Hands · 18 correlated

GDPR Data Retention Violation — PII Not Purged

PASS

A GDPR compliance scan discovers that the automated data retention purge job has been silently skipping records due to a foreign key constraint error. 2.3 million EU user records past their retention period have not been deleted, violating the storage-limitation principle of GDPR Article 5(1)(e).

Security · Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 95% · Remote Hands · 18 correlated
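The skip-on-error failure mode usually comes from deleting parent rows while child rows still reference them. A toy sketch of the ordering fix, using plain lists in place of tables (table and column names are illustrative):

```python
def purge_expired(users, orders, expired_ids):
    """Delete expired users without tripping foreign-key constraints:
    remove dependent child rows (orders) before the parent rows (users)."""
    orders[:] = [o for o in orders if o["user_id"] not in expired_ids]  # children first
    users[:] = [u for u in users if u["id"] not in expired_ids]         # then parents

users = [{"id": 1}, {"id": 2}]
orders = [{"id": 10, "user_id": 1}]
purge_expired(users, orders, {1})
print(users, orders)  # [{'id': 2}] []
```

Equally important is that the purge job must fail loudly (alert on any skipped record) rather than silently continuing, which is what let 2.3 million records accumulate here.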

SOC 2 Access Control Violation — Terminated Employee Still Active

PASS

An automated SOC 2 compliance scan discovers 14 terminated employee accounts that are still active across production systems. The offboarding automation failed silently for 3 months, and these accounts retain full production access including database admin and cloud console roles.

Security · Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 85% · Remote Hands · 18 correlated
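Detecting this class of failure is a set intersection between the HR termination feed and the active-account inventory. A sketch, assuming both sources share an account identifier (the names are illustrative):

```python
def orphaned_accounts(active_accounts, terminated_employees):
    """Accounts that belong to terminated employees but are still active."""
    return sorted(set(active_accounts) & set(terminated_employees))

active = {"alice", "bob", "carol", "dave"}
terminated = {"bob", "dave", "erin"}
print(orphaned_accounts(active, terminated))  # ['bob', 'dave']
```

Running this reconciliation independently of the offboarding automation is the point: it catches exactly the case where the automation itself fails silently.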

HIPAA Audit Log Gap — Logging Service Failure

PASS

The centralized audit logging service for a healthcare application has been silently failing for 72 hours. No access logs for electronic protected health information (ePHI) were captured during this period, creating a HIPAA audit trail gap that must be reported and remediated.

Security · Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 95% · Remote Hands · 18 correlated
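Silent logging failures can be caught by watching for the silence itself. A sketch that flags unusually long intervals between consecutive audit entries (the 5-minute silence threshold is an assumption):

```python
from datetime import datetime, timedelta

def find_log_gaps(timestamps, max_silence=timedelta(minutes=5)):
    """Return (start, end) pairs where consecutive audit log entries are
    further apart than max_silence -- candidate audit-trail gaps."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > max_silence]

t0 = datetime(2024, 1, 1)
entries = [t0, t0 + timedelta(minutes=1), t0 + timedelta(hours=72)]
gaps = find_log_gaps(entries)
print(gaps[0][1] - gaps[0][0])  # 2 days, 23:59:00 of silence
```

A heartbeat event written to the audit stream every minute makes the check robust even during genuinely quiet periods: missing heartbeats mean the pipeline is down, not that nothing happened.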

PCI DSS Scope Creep — Unencrypted Cardholder Data Detected

PASS

A database scan discovers unencrypted cardholder data (primary account numbers) stored in a staging database that was never intended to be in PCI scope. A developer copied production data to staging for debugging without masking sensitive fields, violating PCI DSS Requirement 3.4.

Security · Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 95% · Remote Hands · 18 correlated
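Scans for this typically combine a digit-run pattern with the Luhn checksum to separate candidate PANs from arbitrary numbers. A minimal sketch (4111111111111111 is a well-known Visa test number):

```python
import re

def luhn_valid(digits: str) -> bool:
    """Luhn checksum -- filters random digit runs from likely card numbers."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_candidate_pans(text: str):
    """13-19 digit runs that pass Luhn -- candidate primary account numbers."""
    return [m for m in re.findall(r"\b\d{13,19}\b", text) if luhn_valid(m)]

print(find_candidate_pans("order 4111111111111111 ref 1234567890123"))
# ['4111111111111111']
```

Luhn passes by chance for roughly 1 in 10 random digit runs, so real scanners also check issuer prefixes (BIN ranges) to keep the false-positive rate down.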

Container Registry Rate Limit Exceeded — Deployments Blocked

PASS

A Kubernetes cluster's nodes are all pulling images from Docker Hub simultaneously during a rolling deployment, exceeding the Docker Hub rate limit (100 pulls/6 hours for anonymous, 200 for authenticated). All image pulls fail with 429 Too Many Requests, blocking deployments and pod restarts across the cluster.

Cloud · Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 18 correlated
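Clients can at least avoid making a 429 storm worse by backing off with jitter instead of retrying in lockstep. A sketch of the retry policy only; `pull` is a hypothetical callable returning an HTTP status code, and the real fixes are authenticated pulls and a pull-through registry mirror:

```python
import random
import time

def pull_with_backoff(pull, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry a registry pull on 429 with exponential backoff plus jitter,
    so a fleet of nodes doesn't hammer the registry in lockstep."""
    for attempt in range(max_attempts):
        status = pull()
        if status != 429:
            return status
        # Full delay doubles each attempt; jitter spreads nodes apart.
        delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
        sleep(delay)
    return 429  # give up, surface the rate limit to the caller

# Simulated registry that rate-limits the first two pulls:
responses = iter([429, 429, 200])
print(pull_with_backoff(lambda: next(responses), sleep=lambda _: None))  # 200
```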

K8s Network Policy Blocking Inter-Pod Communication

PASS

A newly applied Kubernetes NetworkPolicy with an overly restrictive ingress rule blocks all traffic between the API pods and the database pods. The policy was intended to restrict external access but inadvertently blocks intra-cluster communication, causing a complete application outage.

Cloud · Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 22 correlated

Kubernetes HPA Autoscaling Failure — Metrics Server Down

PASS

The Kubernetes metrics-server deployment is crashlooping due to a misconfigured TLS flag after a Helm chart upgrade. Without metrics-server, the Horizontal Pod Autoscaler cannot read CPU/memory metrics and stops scaling pods. During a traffic surge, pods remain at minimum replica count and become overwhelmed.

Cloud · Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 22 correlated

Multi-Cloud DNS Failover Failure — Both Providers Down

PASS

A multi-cloud architecture uses DNS-based failover between primary (cloud provider A) and secondary (cloud provider B). The DNS failover mechanism itself fails because the health check endpoint uses a shared authentication service that is down on both clouds, causing the DNS provider to mark both targets as unhealthy.

Cloud · Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 80% · Remote Hands · 30 correlated

CDN Origin Shield Failure — Origin Server Overwhelmed

PASS

The CDN origin shield layer fails, causing all 50+ edge locations to simultaneously request content directly from the origin server. The origin is designed to handle coalesced requests from 1 shield, not direct requests from 50+ edges. Origin collapses under the load.

Cloud · Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 85% · Remote Hands · 27 correlated
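The load amplification can be modeled with the scenario's own framing: a healthy shield presents the origin with one coalesced miss stream, while a failed shield exposes it to every edge directly. A deliberately simplified model:

```python
def origin_request_rate(edge_miss_rate, edge_count, shield_healthy,
                        cache_hit_ratio=0.0):
    """Approximate requests/sec hitting the origin. With a healthy shield,
    misses from all edges coalesce into roughly one stream; without it,
    every edge fetches from the origin independently."""
    per_edge = edge_miss_rate * (1 - cache_hit_ratio)
    return per_edge if shield_healthy else per_edge * edge_count

# 100 req/s of cache misses per edge, 50 edges:
print(origin_request_rate(100, 50, shield_healthy=True))   # 100.0
print(origin_request_rate(100, 50, shield_healthy=False))  # 5000.0
```

Even this crude model shows why an origin sized for the shielded case collapses: the failure multiplies load by the edge count, not by a small safety factor.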
Page 2 of 14

Every scenario is tested against Corax's Neural Engine in a production environment with AI-powered root cause analysis.

Tests run continuously as new infrastructure patterns are added.