Infrastructure Scenario Tests

We test Corax against real-world infrastructure failures across every vendor, platform, and scenario. Browse the results below.

Total tests: 21,502
Pass rate: 100.0%
Passed: 21,502
Failed: 0

Ansible Playbook Failure — Configuration Drift Across Fleet

PASS

An Ansible playbook run against 200 production servers fails midway through execution due to a changed SSH host key on the jump host. 87 servers received the updated configuration while 113 did not, creating a split-brain configuration state across the fleet.

Cloud Pattern: FIREWALL_RULE_BLOCK · Severity: CRITICAL · Confidence: 85% · Remote Hands · 21 correlated
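
A minimal re-convergence sketch, assuming the 113 unreached hosts were captured to a file named unconfigured_hosts.txt and the playbook is site.yml (both hypothetical names):

```python
import subprocess

# Hypothetical example: re-converge only the hosts that missed the run.
# ansible-playbook's --limit accepts an @file of host names; --check --diff
# dry-runs first so the pending drift can be reviewed before converging.
result = subprocess.run(
    [
        "ansible-playbook", "site.yml",
        "--limit", "@unconfigured_hosts.txt",
        "--check", "--diff",
    ],
    capture_output=True, text=True,
)
print(result.stdout)
if result.returncode != 0:
    raise SystemExit("check-mode run failed; fix before converging")
```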

CloudFormation Stack Rollback — Production Update Failed

PASS

A CloudFormation stack update to production fails during resource creation, triggering an automatic rollback that itself gets stuck in UPDATE_ROLLBACK_FAILED state due to a manually modified resource outside of CloudFormation control.

Cloud Pattern: AWS_CLOUD · Severity: CRITICAL · Confidence: 95% · Remote Hands · 22 correlated
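
One recognized way out of UPDATE_ROLLBACK_FAILED is ContinueUpdateRollback, skipping the out-of-band resource. A hedged boto3 sketch; the stack name and logical resource ID below are placeholders:

```python
import time
import boto3

cfn = boto3.client("cloudformation")

# Resume the stuck rollback, skipping the manually modified resource
# (its logical ID here is an assumed example).
cfn.continue_update_rollback(
    StackName="prod-app-stack",
    ResourcesToSkip=["ManuallyEditedSecurityGroup"],
)

# Poll until the stack settles.
status = None
while status not in ("UPDATE_ROLLBACK_COMPLETE", "UPDATE_ROLLBACK_FAILED"):
    time.sleep(15)
    status = cfn.describe_stacks(
        StackName="prod-app-stack")["Stacks"][0]["StackStatus"]
print(status)
```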

Terraform State Lock Conflict — Parallel Apply Blocking Deployments

PASS

Two CI/CD pipelines triggered simultaneously attempt to run terraform apply against the same state file. The DynamoDB state lock prevents both from proceeding, but a stale lock from a crashed previous run is never released, blocking all infrastructure deployments for 4 hours.

Cloud Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 21 correlated
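
A hedged triage sketch for locating the stale lock, assuming the S3 backend's lock table is named terraform-locks (a placeholder); the lock UUID in the Info attribute is what terraform force-unlock expects:

```python
import boto3

ddb = boto3.resource("dynamodb")
table = ddb.Table("terraform-locks")  # assumed lock table name

# Each held lock is one item keyed by LockID; Info is JSON carrying the
# lock UUID ("ID"), the holder, and the operation. (Single page shown;
# a real audit would paginate.)
for item in table.scan()["Items"]:
    print(item["LockID"], item.get("Info", "<no Info attribute>"))

# Release the stale lock out of band with:
#   terraform force-unlock <ID-from-Info>
```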

Azure AD Connect Sync Loop

PASS

Azure AD Connect enters a synchronization loop in which the delta sync cycle never completes and continuously restarts. Password hash synchronization stops working, and on-premises changes are not reflected in Azure AD.

Cloud Pattern: AZURE_CLOUD · Severity: CRITICAL · Confidence: 85% · Remote Hands · 5 correlated
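
A minimal staleness monitor, assuming an app-only Microsoft Graph token with Organization.Read.All (token acquisition not shown): the tenant's onPremisesLastSyncDateTime stops advancing when delta syncs never complete.

```python
import requests

TOKEN = "<graph-access-token>"  # placeholder; acquire via MSAL or similar

resp = requests.get(
    "https://graph.microsoft.com/v1.0/organization",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
org = resp.json()["value"][0]
# Alert if this timestamp is older than the expected sync interval.
print("last on-prem sync:", org.get("onPremisesLastSyncDateTime"))
```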

Helm Chart Rollback Failure — Stuck Between Versions

PASS

A Helm chart upgrade fails partway through due to a resource conflict, and the automatic rollback also fails because the previous release's CRDs are incompatible with the current cluster state. The release is stuck in a 'pending-rollback' state. Kubernetes resources are in a mixed state between the old and new versions.

Cloud Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Remote Hands · 29 correlated
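
Helm 3 keeps release state in Secrets labelled owner=helm, and a wedged release carries status=pending-rollback. A hedged inspection sketch, assuming the release lives in a namespace named prod:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Helm release Secrets are named sh.helm.release.v1.<release>.v<revision>.
stuck = v1.list_namespaced_secret(
    "prod", label_selector="owner=helm,status=pending-rollback"
)
for s in stuck.items:
    print(s.metadata.name)
# After inspection, deleting the pending-rollback Secret lets
# `helm rollback <release> <known-good-revision>` run cleanly.
```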

Kubernetes Secret Rotation Failure — Stale Credentials

PASS

An automated Kubernetes secret rotation job fails silently, leaving database credentials expired in 15 Kubernetes Secrets across 3 namespaces. Pods that restart or scale up pick up the expired credentials and cannot connect to databases. Running pods with cached credentials continue working until their connection pools recycle.

Cloud Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 92% · Auto-Heal · 40 correlated
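
Once the rotation job is re-run, pods still have to be cycled to pick up the new Secret values. A hedged sketch using the same pod-template annotation that kubectl rollout restart sets; the namespace names are placeholders:

```python
from datetime import datetime, timezone
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Touching the pod template triggers a rolling restart of each Deployment.
patch = {"spec": {"template": {"metadata": {"annotations": {
    "kubectl.kubernetes.io/restartedAt":
        datetime.now(timezone.utc).isoformat()
}}}}}

for ns in ("payments", "orders", "auth"):  # the 3 affected namespaces (assumed)
    for d in apps.list_namespaced_deployment(ns).items:
        apps.patch_namespaced_deployment(d.metadata.name, ns, patch)
```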

Docker Daemon Unresponsive — Container Operations Frozen

PASS

The Docker daemon on a production host becomes unresponsive due to a deadlock in the containerd shim layer. All container operations (start, stop, exec, logs) hang indefinitely. Running containers continue to operate but cannot be managed. New deployments and health checks fail.

Cloud Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 85% · Remote Hands · 21 correlated
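
A deadlocked daemon is easiest to catch by bounding the client timeout, so a hang becomes a fast failure. A minimal probe sketch using the Docker SDK for Python:

```python
import docker

try:
    # Short timeout: a healthy daemon answers ping() in milliseconds,
    # while a deadlocked shim layer would otherwise hang indefinitely.
    client = docker.from_env(timeout=5)
    client.ping()
    print("docker daemon responsive")
except Exception as exc:
    print(f"docker daemon unresponsive: {exc}")
    # Escalation path (out of band): drain the node, restart dockerd/containerd.
```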

Kubernetes PersistentVolumeClaim Stuck Pending

PASS

Multiple PersistentVolumeClaims in a Kubernetes cluster are stuck in Pending state after the cloud provider's storage provisioner hits its volume limit. New StatefulSet pods cannot start because they require persistent storage. The storage class provisioner logs show quota exceeded errors.

Cloud Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Remote Hands · 21 correlated
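
A hedged triage sketch that surfaces the provisioner errors behind each Pending claim:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pvc in v1.list_persistent_volume_claim_for_all_namespaces().items:
    if pvc.status.phase != "Pending":
        continue
    ns, name = pvc.metadata.namespace, pvc.metadata.name
    # Events on the claim carry the provisioner's message,
    # e.g. reason=ProvisioningFailed with a quota-exceeded detail.
    events = v1.list_namespaced_event(
        ns, field_selector=f"involvedObject.name={name}"
    )
    for e in events.items:
        print(ns, name, e.reason, e.message)
```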

Kubernetes Ingress Controller Crash — External Traffic Blocked

PASS

The NGINX Ingress Controller in a production Kubernetes cluster crashes after a malformed Ingress resource is applied. The controller enters a CrashLoopBackOff state. All external HTTP/HTTPS traffic to the cluster is blocked because no ingress controller pods are running to route traffic to backend services.

Cloud Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 28 correlated
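
While the controller pods restart, the offending Ingress has to be found and reverted. A hedged sketch that sorts Ingresses by their most recent managed-fields update time, on the assumption the malformed one was changed last:

```python
from kubernetes import client, config

config.load_kube_config()
net = client.NetworkingV1Api()

def last_update(ing):
    # managedFields carries per-manager update timestamps.
    times = [f.time for f in (ing.metadata.managed_fields or []) if f.time]
    return max(times, default=ing.metadata.creation_timestamp)

for ing in sorted(net.list_ingress_for_all_namespaces().items,
                  key=last_update, reverse=True)[:5]:
    print(ing.metadata.namespace, ing.metadata.name, last_update(ing))
# Revert or delete the most recently changed Ingress, then watch the
# controller Deployment come out of CrashLoopBackOff.
```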

GCP Cloud Spanner Regional Outage

PASS

A Google Cloud Spanner regional instance experiences a zone-level failure in us-central1-a. The multi-zone configuration should provide automatic failover, but a configuration error in the instance's node count causes the remaining zones to be overloaded. Read and write latencies spike above SLA thresholds.

Cloud Pattern: STORAGE_IO_LATENCY · Severity: CRITICAL · Confidence: 95% · Remote Hands · 42 correlated
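
A hedged remediation sketch with the google-cloud-spanner client: raise the node count so the surviving zones can absorb the load. The instance ID and sizing are placeholders, and update() returns a long-running operation:

```python
from google.cloud import spanner

client = spanner.Client()
instance = client.instance("prod-spanner")  # assumed instance ID
instance.reload()
print("current nodes:", instance.node_count)

instance.node_count *= 2        # assumed sizing decision for the incident
operation = instance.update()   # long-running resize operation
operation.result(300)           # wait up to 5 minutes
```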

Azure Cosmos DB Throttling — RU/s Budget Exhausted

PASS

Azure Cosmos DB begins aggressively throttling requests after a marketing campaign drives 10x normal traffic. The provisioned RU/s budget is exhausted and autoscale max is reached. Applications receive HTTP 429 (Too Many Requests) responses. Retry storms amplify the problem as clients retry throttled requests.

Cloud Pattern: AZURE_CLOUD · Severity: CRITICAL · Confidence: 95% · Remote Hands · 42 correlated
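
The immediate client-side fix is to stop amplifying the throttling: cap retries and back off on 429 instead of retrying at once. A hedged sketch with the azure-cosmos SDK; the endpoint, key, and names are placeholders:

```python
import random
import time

from azure.cosmos import CosmosClient
from azure.cosmos.exceptions import CosmosHttpResponseError

client = CosmosClient("https://<account>.documents.azure.com", "<key>")
container = client.get_database_client("appdb").get_container_client("items")

def read_with_backoff(item_id, pk, attempts=5):
    for attempt in range(attempts):
        try:
            return container.read_item(item=item_id, partition_key=pk)
        except CosmosHttpResponseError as exc:
            if exc.status_code != 429:
                raise
            # Exponential backoff with jitter defuses the retry storm.
            time.sleep(min(2 ** attempt, 30) + random.random())
    raise RuntimeError("still throttled after retries")
```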

Azure Key Vault Unavailable — Secrets Rotation Blocked

PASS

Azure Key Vault becomes unreachable due to a misconfigured private endpoint and NSG rule change during a network security audit. All applications that fetch secrets, encryption keys, or certificates from Key Vault at startup or rotation time fail. Services that cache secrets continue working but cannot rotate credentials.

Cloud Pattern: AZURE_CLOUD · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 42 correlated
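
Services that hard-fail at startup when the vault is unreachable can degrade more gracefully by falling back to the last fetched value. A hedged sketch with azure-keyvault-secrets; the vault URL is a placeholder:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

_cache: dict[str, str] = {}

def get_secret(name: str,
               vault_url: str = "https://prod-vault.vault.azure.net") -> str:
    try:
        client = SecretClient(vault_url, DefaultAzureCredential())
        value = client.get_secret(name).value
        _cache[name] = value
        return value
    except Exception:
        if name in _cache:
            return _cache[name]  # stale but usable until the endpoint is fixed
        raise  # no cached copy: surface the outage
```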

AWS EC2 Instance Store Loss — Ephemeral Data Gone

PASS

An EC2 instance with instance store (ephemeral) volumes experiences a hardware failure on the underlying host. AWS stops and restarts the instance on new hardware, but all instance store data is lost. The application had been incorrectly storing session data and temporary processing files on instance store volumes instead of EBS.

Cloud Pattern: AWS_CLOUD · Severity: CRITICAL · Confidence: 95% · Remote Hands · 20 correlated
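
A hedged audit sketch that flags one class of this exposure, instances whose root device is instance store and therefore lose everything on a host failure:

```python
import boto3

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_instances")

for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for inst in reservation["Instances"]:
            if inst.get("RootDeviceType") == "instance-store":
                print(inst["InstanceId"], inst["InstanceType"],
                      "root device is ephemeral")
```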

AWS S3 Bucket Policy Lockout — Data Inaccessible

PASS

A misconfigured S3 bucket policy denies all access including the root account. The bucket contains 15TB of production assets (user uploads, documents, media). All applications that read from or write to the bucket receive AccessDenied errors. Even the AWS console shows access denied.

Cloud Pattern: AWS_CLOUD · Severity: CRITICAL · Confidence: 95% · Remote Hands · 28 correlated
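
The documented escape hatch is that the root user can always delete a bucket policy, even one with an explicit Deny. A hedged sketch; the bucket name is a placeholder and the call must run under root credentials:

```python
import boto3

s3 = boto3.client("s3")  # must be authenticated as the account root user

# Remove the lockout policy, then re-apply a corrected one.
s3.delete_bucket_policy(Bucket="prod-assets")
# s3.put_bucket_policy(Bucket="prod-assets", Policy=json.dumps(fixed_policy))
```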

AWS Lambda Cold Start Timeout Spike

PASS

After a Lambda function deployment, cold start times spike from 2 seconds to 28 seconds due to a new heavy SDK dependency. API Gateway returns 504 Gateway Timeout for cold-start invocations. Provisioned concurrency was removed to save costs last month.

Cloud Pattern: AWS_CLOUD · Severity: CRITICAL · Confidence: 85% · Auto-Heal · 16 correlated
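
A hedged mitigation sketch restoring provisioned concurrency on the published alias so cold starts stop hitting the slow path; the function name, alias, and count are placeholders:

```python
import boto3

lam = boto3.client("lambda")

lam.put_provisioned_concurrency_config(
    FunctionName="checkout-api",         # assumed function name
    Qualifier="live",                    # alias or published version
    ProvisionedConcurrentExecutions=20,  # assumed steady-state concurrency
)
status = lam.get_provisioned_concurrency_config(
    FunctionName="checkout-api", Qualifier="live"
)
print(status["Status"])  # IN_PROGRESS until the environments are warm
```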

AWS RDS Multi-AZ Failover

PASS

An AWS RDS PostgreSQL Multi-AZ instance experiences a hardware failure in the primary AZ, triggering automatic failover to the standby in the secondary AZ. Applications see 60-120 seconds of downtime while the DNS endpoint is repointed to the new primary.

Cloud Pattern: AWS_CLOUD · Severity: CRITICAL · Confidence: 95% · Remote Hands · 34 correlated
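
Clients ride out the failover window by reconnecting through the RDS endpoint, which moves to the new primary once DNS updates. A hedged psycopg2 sketch; the DSN is a placeholder:

```python
import time
import psycopg2

DSN = "host=mydb.xxxx.us-east-1.rds.amazonaws.com dbname=app user=app"

def connect_with_retry(retries=30, delay=5):
    for _ in range(retries):
        try:
            # Short connect_timeout makes a dead primary fail fast.
            return psycopg2.connect(DSN, connect_timeout=5)
        except psycopg2.OperationalError:
            time.sleep(delay)  # wait for DNS to point at the new primary
    raise RuntimeError("database unreachable beyond the failover window")
```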

Azure AD Sync Failure — Users Can't Authenticate

PASS

Azure AD Connect sync fails due to an expired service account password. Password hash sync stops working. New users created in on-prem AD are not provisioned in Azure AD. Existing cloud users with changed passwords cannot authenticate to M365 services.

Cloud Pattern: AZURE_CLOUD · Severity: CRITICAL · Confidence: 85% · Remote Hands · 23 correlated
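
After the service account is fixed, provisioning can be verified per user via Microsoft Graph. A hedged sketch; the token (app-only, User.Read.All) and UPN are placeholders:

```python
import requests

TOKEN = "<graph-access-token>"  # placeholder
UPN = "new.hire@contoso.com"    # a user created on-prem after the fix

resp = requests.get(
    f"https://graph.microsoft.com/v1.0/users/{UPN}"
    "?$select=onPremisesSyncEnabled,onPremisesLastSyncDateTime",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
if resp.status_code == 404:
    print("user not yet provisioned in Azure AD")
else:
    resp.raise_for_status()
    print(resp.json())
```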

Azure App Service Outage — 502 Errors

PASS

An Azure App Service Plan hosting 3 production web apps starts returning 502 Bad Gateway errors after an Azure platform update. The apps intermittently crash with out-of-memory exceptions. Azure Status page shows degraded performance in East US 2 region.

Cloud Pattern: AZURE_CLOUD · Severity: CRITICAL · Confidence: 85% · Auto-Heal · 35 correlated
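
A hedged external-probe sketch for quantifying the 502 rate per app before deciding between a restart and a scale-out; the URLs are placeholders:

```python
import requests

APPS = [
    "https://shop-prod.azurewebsites.net/healthz",
    "https://api-prod.azurewebsites.net/healthz",
    "https://admin-prod.azurewebsites.net/healthz",
]

for url in APPS:
    failures = 0
    for _ in range(10):
        try:
            if requests.get(url, timeout=10).status_code == 502:
                failures += 1
        except requests.RequestException:
            failures += 1
    print(url, f"{failures}/10 failed probes")
```
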
Page 4 of 4

Every scenario is tested against Corax's Neural Engine in a production environment with AI-powered root cause analysis.

Tests run continuously as new infrastructure patterns are added.