Infrastructure Scenario Tests

We test Corax against real-world infrastructure failures across every vendor, platform, and scenario. Browse the results below.

276 Total Tests · 100.0% Pass Rate · 276 Passed · 0 Failed

Container Registry Rate Limit Exceeded — Deployments Blocked

PASS

A Kubernetes cluster's nodes are all pulling images from Docker Hub simultaneously during a rolling deployment, exceeding the Docker Hub rate limit (100 pulls/6 hours for anonymous, 200 for authenticated). All image pulls fail with 429 Too Many Requests, blocking deployments and pod restarts across the cluster.

Cloud Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 18 correlated
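The standard client-side mitigation for a registry rate limit is to honor the 429 and back off exponentially rather than retry immediately. A minimal sketch, where the `pull` callable stands in for an image pull and is purely illustrative, not Corax's remediation:

```python
import time

def pull_with_backoff(pull, max_attempts=5, base_delay=1.0):
    """Retry an image pull, backing off exponentially on HTTP 429.

    `pull` is a hypothetical callable returning an HTTP status code.
    """
    for attempt in range(max_attempts):
        status = pull()
        if status != 429:
            return status
        # Exponential backoff: base, 2*base, 4*base, ...
        time.sleep(base_delay * 2 ** attempt)
    return 429  # still rate-limited after all attempts
```

In a real cluster the same effect is usually achieved by authenticating pulls, staggering the rollout, or running a pull-through cache, rather than per-client retry loops.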

K8s Network Policy Blocking Inter-Pod Communication

PASS

A newly applied Kubernetes NetworkPolicy with an overly restrictive ingress rule blocks all traffic between the API pods and the database pods. The policy was intended to restrict external access but inadvertently blocks intra-cluster communication, causing a complete application outage.

Cloud Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 22 correlated
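Kubernetes NetworkPolicy is default-deny once any policy selects a pod: traffic must then be explicitly allowed by some rule, which is how a policy aimed at external traffic can silently cut off intra-cluster peers. A simplified same-namespace model of that evaluation (illustrative, not the actual controller logic):

```python
def ingress_allowed(dst_labels, src_labels, policies):
    """Simplified NetworkPolicy check: podSelector matching only.

    Each policy looks like:
      {'podSelector': {...}, 'ingress': [{'from': [{'podSelector': {...}}]}]}
    """
    def matches(selector, labels):
        # An empty selector matches every pod, as in Kubernetes.
        return all(labels.get(k) == v for k, v in selector.items())

    selecting = [p for p in policies if matches(p['podSelector'], dst_labels)]
    if not selecting:
        return True  # no policy selects the pod: all ingress allowed
    for p in selecting:
        for rule in p.get('ingress', []):
            for peer in rule.get('from', []):
                if matches(peer.get('podSelector', {}), src_labels):
                    return True
    return False  # selected by a policy, but no rule admits this source
```

Under this model, a policy on the database pods that only admits `role=frontend` quietly denies the API pods, exactly the failure described above.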

Kubernetes HPA Autoscaling Failure — Metrics Server Down

PASS

The Kubernetes metrics-server deployment is crashlooping due to a misconfigured TLS flag after a Helm chart upgrade. Without metrics-server, the Horizontal Pod Autoscaler cannot read CPU/memory metrics and stops scaling pods. During a traffic surge, pods remain at minimum replica count and become overwhelmed.

Cloud Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 22 correlated
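When metrics are available, the HPA scales by the formula documented in the Kubernetes docs: `desired = ceil(currentReplicas * currentMetric / targetMetric)`, clamped to the min/max bounds. With metrics-server down it makes no decision at all, which is why replicas freeze. A sketch of that formula:

```python
import math

def hpa_desired_replicas(current_replicas, current_value, target_value,
                         min_replicas=1, max_replicas=10):
    """Kubernetes HPA scaling formula (simplified).

    Returns current_replicas unchanged when metrics are unavailable,
    mirroring the HPA skipping its calculation without metrics-server.
    """
    if current_value is None:  # metrics-server down: no scaling decision
        return current_replicas
    desired = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, desired))
```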

Multi-Cloud DNS Failover Failure — Both Providers Down

PASS

A multi-cloud architecture uses DNS-based failover between primary (cloud provider A) and secondary (cloud provider B). The DNS failover mechanism itself fails because the health check endpoint uses a shared authentication service that is down on both clouds, causing the DNS provider to mark both targets as unhealthy.

Cloud Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 80% · Remote Hands · 30 correlated

CDN Origin Shield Failure — Origin Server Overwhelmed

PASS

The CDN origin shield layer fails, causing all 50+ edge locations to request content directly from the origin server at once. The origin is sized for coalesced requests from a single shield tier, not direct requests from 50+ edges, and collapses under the load.

Cloud Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 85% · Remote Hands · 27 correlated
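The shield's job is request coalescing: many edge misses for the same object collapse into one origin fetch. A toy model showing the difference between one shield hit and fifty direct hits (illustrative, not a CDN implementation):

```python
class CoalescingShield:
    """Toy origin shield: the first request for a key fetches from
    origin; subsequent requests for the same key are served from cache.
    """
    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin
        self.cache = {}
        self.origin_hits = 0  # how many requests actually reached origin

    def get(self, key):
        if key not in self.cache:
            self.origin_hits += 1
            self.cache[key] = self.fetch_origin(key)
        return self.cache[key]
```

With the shield, 50 edges requesting the same object produce one origin hit; with the shield gone, the origin absorbs all 50.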

CDN Cache Poisoning — Serving Malicious Content to Users

PASS

An attacker exploits an unkeyed header vulnerability to poison the CDN cache, causing all users requesting a specific page to receive a response containing injected malicious JavaScript. The poisoned cache entry has a 24-hour TTL and is replicated across all edge locations.

Cloud Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 95% · Remote Hands · 20 correlated
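The root cause of unkeyed-header poisoning is that the cache key is computed from the URL plus only a few keyed headers, so a header outside that set can alter the response body without altering the key. A sketch of such a key scheme (the keyed-header list is illustrative, not any specific CDN's):

```python
import hashlib

def cache_key(url, headers, keyed_headers=('host', 'accept-encoding')):
    """Build a cache key from the URL and only the *keyed* headers.

    Any header outside `keyed_headers` (e.g. X-Forwarded-Host) can
    change the cached response without changing the key: the
    unkeyed-header poisoning vector.
    """
    parts = [url] + [h + "=" + headers.get(h, "")
                     for h in sorted(keyed_headers)]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
```

Two requests that differ only in an unkeyed header map to the same cache entry, which is why one poisoned response is then served to every user of that page.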

GCP Pub/Sub Dead Letter Queue Full — Message Processing Failure

PASS

A GCP Pub/Sub subscription's dead-letter topic accumulates millions of unprocessable messages after a schema change breaks the consumer application. The dead-letter topic has no subscriber, messages keep piling up, and the original subscription's backlog grows unchecked.

Cloud Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 85% · Remote Hands · 21 correlated

GCP Load Balancer Health Check Failure — All Backends Unhealthy

PASS

A GCP HTTP(S) load balancer marks all backend instances as unhealthy after a firewall rule change blocks the health check probe source IP range (35.191.0.0/16). The load balancer returns HTTP errors to all clients despite all backend VMs being fully operational.

Cloud Pattern: FIREWALL_RULE_BLOCK · Severity: CRITICAL · Confidence: 92% · Auto-Heal · 18 correlated
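A pre-flight check that a proposed firewall rule still covers Google's documented health-check probe ranges would catch this class of change before it ships. A sketch using the stdlib `ipaddress` module (ranges as published in the GCP docs; verify against current documentation before relying on them):

```python
import ipaddress

# Google Cloud load balancer health-check probe source ranges (per GCP docs).
PROBE_RANGES = [ipaddress.ip_network("35.191.0.0/16"),
                ipaddress.ip_network("130.211.0.0/22")]

def firewall_allows_probes(allowed_source_ranges):
    """True if every probe range is covered by some allowed source range."""
    allowed = [ipaddress.ip_network(r) for r in allowed_source_ranges]
    return all(any(probe.subnet_of(net) for net in allowed)
               for probe in PROBE_RANGES)
```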

Azure Front Door Routing Error — Backend Pool Misconfiguration

PASS

An Azure Front Door configuration update introduces a routing rule error that sends 40% of production traffic to a staging backend pool. Users intermittently see staging data mixed with production data, causing data integrity concerns and customer confusion.

Cloud Pattern: AZURE_CLOUD · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 18 correlated

Azure DevOps Pipeline Failure — Build Agent Pool Exhausted

PASS

All Azure DevOps self-hosted build agents are stuck on hung builds, preventing any new CI/CD pipelines from running. The agent pool shows 0 available agents. Development velocity drops to zero as no code can be built, tested, or deployed.

Cloud Pattern: AZURE_CLOUD · Severity: CRITICAL · Confidence: 85% · Remote Hands · 18 correlated

AWS Route 53 Health Check Cascade — Multi-Region Failover Storm

PASS

A misconfigured Route 53 health check threshold causes all three regional endpoints to be marked unhealthy simultaneously during a brief network blip. Route 53 removes all records from DNS, causing a complete global outage even though all regions are actually healthy.

Cloud Pattern: AWS_CLOUD · Severity: CRITICAL · Confidence: 85% · Auto-Heal · 28 correlated
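Route 53 marks an endpoint unhealthy only after a configured number of consecutive probe failures; set that threshold too low and a single blip can take every record out of DNS. A simplified model of the consecutive-failure evaluation:

```python
def endpoint_healthy(probe_results, failure_threshold=3):
    """Endpoint stays healthy unless `failure_threshold` consecutive
    probes fail. A threshold of 1 lets a single transient failure
    remove the record, the cascade described in this scenario.
    Simplified model, not the exact Route 53 implementation.
    """
    streak = 0
    for ok in probe_results:
        streak = 0 if ok else streak + 1
        if streak >= failure_threshold:
            return False
    return True
```

With a sane threshold, the brief network blip in this scenario never flips any endpoint; with a threshold of 1, all three regions flip at once.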

AWS ECS Task Placement Failure — Insufficient Resources

PASS

An ECS service cannot place new tasks because every container instance in the cluster has exhausted its CPU and memory reservations. The Auto Scaling group is at maximum capacity. Deployments are stuck, with the desired count never matching the running count.

Cloud Pattern: AWS_CLOUD · Severity: CRITICAL · Confidence: 90% · Auto-Heal · 21 correlated
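Placement is at heart a reservation check: a task fits only on an instance with enough unreserved CPU and memory left. A simplified sketch of that feasibility test (instance shapes are illustrative):

```python
def can_place(task_cpu, task_mem, instances):
    """Return the index of the first container instance with room for
    the task, or None if no instance fits. `instances` is a list of
    (free_cpu_units, free_mem_mib) tuples. Simplified version of the
    reservation check ECS performs at placement time.
    """
    for i, (free_cpu, free_mem) in enumerate(instances):
        if free_cpu >= task_cpu and free_mem >= task_mem:
            return i
    return None  # no capacity anywhere: the scenario above
```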

Ansible Playbook Failure — Configuration Drift Across Fleet

PASS

An Ansible playbook run against 200 production servers fails midway through execution due to a changed SSH host key on the jump host. 87 servers received the updated configuration while 113 did not, creating a split-brain configuration state across the fleet.

Cloud Pattern: FIREWALL_RULE_BLOCK · Severity: CRITICAL · Confidence: 85% · Remote Hands · 21 correlated

CloudFormation Stack Rollback — Production Update Failed

PASS

A CloudFormation stack update to production fails during resource creation, triggering an automatic rollback that itself gets stuck in UPDATE_ROLLBACK_FAILED state due to a manually modified resource outside of CloudFormation control.

Cloud Pattern: AWS_CLOUD · Severity: CRITICAL · Confidence: 95% · Remote Hands · 22 correlated

Terraform State Lock Conflict — Parallel Apply Blocking Deployments

PASS

Two CI/CD pipelines triggered simultaneously attempt to run terraform apply against the same state file. Neither can proceed: a stale DynamoDB state lock left by a crashed earlier run is never released, blocking all infrastructure deployments for four hours.

Cloud Pattern: UNKNOWN · Severity: CRITICAL · Confidence: 95% · Auto-Heal · 21 correlated
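Terraform's state lock has no expiry, which is why a crashed run's lock persists until an operator runs `terraform force-unlock`. A toy lock with a TTL sketches one possible mitigation (an assumed design, not Terraform's actual behavior):

```python
import time

def acquire_lock(locks, lock_id, owner, now=None, ttl=3600):
    """Toy state lock with expiry: a lock older than `ttl` seconds is
    treated as stale and may be taken over. `locks` is a plain dict
    standing in for a lock table.
    """
    now = time.time() if now is None else now
    held = locks.get(lock_id)
    if held and now - held["acquired"] < ttl:
        return False  # live lock held by another run
    locks[lock_id] = {"owner": owner, "acquired": now}
    return True  # acquired fresh, or took over a stale lock
```

The trade-off is that a TTL can steal the lock from a genuinely long-running apply, which is presumably why Terraform leaves release to an explicit operator action.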

Azure AD Connect Sync Loop

PASS

Azure AD Connect enters a synchronization loop where the delta sync cycle never completes, continuously restarting. Password hash synchronization stops working, and on-premises changes are not reflected in Azure AD.

Cloud Pattern: AZURE_CLOUD · Severity: CRITICAL · Confidence: 85% · Remote Hands · 5 correlated

Helm Chart Rollback Failure — Stuck Between Versions

PASS

A Helm chart upgrade fails partway through due to a resource conflict, and the automatic rollback also fails because the previous release's CRDs are incompatible with the current cluster state. The release is stuck in a 'pending-rollback' state. Kubernetes resources are in a mixed state between the old and new versions.

Cloud Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Remote Hands · 29 correlated

Kubernetes Secret Rotation Failure — Stale Credentials

PASS

An automated Kubernetes secret rotation job fails silently, leaving database credentials expired in 15 Kubernetes Secrets across 3 namespaces. Pods that restart or scale up pick up the expired credentials and cannot connect to databases. Running pods with cached credentials continue working until their connection pools recycle.

Cloud Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 92% · Auto-Heal · 40 correlated
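A rotation job that swaps credentials without verifying them, and reports success either way, produces exactly this outage. A sketch of a verify-then-swap pattern (the `verify` callable and the dict-backed store are hypothetical, not the Corax fix):

```python
def rotate_secret(store, key, new_value, verify):
    """Rotate a credential only after it is verified live; on failure,
    keep the old value and report the failure instead of swallowing it.

    `store` is a dict standing in for a Secret store; `verify` is a
    hypothetical callable that tests the credential against the target
    system (e.g. opens a database connection).
    """
    old = store.get(key)
    if not verify(new_value):
        return False, old  # rotation refused: old secret stays in place
    store[key] = new_value
    return True, new_value
```

The key property is that a failed rotation is loud and leaves the last known-good credential in place, so restarting pods never pick up a dead secret.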

Docker Daemon Unresponsive — Container Operations Frozen

PASS

The Docker daemon on a production host becomes unresponsive due to a deadlock in the containerd shim layer. All container operations (start, stop, exec, logs) hang indefinitely. Running containers continue to operate but cannot be managed. New deployments and health checks fail.

Cloud Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 85% · Remote Hands · 21 correlated

Kubernetes PersistentVolumeClaim Stuck Pending

PASS

Multiple PersistentVolumeClaims in a Kubernetes cluster are stuck in Pending state after the cloud provider's storage provisioner hits its volume limit. New StatefulSet pods cannot start because they require persistent storage. The storage class provisioner logs show quota exceeded errors.

Cloud Pattern: CONTAINER_EVENT · Severity: CRITICAL · Confidence: 95% · Remote Hands · 21 correlated
Page 1 of 2

Every scenario is tested against Corax's Neural Engine in a production environment with AI-powered root cause analysis.

Tests run continuously as new infrastructure patterns are added.