We test Corax against real-world infrastructure failures across the major cloud vendors, platforms, and failure scenarios. Browse the results below.
The NGINX Ingress Controller in a production Kubernetes cluster crashes after a malformed Ingress resource is applied. The controller enters a CrashLoopBackOff state. All external HTTP/HTTPS traffic to the cluster is blocked because no ingress controller pods are running to route traffic to backend services.
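The usual safeguard here is the ingress-nginx validating admission webhook, which rejects bad resources before the controller loads them. As an illustration of the idea, here is a minimal, hypothetical pre-apply lint (the function name and checks are ours, not part of ingress-nginx) that flags two common crash triggers: raw nginx snippet annotations and path regexes that do not compile.

```python
import re

def lint_ingress(manifest: dict) -> list[str]:
    """Hypothetical pre-apply lint for an Ingress manifest (dict form).

    Flags two common causes of controller crashes: raw nginx
    configuration snippets injected via annotations, and path
    regexes that fail to compile. Not a substitute for the
    ingress-nginx validating admission webhook.
    """
    problems = []
    annotations = manifest.get("metadata", {}).get("annotations", {})
    for key in annotations:
        if key.endswith("-snippet"):  # e.g. server-snippet, configuration-snippet
            problems.append(f"raw nginx snippet annotation: {key}")
    for rule in manifest.get("spec", {}).get("rules", []):
        for path in rule.get("http", {}).get("paths", []):
            try:
                re.compile(path.get("path", "/"))
            except re.error:
                problems.append(f"invalid path regex: {path.get('path')}")
    return problems
```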
A Google Cloud Spanner regional instance experiences a zone-level failure in us-central1-a. The multi-zone configuration should provide automatic failover, but a configuration error in the instance's node count causes the remaining zones to be overloaded. Read and write latencies spike above SLA thresholds.
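The sizing mistake above can be caught with back-of-envelope arithmetic: provision enough nodes that the surviving zones can absorb the full load when one zone is lost. The sketch below is illustrative only; the per-node throughput figure is an assumption that must be measured for each workload, and it is not a Spanner capacity-planning formula.

```python
import math

def nodes_for_zone_loss(peak_load_qps: float, qps_per_node: float,
                        zones: int = 3, target_util: float = 0.65) -> int:
    """Back-of-envelope node count so that losing one zone leaves the
    surviving capacity below target utilization. Illustrative only;
    qps_per_node varies by workload and must be measured."""
    # Nodes needed at target utilization with all zones healthy.
    base = peak_load_qps / (qps_per_node * target_util)
    # Scale up so the remaining zones can still carry the full load.
    with_headroom = base * zones / (zones - 1)
    return math.ceil(with_headroom)
```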
Azure Cosmos DB begins aggressively throttling requests after a marketing campaign drives 10x normal traffic. The provisioned RU/s budget is exhausted and the autoscale maximum is reached. Applications receive HTTP 429 (Too Many Requests) responses. Retry storms amplify the problem as clients retry throttled requests in lockstep.
Azure Key Vault becomes unreachable due to a misconfigured private endpoint and NSG rule change during a network security audit. All applications that fetch secrets, encryption keys, or certificates from Key Vault at startup or rotation time fail. Services that cache secrets continue working but cannot rotate credentials.
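The "services that cache secrets continue working" observation suggests the mitigation: a serve-stale cache that returns the last known secret when the vault is unreachable, rather than failing. A minimal sketch (the class and its interface are ours, not an Azure SDK API):

```python
import time

class SecretCache:
    """Serve-stale cache sketch: return the cached secret when the
    backing store (e.g. Key Vault) is unreachable, instead of failing.
    `fetch` is any zero-argument callable returning the current value."""

    def __init__(self, fetch, ttl_seconds: float = 300.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._value = None
        self._fetched_at = 0.0

    def get(self):
        fresh = (time.monotonic() - self._fetched_at) < self._ttl
        if self._value is not None and fresh:
            return self._value
        try:
            self._value = self._fetch()
            self._fetched_at = time.monotonic()
        except Exception:
            if self._value is None:
                raise  # no stale copy to fall back on
        return self._value
```

Note the limitation the scenario calls out: a stale cache keeps reads working but cannot rotate credentials, so it buys time rather than solving the outage.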
An EC2 instance with instance store (ephemeral) volumes experiences a hardware failure on the underlying host. AWS stops and restarts the instance on new hardware, but all instance store data is lost. The application had been incorrectly storing session data and temporary processing files on instance store volumes instead of EBS.
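This class of mistake is detectable before the host fails: instance store devices are listed under `ephemeral*` keys in the instance's block-device mapping (as exposed by instance metadata), so an audit can enumerate them and check nothing durable is mounted there. A small sketch, assuming a metadata-style mapping of logical name to device:

```python
def ephemeral_devices(block_device_mapping: dict[str, str]) -> list[str]:
    """Given an instance-metadata style block-device mapping
    (logical name -> device), return the devices backed by instance
    store. Data on these does not survive a host failure, so session
    state and anything non-reproducible belongs on EBS (or better,
    outside the instance entirely)."""
    return sorted(dev for name, dev in block_device_mapping.items()
                  if name.startswith("ephemeral"))
```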
A misconfigured S3 bucket policy denies all access including the root account. The bucket contains 15TB of production assets (user uploads, documents, media). All applications that read from or write to the bucket receive AccessDenied errors. Even the AWS console shows access denied.
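This lockout pattern (an unconditioned `Deny` with `Principal: "*"`) can be linted for before the policy is applied. A hedged sketch of such a check, ours rather than any AWS tool; recovery after the fact requires the account root user, which can always call `delete-bucket-policy` on a bucket it owns:

```python
import json

def lockout_risks(policy_json: str) -> list[dict]:
    """Flag statements in an S3 bucket policy that can lock out every
    principal, including the account root: a Deny whose Principal is
    "*" with no Condition narrowing it. Run before put-bucket-policy."""
    risky = []
    for stmt in json.loads(policy_json).get("Statement", []):
        principal = stmt.get("Principal")
        if (stmt.get("Effect") == "Deny"
                and principal in ("*", {"AWS": "*"})
                and not stmt.get("Condition")):
            risky.append(stmt)
    return risky
```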
After a Lambda function deployment, cold start times spike from 2 seconds to 28 seconds due to a new heavy SDK dependency. API Gateway returns 504 Gateway Timeout for cold-start invocations. Provisioned concurrency was removed to save costs last month.
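Besides restoring provisioned concurrency or trimming the dependency, a common mitigation is lazy initialization: construct the heavy client on first use instead of at import time, so the cost is paid once and only on code paths that need it. A sketch of the pattern; in a real Lambda the `factory` parameter would be a module-level closure around the heavy constructor, and the handler signature would be the usual `(event, context)`:

```python
_client = None  # constructed on first use, not at import time

def _get_client(factory):
    """Lazily build a heavy dependency (SDK client, model, etc.) so
    import-time cold-start cost stays small; subsequent invocations
    in the same execution environment reuse the cached instance."""
    global _client
    if _client is None:
        _client = factory()
    return _client

def handler(event, context, factory):
    # `factory` stands in for the real heavy constructor here so the
    # sketch is self-contained and testable.
    client = _get_client(factory)
    return {"statusCode": 200, "client_id": id(client)}
```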
An AWS RDS PostgreSQL Multi-AZ instance experiences a hardware failure in the primary AZ. Automatic failover to the standby in the secondary AZ triggers. Applications experience 60-120 seconds of downtime. The DNS endpoint resolves to the new primary.
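Applications ride through that 60-120 second window by retrying the connection attempt rather than failing fast, making sure each attempt re-resolves the RDS DNS endpoint (a cached resolution keeps pointing at the dead host; JVM-based clients in particular need their DNS TTL lowered). A minimal, generic sketch, where `connect` is any zero-argument callable such as a `psycopg2.connect` closure:

```python
import time

def connect_with_failover(connect, attempts: int = 6, delay_s: float = 1.0,
                          sleep=time.sleep):
    """Retry a database connection through a Multi-AZ failover window.

    `connect` opens a fresh connection against the DNS endpoint; each
    retry performs a new lookup, which is what eventually picks up the
    promoted standby. `sleep` is injectable for testing."""
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except Exception as exc:  # narrow to the driver's error type in real code
            last_error = exc
            sleep(delay_s * (attempt + 1))  # backoff across the failover window
    raise last_error
```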
Azure AD Connect sync fails due to an expired service account password. Password hash sync stops working. New users created in on-prem AD are not provisioned in Azure AD. Existing cloud users with changed passwords cannot authenticate to M365 services.
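Expiries like this are preventable with a password-age monitor that alerts well before the policy cutoff. A trivial, illustrative classifier (thresholds are ours; use the directory's actual password policy):

```python
from datetime import date

def should_rotate(last_set: date, today: date,
                  max_age_days: int = 365, warn_days: int = 30) -> str:
    """Classify a service-account password by age so rotation can be
    alerted on before expiry breaks sync. Thresholds are illustrative."""
    age = (today - last_set).days
    if age >= max_age_days:
        return "expired"
    if age >= max_age_days - warn_days:
        return "rotate-soon"
    return "ok"
```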
An Azure App Service Plan hosting 3 production web apps starts returning 502 Bad Gateway errors after an Azure platform update. The apps intermittently crash with out-of-memory exceptions. Azure Status page shows degraded performance in East US 2 region.
Every scenario is tested against Corax's Neural Engine in a production environment with AI-powered root cause analysis.
Tests run continuously as new infrastructure patterns are added.