We test Corax against real-world infrastructure failures across cloud vendors, platforms, and failure scenarios. Browse the results below.
An Azure managed disk hit its IOPS limit, causing severe I/O latency for the hosted application.
Azure Application Gateway is returning 502 errors because all backends were removed from the pool during a scale-in event.
A critical VPC firewall rule allowing HTTPS traffic was deleted during a Terraform destroy of a staging environment. Production was affected.
A GCP Pub/Sub subscription has a growing message backlog because all subscriber instances were terminated during a GKE node upgrade.
Azure Functions on the Consumption plan are experiencing cold starts of 30+ seconds during a traffic spike. HTTP-triggered functions are timing out.
Shared Access Signature tokens for blob storage have expired. Application cannot read or write to storage containers.
Azure Cache for Redis is at 95% memory and actively evicting keys. Application cache hit rate collapsed.
A GCP Cloud Run service deployed a new revision that crashes on startup. All traffic is being routed to the failing revision.
IAM permissions on a GCS bucket were changed, blocking the application's service account from reading objects.
GCP Memorystore for Redis reached its maxmemory limit. Write operations failing with OOM errors.
A GCP Cloud SQL instance hit its automatic storage increase limit. The disk cannot grow further, and database writes are failing.
All targets in an ALB target group are failing health checks. ALB returning 502 to all clients. Health check path changed during deployment.
An Azure Key Vault access policy was accidentally removed during a Terraform apply. Application cannot retrieve secrets.
Azure SQL geo-replication is lagging by 15 minutes due to high transaction volume. The failover group RPO guarantee is being violated.
An ECS Fargate service cannot start any tasks because the task memory reservation exceeds the available capacity.
AKS cluster autoscaler cannot add nodes because the Azure subscription hit its vCPU quota limit. Pods pending with no capacity.
An AWS NAT Gateway is experiencing ErrorPortAllocation events. Private subnet instances cannot make outbound connections.
An Azure VM was migrated during Azure maintenance. The temporary disk (D:) was reassigned and all data on it was lost, including TempDB and page files.
An EC2 instance with instance store volumes was stopped and started. All ephemeral data was lost, including application state and temp files.
An EC2 instance became unreachable after a security group rule was accidentally removed. SSH access also blocked.
Every scenario is tested against Corax's Neural Engine in a production environment with AI-powered root cause analysis.
Tests run continuously as new infrastructure patterns are added.