Back to All Scenarios
PASSEDcloud / cdn_origin_shield_failure

CDN Origin Shield Failure — Origin Server Overwhelmed

The CDN origin shield layer fails, causing all 50+ edge locations to simultaneously request content directly from the origin server. The origin is designed to handle coalesced requests from 1 shield, not direct requests from 50+ edges. Origin collapses under the load.

Pattern
UNKNOWN
Severity
CRITICAL
Confidence
85%
Remediation
Remote Hands

Test Results

MetricExpectedActualResult
Pattern RecognitionUNKNOWNUNKNOWN
Severity AssessmentCRITICALCRITICAL
Incident CorrelationYes27 linked
Cascade EscalationYesYes
RemediationRemote Hands — Corax contacts on-site support via call, email, or API

Scenario Conditions

CDN with origin shield enabled. Origin shield server crashes. 50+ edge POPs all fall back to direct origin requests. Origin server capacity: 5000 req/sec. Incoming load after shield failure: 85000 req/sec. Origin overwhelmed and returning errors.

Injected Error Messages (3)

CDN origin shield node unresponsive — shield server in us-east-1 failed, all edge locations losing shield connection and falling back to direct-to-origin mode, origin request multiplier increased from 1x to 50x, shield failover did not activate, edge nodes now sending all cache-miss requests directly to origin server
origin server overwhelmed — request rate jumped from 800/sec (via shield) to 85000/sec (direct from all edge POPs), connection pool exhausted, cpu high at 100%, memory pressure critical, application returning econnrefused to CDN edge nodes, origin capacity is 5000 req/sec max
global CDN outage — all edge locations receiving errors from origin, cache miss rate: 35%, all cache misses returning error pages to end users, CDN serving stale content where available but fresh requests all failing, global user impact across all 50+ regions

Neural Engine Root Cause Analysis

The CDN origin shield server in us-east-1 has failed completely, becoming unresponsive and unable to serve as a caching layer between edge nodes and the origin server. This has triggered a catastrophic failover failure where the shield failover mechanism did not activate properly, forcing all edge locations to bypass the shield and send cache-miss requests directly to the origin server. This has increased the origin request multiplier from 1x to 50x, creating a massive load amplification that could overwhelm the origin infrastructure and cascade into broader service degradation.

Remediation Plan

1. Immediately activate manual failover to backup shield nodes in alternate regions (us-west-1 or eu-west-1). 2. Investigate why automatic shield failover did not trigger and temporarily disable the failed us-east-1 shield node. 3. Monitor origin server performance and implement emergency rate limiting if needed to prevent origin overload. 4. Verify shield node health checks and failover mechanisms across all regions. 5. Once alternate shield is active, gradually re-route traffic and monitor for request multiplier reduction back to normal levels. 6. Conduct post-incident analysis on shield failover automation failure.
Tested: 2026-03-30Monitors: 3 | Incidents: 3Test ID: cmnckd80g08mmobqejmpul2nd