PASSED | cloud / multi_cloud_dns_failover_failure

Multi-Cloud DNS Failover Failure — Both Providers Down

A multi-cloud architecture uses DNS-based failover between primary (cloud provider A) and secondary (cloud provider B). The DNS failover mechanism itself fails because the health check endpoint uses a shared authentication service that is down on both clouds, causing the DNS provider to mark both targets as unhealthy.

Pattern: UNKNOWN
Severity: CRITICAL
Confidence: 80%
Remediation: Remote Hands

Test Results

Metric | Expected | Actual | Result
Pattern Recognition | UNKNOWN | UNKNOWN |
Severity Assessment | CRITICAL | CRITICAL |
Incident Correlation | Yes | 30 linked |
Cascade Escalation | Yes | Yes |
Remediation | Remote Hands: Corax contacts on-site support via call, email, or API

Scenario Conditions

Primary on cloud provider A, secondary on cloud provider B. DNS failover via external DNS provider. Health check endpoint requires auth service. Auth service down on both clouds due to shared IdP outage. DNS provider removes both targets. Complete outage.
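The failure mode described above can be illustrated with a small simulation. This is a minimal sketch, not the actual DNS provider logic: the `Target` class and the `health_check` and `healthy_targets` functions are hypothetical names, and the model assumes the health check passes only when both the application and the shared identity provider are reachable.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    app_up: bool  # whether the application itself is serving traffic


def health_check(target: Target, idp_up: bool) -> bool:
    """Hypothetical authenticated health check: needs the app AND the shared IdP."""
    return target.app_up and idp_up


def healthy_targets(targets: list[Target], idp_up: bool) -> list[str]:
    """Names of the targets a DNS failover provider would keep in rotation."""
    return [t.name for t in targets if health_check(t, idp_up)]


targets = [
    Target("primary (cloud A)", app_up=True),
    Target("secondary (cloud B)", app_up=True),
]

# IdP reachable: both targets pass the check and stay in rotation.
print(healthy_targets(targets, idp_up=True))
# IdP down: both applications are still up, yet no target passes the check.
print(healthy_targets(targets, idp_up=False))
```

Because `idp_up` feeds into every target's check, the two clouds do not fail independently, which is the single point of failure this scenario exposes.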

Injected Error Messages (3)

primary cloud endpoint health check returning authentication errors — health endpoint at primary.company.com/health requires valid auth token, shared identity provider unreachable, health check probe receiving 'unauthorized' response, DNS failover provider marking primary as unhealthy even though application infrastructure is fully operational
secondary cloud endpoint also failing health checks — same authentication dependency as primary, shared identity provider outage affecting both cloud environments simultaneously, DNS failover provider now has zero healthy targets, attempting to failover but no valid destination available
dns failover service critical — both primary and secondary targets marked unhealthy, enotfound for app.company.com, complete multi-cloud failover failure, single point of failure in shared authentication dependency exposed, all dns resolution failed for production domain

Neural Engine Root Cause Analysis

The primary cloud endpoint health check is failing due to authentication errors caused by an unreachable shared identity provider. While the application infrastructure remains fully operational, the health check endpoint requires valid auth tokens which cannot be obtained due to the identity provider outage. This is causing DNS failover providers to incorrectly mark the primary endpoint as unhealthy, potentially triggering unnecessary failover cascades despite the core application being functional.
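One way to keep an authentication failure from being read as an application failure is to classify probe responses before deciding on failover. The sketch below assumes a custom prober that sees the HTTP status code of each health check. `ProbeResult` and `classify_probe` are hypothetical names, and treating 401/403 as a separate state is a design assumption, not the behavior of any particular DNS provider.

```python
from enum import Enum
from typing import Optional


class ProbeResult(Enum):
    HEALTHY = "healthy"
    AUTH_DEPENDENCY_FAILED = "auth_dependency_failed"
    UNHEALTHY = "unhealthy"


def classify_probe(status_code: Optional[int]) -> ProbeResult:
    """Map an HTTP status code (None means no response) to a probe result."""
    if status_code is None:
        # Timeout or connection failure: nothing answered at all.
        return ProbeResult.UNHEALTHY
    if 200 <= status_code < 300:
        return ProbeResult.HEALTHY
    if status_code in (401, 403):
        # Something answered the request; only the auth step was rejected.
        return ProbeResult.AUTH_DEPENDENCY_FAILED
    return ProbeResult.UNHEALTHY


print(classify_probe(401).value)
print(classify_probe(None).value)
```

A failover controller could then alert on `AUTH_DEPENDENCY_FAILED` without removing the target from DNS, reserving removal for `UNHEALTHY`.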

Remediation Plan

1. Verify identity provider status and connectivity from application servers
2. Check network connectivity and DNS resolution to identity provider endpoints
3. Examine authentication service logs for specific error patterns
4. If identity provider is down, contact provider support or implement temporary bypass for health checks
5. Consider configuring health check endpoint to use alternative authentication method or create unauthenticated health endpoint
6. Update DNS failover configuration to use more resilient health check methodology
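The unauthenticated health endpoint suggested in step 5 might look like the following. This is a minimal sketch using only the Python standard library. The `/healthz` path, port 8080, and the names `liveness_response`, `HealthHandler`, and `serve` are assumptions made for illustration and do not come from the tested system.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def liveness_response(path: str) -> tuple[int, dict]:
    """Return (status, body) for a request path; no auth token is consulted."""
    if path == "/healthz":
        return 200, {"status": "ok"}
    return 404, {"error": "not found"}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = liveness_response(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Run the liveness server until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. Because the handler never contacts the identity provider, a probe against `/healthz` reflects only whether the process is up and answering, so an IdP outage alone would not mark the target unhealthy.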
Tested: 2026-03-30 | Monitors: 3 | Incidents: 3 | Test ID: cmnckdmc508qhobqe6b59emth