PASSED | network / client_site_isp_failover_broken

Client Site ISP Failover Not Working

A managed client's secondary ISP failover fails to activate when the primary circuit goes down. The SD-WAN appliance detects the primary failure but the secondary circuit is disconnected due to an unpaid bill. The site is completely offline.

Pattern: TIMEOUT
Severity: CRITICAL
Confidence: 95%
Remediation: Remote Hands

Test Results

Metric | Expected | Actual | Result
Pattern Recognition | TIMEOUT | TIMEOUT |
Severity Assessment | CRITICAL | CRITICAL |
Incident Correlation | Yes | 16 linked |
Cascade Escalation | N/A | No |
Remediation | Remote Hands: Corax contacts on-site support via call, email, or API

Scenario Conditions

Client site with dual ISP (primary: fiber, secondary: cable). Primary ISP circuit down (provider outage). Secondary ISP suspended for non-payment. SD-WAN detecting primary failure but failover target unavailable. Site completely offline. 50 users affected. VPN to HQ down.
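
The conditions above can be sketched as a simple link-selection rule. This is a minimal illustration, not the SD-WAN appliance's actual logic: the `WanLink` type, the link names, and the priority values are hypothetical, and a real appliance would derive `probe_ok` from its own health probes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WanLink:
    name: str        # hypothetical label, e.g. "primary-fiber"
    priority: int    # lower number = preferred link
    probe_ok: bool   # result of the most recent health probe


def select_active_link(links: List[WanLink]) -> Optional[WanLink]:
    """Return the highest-priority healthy link, or None if every link is down."""
    healthy = [link for link in links if link.probe_ok]
    if not healthy:
        return None
    return min(healthy, key=lambda link: link.priority)


# Scenario state: fiber down (provider outage), cable suspended for non-payment.
links = [
    WanLink("primary-fiber", priority=1, probe_ok=False),
    WanLink("secondary-cable", priority=2, probe_ok=False),
]
active = select_active_link(links)
print("site offline" if active is None else f"active link: {active.name}")
```

Detecting the primary failure is not enough on its own: with no healthy failover target the selection returns nothing, which corresponds to the fully offline site in this scenario.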

Injected Error Messages (2)

Client site completely offline — primary ISP circuit down (provider confirmed outage), failover to secondary ISP failed: connection timeout on cable modem, secondary circuit suspended (billing issue), site unreachable on both WAN links, timeout on all connectivity tests to client site 203.0.113.50
Client site-to-HQ VPN tunnel down — both ISP circuits unavailable at client site, timeout on VPN keepalive probes, IKE re-establishment timeout after 300 seconds, 50 users at client site without network connectivity, timeout on all remote management attempts
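
Both injected messages mention timeouts repeatedly, which is consistent with the TIMEOUT pattern reported above. As a rough illustration only, a keyword match over the messages might look like the sketch below. It is an assumption for demonstration, not how the engine under test classifies incidents; the `classify` function and the `UNCLASSIFIED` label are hypothetical.

```python
import re
from typing import Iterable

# Matches "timeout" or "timed out" as whole words, case-insensitively.
TIMEOUT_RE = re.compile(r"\btimeout\b|\btimed out\b", re.IGNORECASE)


def count_timeout_mentions(message: str) -> int:
    """Count how many times a message mentions a timeout."""
    return len(TIMEOUT_RE.findall(message))


def classify(messages: Iterable[str]) -> str:
    """Return "TIMEOUT" if any message mentions a timeout, else "UNCLASSIFIED"."""
    if any(count_timeout_mentions(m) > 0 for m in messages):
        return "TIMEOUT"
    return "UNCLASSIFIED"  # hypothetical fallback label


# Excerpts from the two injected messages above.
excerpts = [
    "failover to secondary ISP failed: connection timeout on cable modem",
    "IKE re-establishment timeout after 300 seconds",
]
label = classify(excerpts)
print(label)
```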

Neural Engine Root Cause Analysis

This is a multi-layered WAN connectivity failure at the client site. The primary ISP circuit is confirmed down by the provider, which triggered a failover attempt to the secondary ISP connection. However, the secondary failover failed due to two critical issues: a connection timeout on the cable modem (likely hardware failure or provisioning issue) and circuit suspension due to billing problems. With both WAN links unavailable, the entire client site (203.0.113.50) is unreachable, causing cascading failures across 9 correlated incidents.

Remediation Plan

1. Immediately escalate billing issue with secondary ISP to restore suspended circuit.
2. Contact cable modem provider to diagnose timeout issue (likely requires technician dispatch for hardware replacement).
3. Monitor primary ISP restoration timeline and coordinate with provider.
4. Consider emergency backup connectivity (cellular hotspot, satellite) if site has critical operations.
5. Once any WAN link is restored, verify all dependent services and systems come back online.
6. Implement proper billing automation and redundant payment methods to prevent future suspension issues.
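
Step 5 of the plan (verifying dependent services once a WAN link is back) could be approximated with a basic TCP reachability check. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the endpoints and ports to check are not given in the report, so they would have to come from the site's own service inventory.

```python
import socket
from typing import Callable, Dict, Iterable, Tuple

Endpoint = Tuple[str, int]


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False


def verify_services(
    endpoints: Iterable[Endpoint],
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> Dict[Endpoint, bool]:
    """Probe each endpoint once and map it to its reachability result."""
    return {(host, port): probe(host, port) for host, port in endpoints}
```

For example, `verify_services([("203.0.113.50", 443)])` would probe the client site address from the injected messages on port 443; the port is an assumption chosen for illustration. The `probe` parameter allows the check to be swapped for a different test, such as an ICMP or VPN keepalive probe, without changing the caller.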
Tested: 2026-03-30 | Monitors: 2 | Incidents: 2 | Test ID: cmnck3c6n06f0obqe3no803a6