PASSED | vendor / fortinet_ha_failover

FortiGate HA Cluster Failover

The primary FortiGate in an HA pair crashes due to a firmware bug, triggering failover to the secondary unit. All active VPN tunnels drop and need to re-establish.

Pattern: FORTINET_EVENT
Severity: CRITICAL
Confidence: 85%
Remediation: Remote Hands

Test Results

Metric | Expected | Actual | Result
Pattern Recognition | FORTINET_EVENT | FORTINET_EVENT |
Severity Assessment | CRITICAL | CRITICAL |
Incident Correlation | Yes | 37 linked |
Cascade Escalation | Yes | Yes |
Remediation | Remote Hands: Corax contacts on-site support via call, email, or API

Scenario Conditions

FortiGate 600F HA pair (Active-Passive). Primary unit kernel panic. 12 IPSec VPN tunnels to branch offices. 500 SSL-VPN users connected. FortiGuard services active.

Injected Error Messages (4)

FortiGate HA failover detected — primary unit fw01 unresponsive, secondary fw02 assuming active role, Fortinet HA cluster state change, all sessions being migrated
VPN tunnel down to Branch-Miami — FortiGate HA failover caused IPSec SA rekey failure, tunnel re-establishing on secondary unit
VPN tunnel down to Branch-Chicago — FortiGate HA failover, IKE Phase 1 renegotiation in progress, branch office connectivity lost
FortiGate SSL-VPN portal unreachable during HA failover, 500 active sessions dropped, Fortinet cluster transition in progress
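
The report does not describe how the engine maps these messages to the FORTINET_EVENT pattern. As a rough illustration only, the sketch below tags the four injected messages with a simple keyword rule and pulls out the affected branch names. The regexes and the `classify` and `affected_branches` helpers are hypothetical and are not the product's actual matching logic.

```python
import re

# Hypothetical rules for illustration; the real pattern engine is not documented here.
FORTINET_KEYWORDS = re.compile(r"\b(fortigate|fortinet)\b", re.IGNORECASE)
BRANCH_TUNNEL = re.compile(r"VPN tunnel down to (Branch-[A-Za-z]+)")

# The four injected messages, copied verbatim from the scenario.
MESSAGES = [
    "FortiGate HA failover detected — primary unit fw01 unresponsive, secondary fw02 assuming active role, Fortinet HA cluster state change, all sessions being migrated",
    "VPN tunnel down to Branch-Miami — FortiGate HA failover caused IPSec SA rekey failure, tunnel re-establishing on secondary unit",
    "VPN tunnel down to Branch-Chicago — FortiGate HA failover, IKE Phase 1 renegotiation in progress, branch office connectivity lost",
    "FortiGate SSL-VPN portal unreachable during HA failover, 500 active sessions dropped, Fortinet cluster transition in progress",
]


def classify(message: str) -> str:
    """Return 'FORTINET_EVENT' if the message names a Fortinet product, else 'UNKNOWN'."""
    return "FORTINET_EVENT" if FORTINET_KEYWORDS.search(message) else "UNKNOWN"


def affected_branches(messages: list[str]) -> list[str]:
    """Collect branch names from 'VPN tunnel down to ...' messages, in input order."""
    return [m.group(1) for msg in messages if (m := BRANCH_TUNNEL.search(msg))]


if __name__ == "__main__":
    for msg in MESSAGES:
        print(classify(msg), "|", msg[:50])
    print("Branches:", affected_branches(MESSAGES))
```

A keyword rule like this is deliberately crude: it shows why all four messages can land in one pattern, but a production matcher would need to handle vendor names that appear only incidentally.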

Neural Engine Root Cause Analysis

The primary FortiGate firewall unit (fw01) has become unresponsive, triggering an automatic HA failover to the secondary unit (fw02). This is likely caused by a hardware failure, power issue, network connectivity loss, or system crash on the primary unit. The 13 correlated incidents within the same timeframe suggest this firewall failure caused a cascade of connectivity issues across dependent services and applications that rely on fw01 for network access.
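
The analysis refers to incidents correlated "within the same timeframe" but does not say how that window is defined. A minimal sketch, assuming correlation means "opened within a fixed window of the root incident": the `Incident` type, the 5-minute window, and every timestamp below are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    name: str
    opened_at: datetime


def correlate(root: Incident, candidates: list[Incident],
              window: timedelta = timedelta(minutes=5)) -> list[Incident]:
    """Return candidates opened within `window` of the root incident, before or after it."""
    return [
        c for c in candidates
        if c is not root and abs(c.opened_at - root.opened_at) <= window
    ]


# Hypothetical timestamps; only the date comes from the report's "Tested" field.
t0 = datetime(2026, 3, 30, 12, 0, 0)
root = Incident("fw01 unresponsive, HA failover", t0)
others = [
    Incident("VPN tunnel down to Branch-Miami", t0 + timedelta(seconds=20)),
    Incident("VPN tunnel down to Branch-Chicago", t0 + timedelta(seconds=35)),
    Incident("SSL-VPN portal unreachable", t0 + timedelta(seconds=50)),
    Incident("unrelated disk alert", t0 + timedelta(hours=2)),
]

linked = correlate(root, others)
print([i.name for i in linked])
```

Time proximity alone will over-link on a busy network, so a real correlator would likely also consider topology or shared device names such as fw01.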

Remediation Plan

1. Verify fw02 has successfully assumed the active role and traffic is flowing normally.
2. Attempt to access fw01 via out-of-band management (console/IPMI) to determine if it's completely offline or just unresponsive.
3. Check fw01 power status, network cables, and hardware indicators.
4. If fw01 is accessible, review system logs for crash dumps, memory errors, or hardware failures.
5. Attempt to restart fw01 services or perform a controlled reboot if safe to do so.
6. Monitor all correlated incidents to ensure they resolve once fw01 is restored or fw02 stabilizes.
7. Once fw01 is restored, re-establish HA synchronization and return to normal active/passive state.
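
For on-site or remote staff working through the plan, it can help to pair each step with a concrete check. The sketch below is one possible runbook, not part of the generated plan: the step wording is paraphrased, the `RUNBOOK` structure and `render` helper are hypothetical, and the FortiOS CLI commands shown are commonly used ones that should be confirmed against the documentation for the FortiOS version in use.

```python
# Each entry: (step, unit to run it on, FortiOS CLI command or None for manual steps).
# The command choices are assumptions; verify them for your FortiOS release.
RUNBOOK = [
    ("Verify fw02 is the active unit", "fw02", "get system ha status"),
    ("Confirm IPsec tunnels are re-establishing", "fw02", "get vpn ipsec tunnel summary"),
    ("Reach fw01 out of band (console)", "fw01", None),
    ("Check fw01 power, cabling, and hardware indicators", "fw01", None),
    ("Read the fw01 crash log", "fw01", "diagnose debug crashlog read"),
    ("Controlled reboot of fw01 if safe", "fw01", "execute reboot"),
    ("Confirm HA configuration is back in sync", "fw02", "diagnose sys ha checksum cluster"),
]


def render(runbook):
    """Format the runbook as numbered lines, marking steps that have no CLI command."""
    lines = []
    for number, (step, unit, command) in enumerate(runbook, start=1):
        action = f"run `{command}`" if command else "manual check"
        lines.append(f"{number}. [{unit}] {step}: {action}")
    return lines


if __name__ == "__main__":
    for line in render(RUNBOOK):
        print(line)
```

Keeping the commands as data rather than executing them is intentional: during a failover, each command should be run by a person who can read the output before moving on.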

Tested: 2026-03-30 | Monitors: 4 | Incidents: 4 | Test ID: cmncjcq6d00bnobqeky1bbd2d