PASSED
vendor / fortinet_ha_failover

FortiGate HA Cluster Failover

The primary FortiGate in an HA pair crashes due to a firmware bug, triggering failover to the secondary unit. All active VPN tunnels drop and need to re-establish.

Pattern

FORTINET_EVENT

Severity

CRITICAL

Confidence

85%

Remediation

Remote Hands

Test Results

Metric	Expected	Actual
Pattern Recognition	FORTINET_EVENT	FORTINET_EVENT
Severity Assessment	CRITICAL	CRITICAL
Incident Correlation	Yes	37 linked
Cascade Escalation	Yes	Yes
Remediation	—	Remote Hands — Corax contacts on-site support via call, email, or API

Scenario Conditions

FortiGate 600F HA pair (Active-Passive). Primary unit kernel panic. 12 IPSec VPN tunnels to branch offices. 500 SSL-VPN users connected. FortiGuard services active.

Injected Error Messages (4)

FortiGate HA failover detected — primary unit fw01 unresponsive, secondary fw02 assuming active role, Fortinet HA cluster state change, all sessions being migrated

VPN tunnel down to Branch-Miami — FortiGate HA failover caused IPSec SA rekey failure, tunnel re-establishing on secondary unit

VPN tunnel down to Branch-Chicago — FortiGate HA failover, IKE Phase 1 renegotiation in progress, branch office connectivity lost

FortiGate SSL-VPN portal unreachable during HA failover, 500 active sessions dropped, Fortinet cluster transition in progress
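The four messages above share the phrase "HA failover", which is one plausible signal for linking them to a single root event. The sketch below is illustrative only and is not the engine's actual correlation logic: the `correlate` function, both regular expressions, and the returned dictionary keys are assumptions made for this example.

```python
import re

# The four injected messages from this scenario.
# The long dash in the source text is written as the \u2014 escape.
MESSAGES = [
    "FortiGate HA failover detected \u2014 primary unit fw01 unresponsive, "
    "secondary fw02 assuming active role, Fortinet HA cluster state change, "
    "all sessions being migrated",
    "VPN tunnel down to Branch-Miami \u2014 FortiGate HA failover caused IPSec SA "
    "rekey failure, tunnel re-establishing on secondary unit",
    "VPN tunnel down to Branch-Chicago \u2014 FortiGate HA failover, IKE Phase 1 "
    "renegotiation in progress, branch office connectivity lost",
    "FortiGate SSL-VPN portal unreachable during HA failover, 500 active "
    "sessions dropped, Fortinet cluster transition in progress",
]

# Hypothetical matching rules, chosen only to fit the messages above.
FAILOVER = re.compile(r"HA failover", re.IGNORECASE)
BRANCH = re.compile(r"VPN tunnel down to (Branch-\w+)")


def correlate(messages):
    """Group messages that mention an HA failover under one root event."""
    related = [m for m in messages if FAILOVER.search(m)]
    # Collect branch names from the tunnel-down messages, in input order.
    branches = [hit.group(1) for m in related if (hit := BRANCH.search(m))]
    # Treat the first "detected" message as the root event, if there is one.
    root = next((m for m in related if "detected" in m), None)
    return {"root": root, "related": len(related), "branches": branches}


result = correlate(MESSAGES)
```

Here `result["related"]` counts all four messages and `result["branches"]` lists Branch-Miami and Branch-Chicago. A production correlator would likely also weigh timestamps and topology rather than rely on keyword matching alone.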

Neural Engine Root Cause Analysis

The primary FortiGate firewall unit (fw01) has become unresponsive, triggering an automatic HA failover to the secondary unit (fw02). This is likely caused by a hardware failure, power issue, network connectivity loss, or system crash on the primary unit. The 13 correlated incidents within the same timeframe suggest this firewall failure caused a cascade of connectivity issues across dependent services and applications that rely on fw01 for network access.

Remediation Plan

1. Verify fw02 has successfully assumed the active role and traffic is flowing normally.
2. Attempt to access fw01 via out-of-band management (console/IPMI) to determine if it's completely offline or just unresponsive.
3. Check fw01 power status, network cables, and hardware indicators.
4. If fw01 is accessible, review system logs for crash dumps, memory errors, or hardware failures.
5. Attempt to restart fw01 services or perform a controlled reboot if safe to do so.
6. Monitor all correlated incidents to ensure they resolve once fw01 is restored or fw02 stabilizes.
7. Once fw01 is restored, re-establish HA synchronization and return to normal active/passive state.
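The seven steps above are meant to be worked in order, so they can be tracked as a simple ordered checklist. The sketch below is one possible way to do that. The `Step` class, `PLAN` list, and `next_step` helper are hypothetical names for this example, and the CLI hints are standard FortiOS commands that should be checked against the firmware version in use before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    action: str
    cli_hint: str = ""  # FortiOS command that may help; verify on your version


# Condensed wording of the remediation plan; the hints are suggestions only.
PLAN = [
    Step(1, "Verify fw02 is active and passing traffic", "get system ha status"),
    Step(2, "Reach fw01 out-of-band (console/IPMI)"),
    Step(3, "Check fw01 power, cabling, and hardware indicators"),
    Step(4, "Review fw01 logs for crash dumps or hardware errors",
         "diagnose debug crashlog read"),
    Step(5, "Restart services or reboot fw01 if safe"),
    Step(6, "Monitor correlated incidents until they resolve",
         "get vpn ipsec tunnel summary"),
    Step(7, "Re-establish HA sync and normal active/passive state",
         "diagnose sys ha checksum cluster"),
]


def next_step(done):
    """Return the lowest-numbered step not yet completed, or None if all are done."""
    for step in PLAN:
        if step.number not in done:
            return step
    return None
```

Calling `next_step({1, 2, 3})` returns step 4, and passing all seven step numbers returns `None`. Keeping the plan as data makes it easy to log which step an on-site technician has reached.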

Tested: 2026-03-30 | Monitors: 4 | Incidents: 4 | Test ID: cmncjcq6d00bnobqeky1bbd2d