PASSED | vendor / juniper_srx_cluster_failover

Juniper SRX Cluster Failover — Chassis Cluster Split

A Juniper SRX4600 chassis cluster experiences a split-brain condition when both fabric links fail simultaneously, causing both nodes to assume the primary role and creating duplicate gateways on the network.

Pattern
UNKNOWN
Severity
CRITICAL
Confidence
95%
Remediation
Remote Hands

Test Results

Metric               | Expected     | Actual
Pattern Recognition  | UNKNOWN      | UNKNOWN
Severity Assessment  | CRITICAL     | CRITICAL
Incident Correlation | Yes          | 26 linked
Cascade Escalation   | Yes          | Yes
Remediation          | Remote Hands | Corax contacts on-site support via call, email, or API

Scenario Conditions

Juniper SRX4600 HA chassis cluster. Both fab0 and fab1 links failed (switch port err-disable). Both nodes claiming primary. Duplicate default gateways. 1000 users affected.
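
In this condition, the cluster status commands on each node would show the symptom directly: both nodes report themselves as primary for redundancy group 0, and both fabric interfaces are down. The output below is an abbreviated, illustrative sketch (hostnames and exact column layout are assumptions; formatting varies by Junos release):

```
## On node0 (and, symmetrically, on node1):
admin@srx-node0> show chassis cluster status
Cluster ID: 1
Redundancy group: 0
    node0   priority 200   primary
    node1   priority 100   primary      <- peer also claims primary: split-brain

admin@srx-node0> show chassis cluster interfaces
Fabric link status: Down
    fab0    Down                        <- both fabric links lost
    fab1    Down
```

With no fabric (or other control) path between the nodes, each node interprets the silence as peer failure, which is why both promote themselves.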

Injected Error Messages (3)

juniper SRX4600 chassis cluster split-brain detected — node0 (primary) lost fabric links fab0 and fab1 to node1, juniper cluster redundancy group 0 status: primary on BOTH nodes, duplicate IP addresses on reth interfaces causing ARP conflicts, juniper SRX cluster health: critical, dual-primary condition active
juniper SRX4600 node1 assumed primary role after fabric link loss — chassis cluster split detected, juniper redundancy group failover triggered on both nodes simultaneously, both SRX nodes advertising same virtual MAC and IP, network experiencing duplicate gateway condition, 1000 users with intermittent connectivity
juniper SRX chassis cluster dual-primary causing network instability — duplicate default gateway 10.0.0.254 from two SRX nodes, ARP table oscillating between two MAC addresses, juniper reth interfaces active on both cluster members, packet forwarding inconsistent, network gateway unreliable

Neural Engine Root Cause Analysis

The Juniper SRX4600 chassis cluster is in a split-brain condition: the complete failure of fabric links fab0 and fab1 partitioned node0 from node1, so each node concluded its peer was down and asserted the primary role. The result is duplicate IP and virtual MAC addresses on the reth interfaces, causing ARP conflicts on the attached network. The 12 correlated incidents indicate widespread impact, as traffic routing becomes unpredictable with two conflicting primary nodes advertising the same gateway.
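
Because the scenario attributes the fabric link loss to switch port err-disable, the upstream switches should be checked as part of root-cause confirmation. A sketch of that check, assuming Cisco IOS-style switches (the scenario does not name the switch vendor, and the interface name below is hypothetical):

```
! List err-disabled ports and the triggering cause (e.g. link-flap, bpduguard):
switch# show interfaces status err-disabled

! Check whether automatic err-disable recovery is configured:
switch# show errdisable recovery

! After the underlying fault is fixed, manually re-enable the port:
switch(config)# interface GigabitEthernet1/0/10
switch(config-if)# shutdown
switch(config-if)# no shutdown
```

The err-disable cause shown by the first command is the key datum: it distinguishes a physical-layer fault (bad cable or transceiver) from a policy trigger on the switch itself.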

Remediation Plan

1. Immediately isolate one node (preferably node1) to stop the dual-primary conflict, either by powering it down or by disconnecting its data-plane interfaces.
2. Verify fabric link physical connectivity: check cables, transceivers, and the switch ports serving the fab0 and fab1 connections (both were err-disabled).
3. Replace any faulty fabric link components identified.
4. Once the fabric links are restored and verified, gracefully reintroduce the isolated node and confirm it assumes the secondary role.
5. Monitor cluster status until both nodes report a healthy primary/secondary relationship.
6. Clear any residual stale ARP entries on connected devices.
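
The steps above map onto a small set of standard Junos operational commands. A hedged sketch follows (hostnames are assumptions, and exact syntax can vary by Junos release; run isolation commands over console/OOB access, not through the affected data plane):

```
## Step 1: isolate node1 to end the dual-primary condition:
admin@srx-node1> request system power-off

## Steps 4-5: after fabric repair, verify link and role recovery from node0:
admin@srx-node0> show chassis cluster interfaces   ## fab0/fab1 should report Up
admin@srx-node0> show chassis cluster status       ## expect one primary, one secondary
admin@srx-node0> show chassis cluster statistics   ## heartbeat counters should increment

## If a manual failover was performed during recovery, release the flag:
admin@srx-node0> request chassis cluster failover reset redundancy-group 0

## Step 6: on adjacent Junos devices, flush stale gateway ARP entries:
admin@router> clear arp
```

For non-Junos neighbors, use the equivalent ARP-flush command for that platform; until caches expire or are cleared, hosts may keep forwarding to the stale virtual MAC.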
Tested: 2026-03-30 | Monitors: 3 | Incidents: 3 | Test ID: cmnck561f06spobqeb9gd5aq2