Back to All Scenarios
PASSEDinfrastructure / aruba_ap_cluster_failure

Aruba AP Cluster Failure — Campus WiFi Outage

The Aruba wireless controller cluster loses sync after a firmware mismatch between the primary and standby controllers. All APs managed by the failed controller go into standalone mode with degraded functionality. Roaming between controller zones fails completely.

Pattern
WIRELESS_CONTROLLER
Severity
CRITICAL
Confidence
92%
Remediation
Remote Hands

Test Results

MetricExpectedActualResult
Pattern RecognitionWIRELESS_CONTROLLERWIRELESS_CONTROLLER
Severity AssessmentCRITICALCRITICAL
Incident CorrelationYes29 linked
Cascade EscalationYesYes
RemediationRemote Hands — Corax contacts on-site support via call, email, or API

Scenario Conditions

Aruba 7210 controller cluster (primary + standby). Primary upgraded to 8.11, standby still on 8.10. Cluster sync failed. 120 APs on primary controller entering standalone mode. Campus with 3 buildings. 800+ wireless users.

Injected Error Messages (3)

Wireless controller cluster failure — Aruba WLC-01 cluster sync lost with standby, firmware version mismatch (8.11 vs 8.10), wireless controller primary entering degraded mode, 120 APs losing management connection
Standby wireless controller unable to take over — cluster heartbeat lost, wireless controller standby rejecting AP join requests due to firmware incompatibility, failover failed, manual intervention required
Building-A APs in standalone mode — wireless controller management lost, 40 APs operating with cached config, no new client authentication, roaming between buildings broken, RADIUS proxy through controller unavailable

Neural Engine Root Cause Analysis

The Aruba WLC primary controller (wlc-01) has lost cluster synchronization with its standby due to a firmware version mismatch between the primary (8.11) and standby (8.10) controllers. This mismatch is preventing proper high-availability operations and causing the primary to enter degraded mode, resulting in loss of management connectivity to 120 access points. The 12 correlated incidents likely represent the downstream impact of individual APs or network segments losing wireless connectivity.

Remediation Plan

1. Immediately assess if standby controller is accessible and functional 2. Determine which firmware version should be the target (likely 8.11 to match primary) 3. Plan maintenance window as firmware upgrade will cause temporary service disruption 4. Backup current configurations on both controllers 5. Upgrade standby controller firmware to match primary (8.11) 6. Re-establish cluster synchronization between controllers 7. Verify all 120 APs reconnect and show online status 8. Monitor cluster health and AP connectivity for 30 minutes post-remediation
Tested: 2026-03-30Monitors: 3 | Incidents: 3Test ID: cmncjq8l103piobqez18ffn3r