PASSED | network / lacp_bond_failure

802.3ad LACP Bond Failure — Port Channel Down

An LACP port channel between the core switch and the server farm switch loses all member links after a switch firmware bug halts LACP PDU transmission, severing connectivity for 50 servers.

Pattern: UNKNOWN
Severity: CRITICAL
Confidence: 95%
Remediation: Remote Hands

Test Results

Metric               | Expected | Actual    | Result
Pattern Recognition  | UNKNOWN  | UNKNOWN   |
Severity Assessment  | CRITICAL | CRITICAL  |
Incident Correlation | Yes      | 29 linked |
Cascade Escalation   | Yes      | Yes       |
Remediation          | Remote Hands: Corax contacts on-site support via call, email, or API

Scenario Conditions

4-member LACP port channel (4x10Gbps = 40Gbps aggregate). Core switch firmware bug stops LACP PDU transmission. All 4 member ports go to suspended state. Port channel goes down. 50 servers isolated from core.
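The bandwidth arithmetic and the failure condition above can be modeled in a few lines. This is a minimal Python sketch, not switch code: the `PortChannel` class, the `MemberState` values, and the `min_links` threshold of 1 (the channel stays up while at least one member is bundled) are illustrative assumptions rather than anything taken from Core-SW1's firmware.

```python
from dataclasses import dataclass, field
from enum import Enum


class MemberState(Enum):
    BUNDLED = "bundled"      # negotiated by LACP and carrying traffic
    SUSPENDED = "suspended"  # LACP not negotiated or expired; carries no traffic


@dataclass
class PortChannel:
    name: str
    member_speed_gbps: int
    members: dict[str, MemberState] = field(default_factory=dict)
    # Assumed threshold: the channel stays up while at least this many members are bundled.
    min_links: int = 1

    def active_members(self) -> list[str]:
        return [m for m, state in self.members.items() if state is MemberState.BUNDLED]

    def aggregate_gbps(self) -> int:
        return len(self.active_members()) * self.member_speed_gbps

    def is_up(self) -> bool:
        return len(self.active_members()) >= self.min_links


# Hypothetical model of port-channel 10 on Core-SW1 before the fault.
po10 = PortChannel(
    name="Port-channel10",
    member_speed_gbps=10,
    members={f"Te1/0/{i}": MemberState.BUNDLED for i in range(1, 5)},
)
print(po10.aggregate_gbps(), po10.is_up())  # 40 True

# The firmware bug halts LACPDU transmission and every member is suspended.
for member in po10.members:
    po10.members[member] = MemberState.SUSPENDED
print(po10.aggregate_gbps(), po10.is_up())  # 0 False
```

Losing members one at a time would shrink the aggregate in 10 Gbps steps while the channel stays up; only when the last member is suspended does the channel itself go down, which is the condition this scenario injects.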

Injected Error Messages (3)

LACP port-channel 10 failure on Core-SW1 — all 4 member interfaces (Te1/0/1-4) in LACP suspended state, 802.3ad LACP PDU transmission halted after firmware bug, port-channel aggregate link: down, 40Gbps aggregate bandwidth to server farm: lost, LACP partner (Server-Farm-SW1) not receiving PDUs, LACP activity expired on all member ports
server farm switch port-channel partner down — LACP port-channel 10 partner (Core-SW1) stopped sending LACP PDUs, 802.3ad bond went from 4 active members to 0, all member ports transitioned to suspended after 90-second LACP long expiry, server farm switch lost all uplink paths to core routing, 50 servers isolated from rest of network
server farm gateway unreachable — LACP bond failure between core and server farm switches severed all Layer 3 connectivity, 50 servers unable to reach default gateway 10.10.100.1, server-to-server traffic within farm: functional, but no traffic can reach clients or internet, all server applications showing connection failures to external dependencies
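The 90-second figure in the second message is the LACP long timeout: three missed LACPDUs at the slow 30-second periodic rate. With the short timeout, the same rule gives three missed PDUs at the fast 1-second rate, or 3 seconds. A minimal Python sketch of that arithmetic follows; the function names and the timestamp are hypothetical, and it models only the first expiry, not the further receive-machine states defined in IEEE 802.1AX.

```python
from datetime import datetime, timedelta

# IEEE 802.1AX LACP timer values, in seconds.
FAST_PERIODIC_TIME = 1
SLOW_PERIODIC_TIME = 30
SHORT_TIMEOUT_TIME = 3 * FAST_PERIODIC_TIME  # 3 s
LONG_TIMEOUT_TIME = 3 * SLOW_PERIODIC_TIME   # 90 s


def partner_info_expires_at(last_pdu_rx: datetime, short_timeout: bool = False) -> datetime:
    """When partner information expires if no further LACPDU is received."""
    timeout = SHORT_TIMEOUT_TIME if short_timeout else LONG_TIMEOUT_TIME
    return last_pdu_rx + timedelta(seconds=timeout)


def is_expired(last_pdu_rx: datetime, now: datetime, short_timeout: bool = False) -> bool:
    return now >= partner_info_expires_at(last_pdu_rx, short_timeout)


# Hypothetical arrival time of the last LACPDU from Core-SW1.
last_rx = datetime(2026, 3, 30, 12, 0, 0)
print(partner_info_expires_at(last_rx))                      # 2026-03-30 12:01:30
print(partner_info_expires_at(last_rx, short_timeout=True))  # 2026-03-30 12:00:03
```

The choice of timeout therefore bounds how long Server-Farm-SW1 keeps forwarding into a silent partner: up to 90 seconds with the long timeout, versus about 3 seconds with the short one.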

Neural Engine Root Cause Analysis

A firmware bug on Core-SW1 has caused LACP PDU transmission to halt on port-channel 10, resulting in all 4 member interfaces (Te1/0/1-4) entering a suspended state. This has severed the 40Gbps aggregate link to the server farm, with the LACP partner (Server-Farm-SW1) no longer receiving keepalive PDUs. The 12 correlated incidents suggest widespread downstream connectivity loss, likely affecting every service that depends on the server farm.
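Why the impact is "downstream" rather than total is visible in the third injected message: servers can still reach each other, but nothing beyond the farm. A minimal Python sketch of the forwarding decision, using the standard `ipaddress` module, shows the split. Only the gateway address 10.10.100.1 comes from the scenario; the /24 server subnet, the `reachable` helper, and the assumption that the gateway sits on the core side of the port channel are hypothetical.

```python
import ipaddress

# Assumed addressing: only the gateway 10.10.100.1 comes from the scenario;
# the /24 server subnet is hypothetical.
SERVER_SUBNET = ipaddress.ip_network("10.10.100.0/24")
DEFAULT_GATEWAY = ipaddress.ip_address("10.10.100.1")


def next_hop(dst: str) -> ipaddress.IPv4Address:
    """On-link destinations are reached directly; everything else goes via the gateway."""
    addr = ipaddress.ip_address(dst)
    return addr if addr in SERVER_SUBNET else DEFAULT_GATEWAY


def reachable(dst: str, uplink_up: bool) -> bool:
    """Assumes the gateway lives on the core side, across the port channel."""
    if next_hop(dst) == DEFAULT_GATEWAY:
        # Off-subnet traffic, and traffic to the gateway itself, must cross the uplink.
        return uplink_up
    # Other servers in the subnet are switched locally by Server-Farm-SW1.
    return True


for dst in ("10.10.100.42", "10.10.100.1", "203.0.113.10"):
    print(dst, reachable(dst, uplink_up=False))
# 10.10.100.42 True
# 10.10.100.1 False
# 203.0.113.10 False
```

If some farm servers sat in other subnets routed at the core, even intra-farm traffic between those subnets would fail; the sketch assumes a single subnet, matching the message's report that server-to-server traffic still works.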

Remediation Plan

1. Immediately contact the network engineering team for emergency response.
2. Attempt to restart LACP on port-channel 10 from the CLI (shutdown / no shutdown on the port-channel interface).
3. If the restart fails, disable and re-enable the individual member interfaces Te1/0/1-4.
4. Verify the LACP neighbor relationship with Server-Farm-SW1.
5. If the issue persists, plan an emergency firmware upgrade or hardware replacement.
6. Implement temporary failover routing if backup paths exist.
7. Monitor all 12 correlated incidents for resolution as connectivity is restored.
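Steps 2 through 5 of the plan form an escalation ladder: try the least disruptive fix, check the LACP neighbor, and escalate only if it stays down. A minimal Python sketch of that ordering follows. The hook functions are hypothetical placeholders that issue no real CLI commands; steps 1, 6, and 7 are human or process actions and are left out.

```python
from typing import Callable

# Hypothetical hooks: in a real runbook each would wrap a switch CLI or API call.
Action = Callable[[], None]
Check = Callable[[], bool]


def run_remediation(bounce_port_channel: Action,
                    bounce_members: Action,
                    lacp_neighbor_ok: Check) -> str:
    """Walk steps 2-5 of the plan, stopping at the first step that restores the LACP neighbor."""
    bounce_port_channel()      # step 2: shutdown / no shutdown on port-channel 10
    if lacp_neighbor_ok():     # step 4: verify the neighbor relationship with Server-Farm-SW1
        return "recovered after port-channel bounce"
    bounce_members()           # step 3: disable and re-enable Te1/0/1-4
    if lacp_neighbor_ok():     # step 4 again, after the member bounce
        return "recovered after member bounce"
    return "escalate: plan firmware upgrade or hardware replacement"  # step 5


# Usage with stub hooks: the neighbor only returns on the second check.
checks = {"count": 0}


def stub_neighbor_check() -> bool:
    checks["count"] += 1
    return checks["count"] >= 2


print(run_remediation(lambda: None, lambda: None, stub_neighbor_check))
# recovered after member bounce
```

Passing the actions in as callables keeps the escalation order testable without a switch; a real implementation would also need timeouts and logging around each hook.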
Tested: 2026-03-30 | Monitors: 3 | Incidents: 3 | Test ID: cmnckayyk083yobqexybmpvlw