A rack PDU is running at 95% of its rated capacity after additional equipment was installed without proper power planning, and its overload alarm has triggered. Any additional load or transient current increase puts the breaker at risk of tripping.
Pattern: UNKNOWN
Severity: CRITICAL
Confidence: 95%
Remediation: Remote Hands
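Test Results
Metric               | Expected | Actual                                                               | Result
Pattern Recognition  | UNKNOWN  | UNKNOWN                                                              |
Severity Assessment  | CRITICAL | CRITICAL                                                             |
Incident Correlation | Yes      | 20 linked                                                            |
Cascade Escalation   | N/A      | No                                                                   |
Remediation          | N/A      | Remote Hands: Corax contacts on-site support via call, email, or API |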
Remote Hands — Corax contacts on-site support via call, email, or API
Scenario Conditions
Rack-B with dual PDU feeds (A+B). PDU-A at 95% load (19A of 20A). PDU-B at 88%. New GPU server added without power audit. PDU-A overload alarm triggered. Risk of breaker trip under any load increase.
Injected Error Messages (2)
rack PDU-A overload warning in Rack-B — current draw: 19.0A on 20A circuit (95% capacity), PDU overload alarm threshold breached (90%), PDU phase A approaching circuit breaker trip point, rack power budget exceeded after GPU server installation, any additional load or transient current increase will trigger breaker trip, 12 servers at risk of power loss if breaker trips
rack PDU-B elevated load in Rack-B — PDU-B current draw: 17.6A on 20A circuit (88% capacity), both rack PDU feeds above recommended 80% operating threshold, no headroom for failover: if PDU-A breaker trips, PDU-B cannot handle full rack load (would draw 36.6A on 20A circuit), dual PDU redundancy effectively compromised, immediate power rebalancing required
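The failover figure in the second message follows from simple addition: if one feed trips, the other inherits the whole rack. A minimal Python sketch, assuming every server in Rack-B is dual-corded with load shared across both feeds; the `Feed` class, `failover_draw`, and `redundancy_ok` are illustrative names, not part of Corax or any PDU API:

```python
from dataclasses import dataclass


@dataclass
class Feed:
    """One PDU feed: measured draw and the rating of its upstream breaker."""
    name: str
    amps: float
    rating_amps: float

    @property
    def utilization(self) -> float:
        return self.amps / self.rating_amps


def failover_draw(a: Feed, b: Feed) -> float:
    # Assumption: load is shared across A and B, so the surviving feed
    # picks up everything the failed feed was carrying.
    return a.amps + b.amps


def redundancy_ok(a: Feed, b: Feed, limit: float = 1.0) -> bool:
    # Redundancy holds only if either feed alone could carry the full rack
    # load at or below `limit` (a fraction of its rating).
    return failover_draw(a, b) <= limit * min(a.rating_amps, b.rating_amps)


pdu_a = Feed("PDU-A", 19.0, 20.0)
pdu_b = Feed("PDU-B", 17.6, 20.0)
for feed in (pdu_a, pdu_b):
    print(f"{feed.name}: {feed.utilization:.0%} of {feed.rating_amps:.0f}A")
total = failover_draw(pdu_a, pdu_b)
print(f"Failover draw: {total:.1f}A ({total / pdu_b.rating_amps:.0%} of one feed)")
print("Redundancy OK:", redundancy_ok(pdu_a, pdu_b))
```

With the reported values, the surviving feed would carry 36.6A, 183% of its 20A rating, so `redundancy_ok` returns `False`.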
Neural Engine Root Cause Analysis
Rack-B PDU-A is operating at 95% of capacity (19.0A on a 20A circuit) because a recently installed GPU server pushed the rack past its power budget. The PDU has breached its 90% overload alarm threshold and is approaching the circuit breaker's trip point, putting 12 servers at risk of simultaneous power loss. PDU-B is also above the recommended 80% threshold (17.6A, 88%), so if PDU-A trips, PDU-B would have to carry about 36.6A on its 20A circuit: the A+B redundancy no longer protects the rack. This is a physical infrastructure capacity problem, not a monitoring system failure; the SNMP monitor is correctly reporting the critical power condition.
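The thresholds used in this incident (the 80% recommended operating level, the 90% overload alarm, and the circuit rating itself) can be expressed as a small classifier. This is a hypothetical sketch for illustration, not the SNMP monitor's actual alerting logic; the `PowerState` and `classify` names are assumptions, and real breakers trip according to a time-current curve rather than exactly at 100% of rating.

```python
from enum import Enum


class PowerState(Enum):
    OK = "ok"
    WARNING = "warning"  # at/above the recommended 80% operating threshold
    ALARM = "alarm"      # at/above the 90% overload alarm threshold
    OVER = "over"        # at/above the circuit rating; high breaker trip risk


# Threshold fractions taken from the incident text.
WARN_AT = 0.80
ALARM_AT = 0.90


def classify(amps: float, rating_amps: float) -> PowerState:
    utilization = amps / rating_amps
    if utilization >= 1.0:
        return PowerState.OVER
    if utilization >= ALARM_AT:
        return PowerState.ALARM
    if utilization >= WARN_AT:
        return PowerState.WARNING
    return PowerState.OK


for name, amps in [("PDU-A", 19.0), ("PDU-B", 17.6)]:
    print(name, classify(amps, 20.0).name)
```

Applied to the reported readings, PDU-A (19.0A) classifies as ALARM and PDU-B (17.6A) as WARNING; the 36.6A failover draw would classify as OVER.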
Remediation Plan
1. IMMEDIATE: Contact datacenter operations to assess power redistribution options.
2. Identify non-critical servers in Rack-B that can be safely powered down to bring PDU-A below the 90% alarm threshold. This alone does not restore failover headroom, which requires the combined A+B draw to fit within a single 20A circuit.
3. Evaluate moving some servers to alternative racks with available power capacity.
4. Consider installing an additional PDU or upgrading circuit capacity.
5. Set power monitoring alerts at lower thresholds (80%) to catch future overload conditions earlier.
6. Review and update rack power budgeting procedures so that every future equipment installation is preceded by a power audit.
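The plan's targets translate into concrete amounts of current to shed or move. A sketch under stated assumptions: `amps_to_shed` is a hypothetical helper, targets are fractions of the 20A rating, and the failover case assumes the whole rack load lands on one feed after a trip. With dual-corded servers, powering one down typically lowers each feed by only part of its draw, so the per-feed figures are lower bounds on the server load that must be removed.

```python
def amps_to_shed(current_amps: float, rating_amps: float, target: float) -> float:
    """Current that must be removed so current_amps <= target * rating_amps.

    `target` is a fraction of the rating (0.90 for the alarm threshold,
    0.80 for the recommended operating threshold).
    """
    return max(0.0, current_amps - target * rating_amps)


RATING = 20.0
PDU_A, PDU_B = 19.0, 17.6  # reported draws in amps

# Step 2 target: PDU-A under the 90% alarm threshold.
print(f"PDU-A under 90%: shed {amps_to_shed(PDU_A, RATING, 0.90):.1f}A")
# Step 5 target: PDU-A under the 80% recommended threshold.
print(f"PDU-A under 80%: shed {amps_to_shed(PDU_A, RATING, 0.80):.1f}A")
# Restoring A+B redundancy: assume the whole rack lands on one feed
# after a trip, so the combined draw must fit within 80% of one circuit.
combined = PDU_A + PDU_B
print(f"Failover under 80%: shed {amps_to_shed(combined, RATING, 0.80):.1f}A combined")
```

With the reported readings this gives 1.0A for the 90% target and 3.0A for the 80% target on PDU-A, but 20.6A of combined load for failover within 80% of a single circuit. That is more than half the rack's current draw, which suggests steps 3 and 4 (moving servers or adding capacity) matter more than powering down a few machines.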