PASSED | cascade / hvac_failure

HVAC Failure in Server Room — Temperature Rising

The server room CRAC (Computer Room Air Conditioning) unit fails at 2AM. Temperature rises from 72F to 95F in 45 minutes. Server thermal throttling begins. If temperature reaches 104F, automatic thermal shutdown will occur on all servers.

Pattern: UNKNOWN
Severity: CRITICAL
Confidence: 95%
Remediation: Remote Hands

Test Results

Metric                Expected   Actual      Result
Pattern Recognition   UNKNOWN    UNKNOWN
Severity Assessment   CRITICAL   CRITICAL
Incident Correlation  Yes        36 linked
Cascade Escalation    Yes        Yes
Remediation           Remote Hands: Corax contacts on-site support via call, email, or API

Scenario Conditions

Cooling: single CRAC unit (no redundancy)
Server room: 12 rack units
Current temp: 95F (rising 0.5F/min)
Thermal throttle threshold: 85F
Emergency shutdown threshold: 104F
Time until shutdown threshold at current rate: 18 minutes
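
The 18-minute figure follows from the stated thresholds and rate of rise: (104F - 95F) / 0.5F per minute. A minimal sketch of that arithmetic, assuming the rate of rise stays constant; the function name minutes_until is hypothetical and not part of the tested system:

```python
def minutes_until(threshold_f: float, current_f: float, rate_f_per_min: float) -> float:
    """Minutes until a threshold is reached at a constant rate of rise.

    Returns 0.0 if the threshold has already been reached or exceeded.
    """
    if rate_f_per_min <= 0:
        raise ValueError("rate of rise must be positive")
    return max(0.0, (threshold_f - current_f) / rate_f_per_min)


# Values taken from the scenario conditions.
print(minutes_until(104.0, 95.0, 0.5))  # 18.0 -> emergency shutdown threshold
print(minutes_until(85.0, 95.0, 0.5))   # 0.0  -> throttle threshold already exceeded
```

A real room rarely heats linearly, so an estimate like this is best treated as a rough bound and recomputed as new sensor readings arrive.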

Injected Error Messages (3)

CRAC unit failure — APC InRow cooling unit offline, alarm coolingUnitFailed, compressor fault code: F7 (refrigerant pressure low), supply air temperature: 91F (setpoint: 64F), server room ambient: 95F and rising, no redundant cooling unit, estimated time to critical: 18 minutes
Server room temperature CRITICAL — rack inlet temperature 95F (threshold: 85F), rate of rise: 0.5F per minute, hot aisle temperature: 112F, environmental sensor alarm: HIGH TEMP WARNING, all rack units at risk of thermal shutdown at 104F
Host thermal throttling — server esxi-01 CPU frequency reduced to 1.8GHz (base: 3.2GHz) due to thermal limit, IPMI sensor: CPU1 Temp 94C (threshold: 100C), VM performance impact 40%, thermal throttle engaged, host at risk of emergency shutdown
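
The numeric fields in these messages can be pulled out with regular expressions. The sketch below is illustrative only: it assumes the wording of the second injected message, uses an excerpt of that message as input, and the names PATTERNS and parse_thermal_alert are hypothetical. It does not describe how the engine under test parses alerts.

```python
import re

# Matches an integer or decimal number, e.g. "95" or "0.5".
NUM = r"(\d+(?:\.\d+)?)"

# Hypothetical field names mapped to patterns based on the message wording.
PATTERNS = {
    "inlet_f": rf"rack inlet temperature {NUM}F",
    "threshold_f": rf"\(threshold: {NUM}F\)",
    "rate_f_per_min": rf"rate of rise: {NUM}F per minute",
    "shutdown_f": rf"thermal shutdown at {NUM}F",
}


def parse_thermal_alert(message: str) -> dict[str, float]:
    """Extract temperature fields from an alert message as floats."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"field {name!r} not found in message")
        fields[name] = float(match.group(1))
    return fields


# Excerpt of the second injected message.
message = (
    "rack inlet temperature 95F (threshold: 85F), rate of rise: 0.5F per minute, "
    "hot aisle temperature: 112F, environmental sensor alarm: HIGH TEMP WARNING, "
    "all rack units at risk of thermal shutdown at 104F"
)
alert = parse_thermal_alert(message)
print(alert)
```

Raising on a missing field, rather than silently defaulting, keeps a reworded alert from producing a misleading time estimate.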

Neural Engine Root Cause Analysis

The CRAC (Computer Room Air Conditioning) unit has suffered a compressor failure with fault code F7 indicating low refrigerant pressure, likely due to a refrigerant leak or compressor mechanical failure. This is a hardware failure of the cooling system's compressor, not a software or network issue. The unit cannot maintain the required cooling capacity: supply air is at 91°F against a 64°F setpoint, and server room ambient temperature has risen to 95°F. With no redundant cooling and only 18 minutes until the 104°F shutdown threshold, this is an imminent threat to all equipment in the server room.

Remediation Plan

1. IMMEDIATE: Dispatch HVAC technician for emergency service call to diagnose compressor and refrigerant system.
2. IMMEDIATE: Activate emergency cooling measures (portable AC units, increase facility HVAC, open cold aisles).
3. IMMEDIATE: Prepare for controlled shutdown of non-critical systems if temperature continues rising.
4. SHORT-TERM: Technician to check refrigerant levels, inspect for leaks, test compressor functionality, and repair/replace failed components.
5. LONG-TERM: Implement redundant cooling capacity and enhanced monitoring to prevent single points of failure.

Tested: 2026-03-30 | Monitors: 3 | Incidents: 3 | Test ID: cmncjo4aj033wobqe3h6dqcp5