PASSED

vendor / hpe_ilo_critical_alert

HPE ProLiant iLO Critical Alert: Thermal Emergency

An HPE ProLiant DL380 Gen10 Plus raises a thermal emergency through iLO after the server room HVAC system fails and CPU temperatures approach the shutdown threshold.

Pattern

HIGH_CPU

Severity

CRITICAL

Confidence

95%

Remediation

Remote Hands

Test Results

Metric	Expected	Actual
Pattern Recognition	HIGH_CPU	HIGH_CPU
Severity Assessment	CRITICAL	CRITICAL
Incident Correlation	Yes	21 linked
Cascade Escalation	N/A	No
Remediation	—	Remote Hands — Corax contacts on-site support via call, email, or API

Scenario Conditions

HPE ProLiant DL380 Gen10 Plus in an on-prem server room. HVAC failure raised ambient temperature to 42°C. CPU package temperatures at 95°C (shutdown at 100°C). iLO reporting thermal critical. Server throttling heavily.
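The margins implied by these conditions are easy to make explicit. The following is a minimal sketch, not part of the test itself: the `ThermalReading` class and its method names are hypothetical, and the 22°C normal ambient value is taken from the first injected error message rather than from the scenario text.

```python
from dataclasses import dataclass


@dataclass
class ThermalReading:
    """Temperatures in degrees Celsius (hypothetical helper type)."""
    cpu_temp_c: float
    shutdown_temp_c: float
    ambient_temp_c: float
    normal_ambient_c: float

    def shutdown_headroom_c(self) -> float:
        # Degrees remaining before the CPU reaches the shutdown threshold.
        return self.shutdown_temp_c - self.cpu_temp_c

    def ambient_excess_c(self) -> float:
        # How far the room is above its normal temperature.
        return self.ambient_temp_c - self.normal_ambient_c


# Values from the scenario: CPU at 95C, shutdown at 100C, ambient 42C vs 22C.
reading = ThermalReading(cpu_temp_c=95, shutdown_temp_c=100,
                         ambient_temp_c=42, normal_ambient_c=22)
print(reading.shutdown_headroom_c())  # 5
print(reading.ambient_excess_c())     # 20
```

With only 5°C of headroom and the room 20°C above normal, the scenario leaves little time before an automatic shutdown.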

Injected Error Messages (2)

hpe ProLiant DL380 Gen10 Plus ILO critical thermal alert — CPU 1 temperature: 95C (critical threshold: 100C), CPU 2 temperature: 93C, ambient temperature: 42C (normal: 22C), hpe ILO health status: critical, server power capping engaged at 60% to reduce heat output, hpe ILO event: thermal emergency, automatic shutdown imminent if temperatures continue rising
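As an illustration of how the readings in a message like this could be extracted, here is a small sketch using regular expressions. It assumes the exact wording of the injected message above; `parse_thermal_alert` is a hypothetical helper, not a description of how the product parses alerts, and the sample string is an excerpt of the message.

```python
import re

# Excerpt of the injected message above (leading product text omitted).
ALERT_EXCERPT = (
    "CPU 1 temperature: 95C (critical threshold: 100C), "
    "CPU 2 temperature: 93C, ambient temperature: 42C (normal: 22C), "
    "hpe ILO health status: critical, "
    "server power capping engaged at 60% to reduce heat output"
)


def parse_thermal_alert(message: str) -> dict:
    """Pull temperatures, the critical threshold, and the power cap from the text."""
    # Matches "CPU <n> temperature: <v>C" and "ambient temperature: <v>C".
    temps = {
        name: int(value)
        for name, value in re.findall(
            r"(CPU \d+|ambient) temperature: (\d+)C", message)
    }
    threshold = re.search(r"critical threshold: (\d+)C", message)
    cap = re.search(r"power capping engaged at (\d+)%", message)
    return {
        "temperatures_c": temps,
        "critical_threshold_c": int(threshold.group(1)) if threshold else None,
        "power_cap_percent": int(cap.group(1)) if cap else None,
    }


parsed = parse_thermal_alert(ALERT_EXCERPT)
print(parsed["temperatures_c"])  # {'CPU 1': 95, 'CPU 2': 93, 'ambient': 42}
```

Fields that are absent from a message come back as `None` rather than raising, so the same helper can be run against messages that carry only some of the readings.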

hpe ProLiant DL380 application severely impacted — server CPU throttled to 60% power cap due to thermal emergency, application response latency increased from 50ms to 2.8 seconds, hpe ILO reporting all 6 fans at maximum RPM (18000), inlet temperature sensor: 42C, hpe server performance critically impacted by thermal throttling

Neural Engine Root Cause Analysis

The HPE DL380 Gen10 Plus server is experiencing a critical thermal emergency with CPU 1 at 95°C (near the 100°C critical threshold) and abnormally high ambient temperature of 42°C (normal: 22°C). The server has automatically engaged power capping at 60% to reduce heat output, but temperatures continue to rise toward automatic shutdown. This indicates either cooling system failure (fans, air conditioning), blocked airflow, or environmental issues in the data center.
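The three candidate causes named above can be separated with a simple rule of thumb. The sketch below is an illustrative heuristic only, not the Neural Engine's actual logic: the function name, the 10°C margin, and the ordering of the checks are all assumptions.

```python
def likely_thermal_cause(inlet_c: float, normal_inlet_c: float,
                         fan_rpms: list, margin_c: float = 10.0) -> str:
    """Rough triage of a thermal alert (illustrative thresholds, not vendor guidance)."""
    # Inlet air far above normal points at the room, not the server.
    if inlet_c - normal_inlet_c >= margin_c:
        return "environmental"
    # A fan reporting 0 RPM suggests a failed or missing fan.
    if any(rpm == 0 for rpm in fan_rpms):
        return "fan_failure"
    # Normal inlet air and spinning fans leave restricted airflow as a suspect.
    return "blocked_airflow"


# Readings from the injected messages: inlet 42C vs 22C normal, 6 fans at 18000 RPM.
cause = likely_thermal_cause(42, 22, [18000] * 6)
print(cause)  # environmental
```

Applied to the injected readings, the heuristic points at the room rather than the server, which matches the HVAC failure described in the scenario.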

Remediation Plan

1. Immediately check data center HVAC systems and ambient temperature controls
2. Verify server fan operation and speeds via ILO interface
3. Inspect server for dust buildup or blocked air vents
4. Check rack airflow patterns and cable management
5. If cooling systems are functional, consider controlled server shutdown to prevent hardware damage
6. Monitor other servers in the same rack/area for similar thermal issues
7. Contact facilities team for emergency cooling assessment
8. Once temperatures normalize, gradually restore power limits and monitor
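Steps 2 and 6 of the plan could be partly automated by reading thermal data over the Redfish API that iLO exposes. The sketch below is written under several assumptions: the chassis ID `1`, HTTP Basic authentication, and the 10°C margin are guesses to adjust for a real environment, and the function names and sample payload are hypothetical. The property names (`Temperatures`, `ReadingCelsius`, `UpperThresholdCritical`) come from the DMTF Redfish Thermal schema.

```python
import base64
import json
import urllib.request


def fetch_thermal(host: str, user: str, password: str, chassis_id: str = "1") -> dict:
    """Fetch the Redfish Thermal resource (chassis ID and Basic auth are assumptions)."""
    url = f"https://{host}/redfish/v1/Chassis/{chassis_id}/Thermal"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def hot_sensors(thermal: dict, margin_c: float = 10.0) -> list:
    """Return (name, reading, limit) for sensors within margin_c of their critical limit."""
    hot = []
    for sensor in thermal.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        limit = sensor.get("UpperThresholdCritical")
        # Skip sensors with no reading or no usable threshold (None or 0).
        if reading is None or not limit:
            continue
        if limit - reading <= margin_c:
            hot.append((sensor.get("Name"), reading, limit))
    return hot


# Hypothetical payload shaped like a Redfish Thermal response, using scenario values.
sample = {"Temperatures": [
    {"Name": "CPU 1", "ReadingCelsius": 95, "UpperThresholdCritical": 100},
    {"Name": "Inlet", "ReadingCelsius": 42, "UpperThresholdCritical": 0},
]}
print(hot_sensors(sample))  # [('CPU 1', 95, 100)]
```

Keeping the network call separate from `hot_sensors` means the same check can be run against every server in the affected rack, or against saved responses, without changing the logic.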

Tested: 2026-03-30 | Monitors: 2 | Incidents: 2 | Test ID: cmnck6j3f0747obqelxn1bk6p