An HPE ProLiant DL380 Gen10 Plus raises a thermal emergency through iLO after the server room's HVAC system fails, with CPU temperatures approaching the shutdown threshold.
Pattern: HIGH_CPU
Severity: CRITICAL
Confidence: 95%
Remediation: Remote Hands
Test Results
Metric | Expected | Actual | Result
Pattern Recognition | HIGH_CPU | HIGH_CPU |
Severity Assessment | CRITICAL | CRITICAL |
Incident Correlation | Yes | 21 linked |
Cascade Escalation | N/A | No |
Remediation | N/A | Remote Hands: Corax contacts on-site support by call, email, or API |
Scenario Conditions
HPE ProLiant DL380 Gen10 Plus in an on-premises server room. An HVAC failure has raised the ambient temperature to 42°C. CPU package temperatures are at 95°C (shutdown at 100°C). iLO reports a critical thermal status, and the server is heavily throttled.
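These conditions reduce to simple headroom arithmetic: CPU 1 has 100 - 95 = 5°C left before shutdown and CPU 2 has 7°C. A minimal Python sketch of that calculation follows. The 10°C "critical band" used to classify readings is a hypothetical alerting policy, not an iLO setting.

```python
# Minimal sketch: thermal headroom for the scenario's CPU readings.
# The 10 C critical band is a hypothetical policy, not an iLO setting.

SHUTDOWN_C = 100  # CPU shutdown threshold stated in the scenario

def headroom(reading_c: float, limit_c: float = SHUTDOWN_C) -> float:
    """Degrees Celsius remaining before the shutdown threshold."""
    return limit_c - reading_c

def classify(reading_c: float, critical_band_c: float = 10.0) -> str:
    """Label a reading by how close it is to the shutdown threshold."""
    h = headroom(reading_c)
    if h <= 0:
        return "SHUTDOWN"
    if h <= critical_band_c:
        return "CRITICAL"
    return "OK"

readings = {"CPU 1": 95, "CPU 2": 93}
for name, temp in readings.items():
    print(f"{name}: {temp}C, headroom {headroom(temp)}C, {classify(temp)}")
# CPU 1: 95C, headroom 5C, CRITICAL
# CPU 2: 93C, headroom 7C, CRITICAL
```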
Injected Error Messages (2)
hpe ProLiant DL380 Gen10 Plus ILO critical thermal alert — CPU 1 temperature: 95C (critical threshold: 100C), CPU 2 temperature: 93C, ambient temperature: 42C (normal: 22C), hpe ILO health status: critical, server power capping engaged at 60% to reduce heat output, hpe ILO event: thermal emergency, automatic shutdown imminent if temperatures continue rising
hpe ProLiant DL380 application severely impacted — server CPU throttled to 60% power cap due to thermal emergency, application response latency increased from 50ms to 2.8 seconds, hpe ILO reporting all 6 fans at maximum RPM (18000), inlet temperature sensor: 42C, hpe server performance critically impacted by thermal throttling
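One way to turn the two injected messages into structured readings is to pattern-match their fixed wording. This is only a sketch. The regular expressions and the `extract_metrics` helper are hypothetical, written against the exact phrasing above, and do not describe how Corax's Neural Engine parses alerts. The sketch also expresses the latency impact as a ratio (2.8 s / 50 ms = 56x).

```python
import re

# Hypothetical patterns written against the exact wording of the two
# injected messages; real iLO or Corax alert formats may differ.
CPU_RE = re.compile(r"CPU (\d+) temperature: (\d+)C")
AMBIENT_RE = re.compile(r"ambient temperature: (\d+)C \(normal: (\d+)C\)")
LATENCY_RE = re.compile(r"increased from (\d+)ms to ([\d.]+) seconds")
FANS_RE = re.compile(r"all (\d+) fans at maximum RPM \((\d+)\)")

def extract_metrics(text: str) -> dict:
    """Pull numeric readings out of a free-text alert message."""
    metrics = {}
    for cpu, temp in CPU_RE.findall(text):
        metrics[f"cpu{cpu}_c"] = int(temp)
    if m := AMBIENT_RE.search(text):
        metrics["ambient_c"] = int(m.group(1))
        metrics["ambient_normal_c"] = int(m.group(2))
    if m := LATENCY_RE.search(text):
        metrics["latency_before_ms"] = int(m.group(1))
        metrics["latency_after_ms"] = round(float(m.group(2)) * 1000)
    if m := FANS_RE.search(text):
        metrics["fans_at_max"] = int(m.group(1))
        metrics["fan_rpm"] = int(m.group(2))
    return metrics

# Excerpts of the two injected messages (leading prefixes omitted).
messages = [
    "CPU 1 temperature: 95C (critical threshold: 100C), CPU 2 temperature: 93C, "
    "ambient temperature: 42C (normal: 22C), hpe ILO health status: critical",
    "server CPU throttled to 60% power cap due to thermal emergency, application "
    "response latency increased from 50ms to 2.8 seconds, hpe ILO reporting all "
    "6 fans at maximum RPM (18000), inlet temperature sensor: 42C",
]

metrics = {}
for msg in messages:
    metrics.update(extract_metrics(msg))

slowdown = metrics["latency_after_ms"] / metrics["latency_before_ms"]
print(metrics)
print(f"latency slowdown: {slowdown:.0f}x")  # latency slowdown: 56x
```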
Neural Engine Root Cause Analysis
The HPE DL380 Gen10 Plus is in a critical thermal emergency. CPU 1 is at 95°C and CPU 2 at 93°C, within 5 to 7°C of the 100°C critical threshold. The ambient (inlet) temperature is 42°C against a normal 22°C. The server has automatically engaged power capping at 60% to reduce heat output, and iLO warns that automatic shutdown is imminent if temperatures keep rising. All six fans are already at maximum speed (18000 RPM), and the inlet air itself is 20°C above normal. This points primarily to a failure of room cooling (the HVAC system) rather than a server fan fault. Blocked airflow or other environmental issues in the server room should still be ruled out.
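The reasoning above (maxed-out fans plus a hot inlet point at room cooling) can be written as a small decision heuristic. The `ThermalSnapshot` type, the `likely_cause` function, and the 5°C inlet and 10°C CPU margins are hypothetical illustrations. They are not part of iLO or Corax.

```python
from dataclasses import dataclass

@dataclass
class ThermalSnapshot:
    inlet_c: float         # measured inlet/ambient temperature
    inlet_normal_c: float  # expected inlet temperature
    fans_total: int
    fans_at_max: int
    cpu_max_c: float       # hottest CPU reading
    cpu_limit_c: float     # critical/shutdown threshold

def likely_cause(s: ThermalSnapshot, inlet_margin_c: float = 5.0,
                 cpu_margin_c: float = 10.0) -> str:
    """Hypothetical triage heuristic mirroring the root cause reasoning."""
    if s.cpu_limit_c - s.cpu_max_c > cpu_margin_c:
        return "none"           # CPUs are not near their limit
    if s.inlet_c - s.inlet_normal_c > inlet_margin_c:
        return "environmental"  # hot supply air: room cooling / HVAC
    if s.fans_at_max < s.fans_total:
        return "fan"            # CPUs hot but not all fans have ramped up
    return "airflow"            # fans maxed, inlet normal: dust, blocked vents

scenario = ThermalSnapshot(inlet_c=42, inlet_normal_c=22, fans_total=6,
                           fans_at_max=6, cpu_max_c=95, cpu_limit_c=100)
print(likely_cause(scenario))  # environmental
```

The order of the checks matters. A hot inlet is tested before fan state because fans running at maximum cannot compensate for hot supply air.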
Remediation Plan
1. Immediately check the server room HVAC system and ambient temperature controls.
2. Verify fan operation and speeds through the iLO interface.
3. Inspect the server for dust buildup or blocked air vents.
4. Check rack airflow patterns and cable management.
5. If cooling cannot be restored before CPU temperatures reach the 100°C threshold, perform a controlled shutdown to prevent hardware damage.
6. Monitor other servers in the same rack or area for similar thermal issues.
7. Contact the facilities team for an emergency cooling assessment.
8. Once temperatures normalize, gradually raise the power cap back to its normal limit and keep monitoring.
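The "gradually restore" step of this plan can be gated on live temperature data. On Gen10 Plus servers, iLO 5 exposes temperatures through its Redfish interface in the DMTF Thermal resource, where each sensor carries `ReadingCelsius` and `UpperThresholdCritical`. The sketch below works on a payload that has already been fetched and does not call the API. The sample sensor names and values, the 10% step, the 10°C margin, and the `next_power_cap` helper are assumptions for illustration. Any cap it returns would still have to be applied through iLO's own power-management settings.

```python
# Sketch: decide whether to raise the power cap one step during recovery.
# Payload shape follows the DMTF Redfish Thermal resource; the sensor
# names and readings below are hypothetical, modeled on this scenario.

def near_critical(payload: dict, margin_c: float = 10.0) -> list[str]:
    """Names of temperature sensors within margin_c of their critical threshold."""
    hot = []
    for sensor in payload.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        limit = sensor.get("UpperThresholdCritical")
        if reading is None or limit is None:
            continue  # sensor absent or no threshold reported
        if limit - reading <= margin_c:
            hot.append(sensor["Name"])
    return hot

def next_power_cap(current_pct: int, payload: dict, step_pct: int = 10,
                   margin_c: float = 10.0) -> int:
    """Raise the cap by step_pct only if no sensor is near its limit; 100 = uncapped."""
    if near_critical(payload, margin_c):
        return current_pct  # still too hot: hold the current cap
    return min(100, current_pct + step_pct)

during_incident = {"Temperatures": [
    {"Name": "02-CPU 1", "ReadingCelsius": 95, "UpperThresholdCritical": 100},
    {"Name": "03-CPU 2", "ReadingCelsius": 93, "UpperThresholdCritical": 100},
]}
after_recovery = {"Temperatures": [
    {"Name": "02-CPU 1", "ReadingCelsius": 68, "UpperThresholdCritical": 100},
    {"Name": "03-CPU 2", "ReadingCelsius": 66, "UpperThresholdCritical": 100},
]}

print(near_critical(during_incident))       # ['02-CPU 1', '03-CPU 2']
print(next_power_cap(60, during_incident))  # 60
print(next_power_cap(60, after_recovery))   # 70
```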