Back to All Scenarios
PASSEDvendor / esxi_psod

ESXi Host Purple Screen of Death

An ESXi 8.0 host experiences a Purple Screen of Death (PSOD) due to a faulty network driver. All 35 VMs on the host crash simultaneously. HA attempts to restart them on surviving hosts but resource contention causes slow recovery.

Pattern
VMWARE_EVENT
Severity
CRITICAL
Confidence
92%
Remediation
Remote Hands

Test Results

MetricExpectedActualResult
Pattern RecognitionVMWARE_EVENTVMWARE_EVENT
Severity AssessmentCRITICALCRITICAL
Incident CorrelationYes37 linked
Cascade EscalationYesYes
RemediationRemote Hands — Corax contacts on-site support via call, email, or API

Scenario Conditions

VMware vSphere 8.0 cluster with 4 ESXi hosts. Host esxi-03 PSOD. 35 VMs affected. HA enabled with 'Restart VMs' policy. Remaining 3 hosts at 75% memory utilization. DRS enabled in fully automated mode.

Injected Error Messages (4)

VMware ESXi host esxi-03 PSOD — Purple Screen of Death, host unreachable, PF Exception 14 in world hostd, backtrace in vmkernel log: #PF at vmk_IscsiTransportSendPdu+0x1a2, host powered off, 35 VMs affected
vCenter HA event: host esxi-03 failure detected — VMware HA initiating VM restart on surviving hosts, admission control: 3 VMs cannot be restarted due to insufficient resources, DRS rebalance triggered
Exchange Server VM crashed — VMware HA restarting on esxi-01, VM boot delayed due to resource contention, Exchange database dirty shutdown recovery in progress, mail flow interrupted
SQL Server VM unavailable after ESXi PSOD — VMware HA restart pending, insufficient memory on target host, DRS migrating lower-priority VMs to free resources, database recovery pending

Neural Engine Root Cause Analysis

VMware ESXi host esxi-03 has experienced a Purple Screen of Death (PSOD) with a Page Fault Exception 14 in the hostd world, specifically in the vmk_IscsiTransportSendPdu function. This indicates a critical kernel-level failure related to iSCSI storage transport operations, likely caused by faulty hardware (memory/storage controller), corrupted drivers, or storage network issues. The host has powered off completely, affecting 35 VMs and triggering 13 correlated incidents within a 5-minute window.

Remediation Plan

1. Attempt remote power cycle of esxi-03 via iDRAC/iLO management interface. 2. If successful boot, check vmkernel logs for hardware errors and iSCSI configuration issues. 3. Run hardware diagnostics (memory test, storage controller check). 4. Verify iSCSI network connectivity and multipathing configuration. 5. If hardware failure suspected, migrate VMs to other hosts and replace faulty components. 6. If repeated PSOD occurs, engage VMware support with vmkernel dump analysis.
Tested: 2026-03-30Monitors: 4 | Incidents: 4Test ID: cmncjgt5j01caobqeqxscuwwb