PASSEDvendor / esxi_psod

ESXi Host Purple Screen of Death

An ESXi 8.0 host experiences a Purple Screen of Death (PSOD) due to a faulty network driver. All 35 VMs on the host crash simultaneously. HA attempts to restart them on surviving hosts but resource contention causes slow recovery.

Pattern

VMWARE_EVENT

Severity

CRITICAL

Confidence

92%

Remediation

Remote Hands

Test Results

Metric	Expected	Actual
Pattern Recognition	VMWARE_EVENT	VMWARE_EVENT
Severity Assessment	CRITICAL	CRITICAL
Incident Correlation	Yes	37 linked
Cascade Escalation	Yes	Yes
Remediation	—	Remote Hands — Corax contacts on-site support via call, email, or API

Scenario Conditions

VMware vSphere 8.0 cluster with 4 ESXi hosts. Host esxi-03 PSOD. 35 VMs affected. HA enabled with 'Restart VMs' policy. Remaining 3 hosts at 75% memory utilization. DRS enabled in fully automated mode.

Injected Error Messages (4)

VMware ESXi host esxi-03 PSOD — Purple Screen of Death, host unreachable, PF Exception 14 in world hostd, backtrace in vmkernel log: #PF at vmk_IscsiTransportSendPdu+0x1a2, host powered off, 35 VMs affected

vCenter HA event: host esxi-03 failure detected — VMware HA initiating VM restart on surviving hosts, admission control: 3 VMs cannot be restarted due to insufficient resources, DRS rebalance triggered

Exchange Server VM crashed — VMware HA restarting on esxi-01, VM boot delayed due to resource contention, Exchange database dirty shutdown recovery in progress, mail flow interrupted

SQL Server VM unavailable after ESXi PSOD — VMware HA restart pending, insufficient memory on target host, DRS migrating lower-priority VMs to free resources, database recovery pending

Neural Engine Root Cause Analysis

VMware ESXi host esxi-03 has experienced a Purple Screen of Death (PSOD) with a Page Fault Exception 14 in the hostd world, specifically in the vmk_IscsiTransportSendPdu function. This indicates a critical kernel-level failure related to iSCSI storage transport operations, likely caused by faulty hardware (memory/storage controller), corrupted drivers, or storage network issues. The host has powered off completely, affecting 35 VMs and triggering 13 correlated incidents within a 5-minute window.

Remediation Plan

1. Attempt remote power cycle of esxi-03 via iDRAC/iLO management interface. 2. If successful boot, check vmkernel logs for hardware errors and iSCSI configuration issues. 3. Run hardware diagnostics (memory test, storage controller check). 4. Verify iSCSI network connectivity and multipathing configuration. 5. If hardware failure suspected, migrate VMs to other hosts and replace faulty components. 6. If repeated PSOD occurs, engage VMware support with vmkernel dump analysis.

Tested: 2026-03-30Monitors: 4 | Incidents: 4Test ID: cmncjgt5j01caobqeqxscuwwb