A DRS-triggered vMotion fails mid-migration because of an MTU mismatch on the vMotion network. The VM is left stuck in a partially migrated state: the source host still holds the memory pages, and the destination host cannot complete the switchover.
Pattern: VMWARE_EVENT
Severity: CRITICAL
Confidence: 92%
Remediation: Remote Hands
Test Results
Metric | Expected | Actual
Pattern Recognition | VMWARE_EVENT | VMWARE_EVENT
Severity Assessment | CRITICAL | CRITICAL
Incident Correlation | Yes | 27 linked
Cascade Escalation | N/A | No
Remediation | - | Remote Hands (Corax contacts on-site support via call, email, or API)
Scenario Conditions
VMware DRS is in fully automated mode. The vMotion network is on VLAN 50 with jumbo frames (MTU 9000). One host's vMotion vmnic is misconfigured with MTU 1500. A large VM (128 GB RAM) is stuck mid-migration.
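As a rough illustration of the misconfiguration described above, the following Python sketch flags every host whose vMotion interface MTU differs from the MTU expected on the vMotion network. The inventory dictionary and host names are hypothetical; in practice the values would come from each host's vMotion network settings.

```python
from typing import Dict, List


def find_mtu_mismatches(host_mtus: Dict[str, int], expected_mtu: int) -> List[str]:
    """Return the hosts whose vMotion MTU differs from expected_mtu, sorted by name."""
    return sorted(host for host, mtu in host_mtus.items() if mtu != expected_mtu)


if __name__ == "__main__":
    # Hypothetical inventory: host name -> MTU on its vMotion interface.
    # Illustrative values only; not taken from the incident data.
    inventory = {"host-a": 9000, "host-b": 9000, "host-c": 1500, "host-d": 9000}
    print(find_mtu_mismatches(inventory, expected_mtu=9000))  # ['host-c']
```

Comparing against a single expected value (the network's configured MTU) rather than against each other means a host is flagged even when it is the only one that is correct, which is the behavior you want when the switch side defines the standard.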
Injected Error Messages (2)
VMware vMotion failure — DRS-initiated migration of ERP-Prod-01 from esxi-02 to esxi-04 failed at 67%, error: vMotion failed due to network error, failed to receive vmknic packet: TCP timeout, MTU mismatch on vMotion network (9000 vs 1500), VM in undefined migration state
ERP application unresponsive — VM stuck in vMotion migration state, guest OS frozen, memory pages split between source and destination hosts, application timeout on all requests, TCP connections resetting
Neural Engine Root Cause Analysis
The vMotion operation failed due to a critical MTU mismatch between ESXi hosts on the vMotion network (9000 vs 1500 bytes), causing TCP timeouts during VM migration. This network configuration inconsistency prevents proper packet transmission between esxi-02 and esxi-04, leaving the ERP-Prod-01 VM in an undefined migration state. The 10 correlated incidents suggest this MTU mismatch is affecting multiple VMs or operations across the cluster.
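The failure mechanism can be sketched as follows, under the simplifying assumption that oversized frames are dropped rather than fragmented (as on a Layer 2 vMotion segment, or for IP packets with the don't-fragment bit set). The usable MTU of a path is the smallest MTU of any link on it, so packets sized for 9000-byte jumbo frames never arrive if one hop is limited to 1500 bytes, and the sender eventually times out. The hop list in the example is hypothetical.

```python
from typing import Sequence


def path_mtu(hop_mtus: Sequence[int]) -> int:
    """Effective MTU of a path: the smallest MTU of any link it crosses."""
    if not hop_mtus:
        raise ValueError("path must contain at least one hop")
    return min(hop_mtus)


def packet_fits(ip_packet_size: int, hop_mtus: Sequence[int]) -> bool:
    """True if an IP packet of this size can cross every hop without fragmentation."""
    return ip_packet_size <= path_mtu(hop_mtus)


if __name__ == "__main__":
    # Hypothetical path: source vmknic, physical switch, destination vmknic.
    hops = [9000, 9000, 1500]
    print(path_mtu(hops))           # 1500
    print(packet_fits(9000, hops))  # False: jumbo packets are dropped at the 1500 hop
    print(packet_fits(1500, hops))  # True
```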
Remediation Plan
1. Immediately check and document the current MTU settings on every ESXi host's vMotion network interface.
2. Standardize the MTU across all hosts: 9000 bytes (recommended for vMotion performance), or 1500 bytes if the network infrastructure does not support jumbo frames.
3. Verify that the physical switch configuration supports the chosen MTU end-to-end.
4. Test vMotion connectivity between every pair of hosts using vmkping with the don't-fragment flag and a payload sized to the chosen MTU (8972 bytes for MTU 9000, i.e. 9000 minus 28 bytes of IPv4 and ICMP headers).
5. Reset the ERP-Prod-01 migration state and retry the vMotion operation.
6. Monitor DRS operations to confirm that cluster stability is restored.
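The vmkping test in step 4 needs a payload that fills the MTU exactly once the 20-byte IPv4 header and the 8-byte ICMP header are added. The sketch below computes that size and assembles the command string using vmkping's -I (VMkernel interface), -d (don't fragment), and -s (payload size) options. The interface name vmk1 and the address 192.0.2.14 are hypothetical placeholders for the vMotion VMkernel adapter and the peer host's vMotion IP.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet at this MTU."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload <= 0:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo request")
    return payload


def vmkping_command(vmk_interface: str, target_ip: str, mtu: int) -> str:
    """Build a vmkping command that probes the path at full MTU with don't-fragment set."""
    # -I: send from this VMkernel interface; -d: don't fragment; -s: ICMP payload size.
    return f"vmkping -I {vmk_interface} -d -s {max_icmp_payload(mtu)} {target_ip}"


if __name__ == "__main__":
    # Hypothetical interface and peer address.
    print(vmkping_command("vmk1", "192.0.2.14", 9000))
    # vmkping -I vmk1 -d -s 8972 192.0.2.14
```

If the 8972-byte probe fails while a 1472-byte probe (the MTU 1500 equivalent) succeeds, some hop on that path is still limited to 1500 bytes.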