PASSED | vendor / nutanix_cvm_down

Nutanix CVM Crash — Node Storage Offline

A Nutanix Controller VM (CVM) crashes on one node of a 4-node cluster. All VMs on that node lose local storage access. The cluster attempts to serve I/O from the surviving CVMs, but performance degrades: the injected SQL Server message reports storage latency of 45ms against a 2ms baseline.

Pattern

NUTANIX_EVENT

Severity

CRITICAL

Confidence

85%

Remediation

Remote Hands

Test Results

Metric	Expected	Actual
Pattern Recognition	NUTANIX_EVENT	NUTANIX_EVENT
Severity Assessment	CRITICAL	CRITICAL
Incident Correlation	Yes	36 linked
Cascade Escalation	Yes	Yes
Remediation	—	Remote Hands — Corax contacts on-site support via call, email, or API

Scenario Conditions

4-node Nutanix cluster (NX-3460). RF2 storage. 60 VMs total, 15 on affected node. CVM crashed due to Stargate process segfault. Prism Central monitoring active.
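The scenario numbers above can be turned into a quick impact estimate. The sketch below is a simplified model, not Nutanix's own sizing logic: the class and method names are hypothetical, and it assumes that a replication factor of n keeps n copies on distinct nodes, so up to n-1 simultaneous node failures leave at least one copy readable. It ignores capacity headroom and rebuild time.

```python
from dataclasses import dataclass


@dataclass
class ClusterScenario:
    """Hypothetical container for the scenario conditions."""
    nodes: int
    replication_factor: int
    total_vms: int
    vms_on_failed_node: int
    failed_nodes: int = 1

    def affected_vm_share(self) -> float:
        # Fraction of all VMs that were running on the failed node.
        return self.vms_on_failed_node / self.total_vms

    def surviving_cvms(self) -> int:
        # One CVM per node, so each failed node removes one CVM.
        return self.nodes - self.failed_nodes

    def data_still_available(self) -> bool:
        # Assumption: RF-n tolerates up to n-1 simultaneous node failures.
        return self.failed_nodes <= self.replication_factor - 1


scenario = ClusterScenario(nodes=4, replication_factor=2,
                           total_vms=60, vms_on_failed_node=15)
print(f"VMs on failed node: {scenario.affected_vm_share():.0%}")
print(f"Surviving CVMs: {scenario.surviving_cvms()}")
print(f"Data still available: {scenario.data_still_available()}")
```

Under these assumptions a quarter of the VMs depend on remote CVMs for storage, and a second node failure would exceed what RF2 can absorb.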

Injected Error Messages (3)

Nutanix CVM down on node-3 — CVM unreachable at 10.10.50.103, Stargate process crash (segfault), local storage I/O redirected to remote CVMs, Prism alert: CRITICAL

Nutanix cluster health degraded — storage degraded after CVM failure on node-3, Prism Central alert: node storage offline, curator scan initiated for data rebalance

SQL Server experiencing high I/O latency after Nutanix CVM down on its host node — storage latency 45ms (baseline 2ms), query timeouts increasing
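As a rough illustration of how fields could be pulled out of messages like these, the sketch below extracts the node number, the CVM address, and the latency figures with regular expressions. The patterns and the `extract_fields` helper are assumptions fitted to the three injected messages; they are not a documented Corax or Nutanix format.

```python
import re

# Hypothetical patterns, fitted only to the wording of the injected messages.
NODE_RE = re.compile(r"\bnode-(\d+)\b")
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")
LATENCY_RE = re.compile(
    r"storage latency (\d+(?:\.\d+)?)ms \(baseline (\d+(?:\.\d+)?)ms\)"
)


def extract_fields(message: str) -> dict:
    """Return whichever of node, CVM IP, and latency fields the message contains."""
    fields = {}
    if m := NODE_RE.search(message):
        fields["node"] = int(m.group(1))
    if m := IP_RE.search(message):
        fields["cvm_ip"] = m.group(1)
    if m := LATENCY_RE.search(message):
        observed, baseline = float(m.group(1)), float(m.group(2))
        fields["latency_ms"] = observed
        fields["baseline_ms"] = baseline
        # How many times slower than baseline the storage path is.
        fields["latency_ratio"] = observed / baseline
    return fields


# Excerpts of the injected messages above.
cvm_msg = "CVM unreachable at 10.10.50.103, Stargate process crash (segfault)"
health_msg = "storage degraded after CVM failure on node-3"
sql_msg = "storage latency 45ms (baseline 2ms), query timeouts increasing"

for msg in (cvm_msg, health_msg, sql_msg):
    print(extract_fields(msg))
```

For the SQL Server message the ratio is 45 / 2 = 22.5, which gives a concrete figure for the latency degradation the scenario describes.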

Neural Engine Root Cause Analysis

The Nutanix Controller VM (CVM) on node-3 has crashed due to a segmentation fault in the Stargate process, which is the core distributed storage service. This segfault has rendered the entire CVM unreachable at 10.10.50.103:9440, causing local storage I/O to be redirected to remote CVMs and triggering a critical Prism alert. The presence of 14 correlated incidents suggests this may be part of a broader cluster instability or cascading failure affecting multiple nodes.

Remediation Plan

1. Immediately check cluster health and remaining node capacity to ensure service continuity.
2. Access the hypervisor hosting node-3 and attempt to restart the CVM via vCenter/AHV.
3. Monitor CVM boot process and check /home/nutanix/data/logs for segfault details in stargate.log.
4. If restart fails, escalate to Nutanix support with crash dumps and consider emergency procedures to maintain cluster quorum.
5. Investigate root cause of segfault (memory corruption, hardware issues, or software bugs) to prevent recurrence.
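The log check in step 3 could be sketched as a small keyword scan. This is only an illustration: the exact wording Stargate writes when it crashes is not given in this report, so the keywords below are an assumption, and the `find_crash_lines` and `scan_log` helpers are hypothetical names.

```python
import re
from pathlib import Path

# Assumption: a crash is flagged by one of these generic terms in the log.
CRASH_RE = re.compile(r"segfault|segmentation fault|SIGSEGV", re.IGNORECASE)


def find_crash_lines(lines):
    """Return (line_number, text) pairs for lines that mention a segfault."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if CRASH_RE.search(line)
    ]


def scan_log(path):
    """Scan a log file on disk, replacing undecodable bytes instead of failing."""
    with Path(path).open(errors="replace") as handle:
        return find_crash_lines(handle)


# Made-up sample lines, used only to show the shape of the result.
sample = [
    "service started\n",
    "*** SIGSEGV received ***\n",
    "shutting down\n",
]
print(find_crash_lines(sample))
```

On the CVM itself, the same scan could be pointed at the file named in step 3, for example `scan_log("/home/nutanix/data/logs/stargate.log")`, and the matching lines attached to the support escalation in step 4.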

Tested: 2026-03-30 | Monitors: 3 | Incidents: 3 | Test ID: cmncjdq6h00k5obqel0nz0cz9