PASSED | vendor / nutanix_cvm_down

Nutanix CVM Crash — Node Storage Offline

A Nutanix Controller VM (CVM) crashes on one node of a 4-node cluster. All VMs on that node lose local storage access. The cluster attempts to serve I/O from surviving CVMs but performance degrades significantly.

Pattern: NUTANIX_EVENT
Severity: CRITICAL
Confidence: 85%
Remediation: Remote Hands

Test Results

Metric | Expected | Actual | Result
Pattern Recognition | NUTANIX_EVENT | NUTANIX_EVENT |
Severity Assessment | CRITICAL | CRITICAL |
Incident Correlation | Yes | 36 linked |
Cascade Escalation | Yes | Yes |

Remediation: Remote Hands. Corax contacts on-site support via call, email, or API.

Scenario Conditions

4-node Nutanix cluster (NX-3460). RF2 storage. 60 VMs total, 15 on affected node. CVM crashed due to Stargate process segfault. Prism Central monitoring active.
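
As a rough worked check of the blast radius implied by these conditions, the figures can be turned into simple arithmetic. This is only a sketch: the function and variable names are illustrative, and it assumes RF2 has its usual meaning of two copies of each piece of data, so the loss of one node is tolerated.

```python
# Figures taken from the scenario conditions; all names are illustrative.
TOTAL_NODES = 4
TOTAL_VMS = 60
VMS_ON_AFFECTED_NODE = 15
REPLICATION_FACTOR = 2  # RF2: two copies of each piece of data (assumption stated above)


def blast_radius(total_vms: int, affected_vms: int) -> float:
    """Fraction of VMs whose local storage path is lost."""
    return affected_vms / total_vms


def tolerated_failures(replication_factor: int) -> int:
    """Number of simultaneous node losses the data can survive."""
    return replication_factor - 1


affected_share = blast_radius(TOTAL_VMS, VMS_ON_AFFECTED_NODE)
surviving_cvms = TOTAL_NODES - 1  # one CVM is down in this scenario

print(f"{affected_share:.0%} of VMs lose local storage access")
print(f"{surviving_cvms} CVMs remain to serve redirected I/O")
print(f"Further node failures tolerated: {tolerated_failures(REPLICATION_FACTOR) - 1}")
```

With one CVM already down, the remaining tolerance under RF2 is zero, which is consistent with the CRITICAL severity assigned to the scenario.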

Injected Error Messages (3)

Nutanix CVM down on node-3 — CVM unreachable at 10.10.50.103, Stargate process crash (segfault), local storage I/O redirected to remote CVMs, Prism alert: CRITICAL
Nutanix cluster health degraded — storage degraded after CVM failure on node-3, Prism Central alert: node storage offline, curator scan initiated for data rebalance
SQL Server experiencing high I/O latency after Nutanix CVM down on its host node — storage latency 45ms (baseline 2ms), query timeouts increasing
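
To illustrate how messages like these could be matched to the NUTANIX_EVENT pattern, here is a minimal keyword-based sketch. It is not how Corax classifies events; the keyword list, function names, and regular expression are assumptions made for illustration, and the messages are abbreviated from the ones above.

```python
import re

# Hypothetical keyword list; the real pattern-recognition logic is not described in this report.
NUTANIX_KEYWORDS = ("nutanix", "cvm", "stargate", "prism")

# Matches the latency phrasing used in the third injected message.
LATENCY_RE = re.compile(r"storage latency (\d+)ms \(baseline (\d+)ms\)")

# Abbreviated versions of the three injected messages.
MESSAGES = [
    "Nutanix CVM down on node-3: CVM unreachable at 10.10.50.103, Stargate process crash (segfault)",
    "Nutanix cluster health degraded: storage degraded after CVM failure on node-3",
    "SQL Server experiencing high I/O latency after Nutanix CVM down on its host node: "
    "storage latency 45ms (baseline 2ms), query timeouts increasing",
]


def matches_nutanix_event(message: str) -> bool:
    """Return True if any Nutanix-related keyword appears in the message."""
    text = message.lower()
    return any(keyword in text for keyword in NUTANIX_KEYWORDS)


def latency_ratio(message: str):
    """Return observed/baseline latency if the message reports both, else None."""
    match = LATENCY_RE.search(message)
    if match is None:
        return None
    observed_ms, baseline_ms = int(match.group(1)), int(match.group(2))
    return observed_ms / baseline_ms


for msg in MESSAGES:
    print(matches_nutanix_event(msg), latency_ratio(msg))
```

For the SQL Server message, the ratio works out to 45 / 2 = 22.5 times the baseline latency.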

Neural Engine Root Cause Analysis

The Nutanix Controller VM (CVM) on node-3 has crashed due to a segmentation fault in the Stargate process, which is the core distributed storage service. This segfault has rendered the entire CVM unreachable at 10.10.50.103:9440, causing local storage I/O to be redirected to remote CVMs and triggering a critical Prism alert. The presence of 14 correlated incidents suggests this may be part of a broader cluster instability or cascading failure affecting multiple nodes.

Remediation Plan

1. Immediately check cluster health and remaining node capacity to ensure service continuity.
2. Access the hypervisor hosting node-3 and attempt to restart the CVM via vCenter/AHV.
3. Monitor CVM boot process and check /home/nutanix/data/logs for segfault details in stargate.log.
4. If restart fails, escalate to Nutanix support with crash dumps and consider emergency procedures to maintain cluster quorum.
5. Investigate root cause of segfault (memory corruption, hardware issues, or software bugs) to prevent recurrence.
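
The log check in step 3 could be sketched as a simple scan for crash markers. This is a hedged example: the marker strings, function names, and sample log lines are invented for illustration and do not reproduce real Stargate log output, whose format is not given in this report.

```python
from pathlib import Path

# Assumed crash markers; actual Stargate log wording may differ.
CRASH_MARKERS = ("segfault", "sigsegv", "segmentation fault")


def find_crash_lines(lines):
    """Return (line_number, text) pairs for lines containing a crash marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(marker in lowered for marker in CRASH_MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan a log file on disk, tolerating undecodable bytes."""
    with Path(path).open(errors="replace") as handle:
        return find_crash_lines(handle)


# Invented sample lines, used only to exercise the function.
sample = [
    "I0330 10:00:01 service started\n",
    "F0330 10:02:17 received SIGSEGV, dumping stack\n",
    "I0330 10:02:18 shutting down\n",
]
print(find_crash_lines(sample))
```

On a CVM, `scan_log("/home/nutanix/data/logs/stargate.log")` would apply the same scan to the directory and file named in step 3; the exact file name on a given release may differ.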
Tested: 2026-03-30 | Monitors: 3 | Incidents: 3 | Test ID: cmncjdq6h00k5obqel0nz0cz9