PASSED

cloud / k8s_pv_claim_stuck_pending

Kubernetes PersistentVolumeClaim Stuck Pending

Multiple PersistentVolumeClaims in a Kubernetes cluster are stuck in Pending state after the cloud provider's storage provisioner hits its volume limit. New StatefulSet pods cannot start because they require persistent storage. The storage class provisioner logs show quota exceeded errors.

Pattern

CONTAINER_EVENT

Severity

CRITICAL

Confidence

95%

Remediation

Remote Hands

Test Results

Metric	Expected	Actual
Pattern Recognition	CONTAINER_EVENT	CONTAINER_EVENT
Severity Assessment	CRITICAL	CRITICAL
Incident Correlation	Yes	21 linked
Cascade Escalation	N/A	No
Remediation	—	Remote Hands — Corax contacts on-site support via call, email, or API

Scenario Conditions

Kubernetes 1.29 on cloud provider. StorageClass 'fast-ssd' with dynamic provisioning. Cloud account volume limit: reached maximum. 8 new PVCs pending. StatefulSet 'elasticsearch-data' scaling from 5 to 8 nodes blocked. CSI driver returning quota errors.

Injected Error Messages (2)

Kubernetes PVC stuck in Pending — 8 PersistentVolumeClaims in Pending state for 45 minutes, StorageClass 'fast-ssd' provisioner failing with 'volume limit exceeded: account has reached maximum number of volumes', CSI driver csi-provisioner reporting ProvisioningFailed events, kubelet unable to mount volumes for new pods, StatefulSet 'elasticsearch-data' replicas: 5/8 (3 pods in Pending state waiting for PVC)

Elasticsearch cluster cannot scale — 3 new data nodes stuck in Pending due to PVC provisioning failure, cluster health: YELLOW (unassigned shards: 147), indexing throughput degraded by 40%, search latency increased to 3.2 seconds (baseline: 400ms), kubelet logs: 'pod has unbound PersistentVolumeClaims', shard rebalancing blocked until new nodes join
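The ProvisioningFailed events named in the first message can be tallied per claim. The following is a minimal sketch, assuming the event list was exported with `kubectl get events --all-namespaces -o json`; the function name `provisioning_failures` is illustrative and not part of the tested system.

```python
from collections import Counter


def provisioning_failures(event_list: dict) -> Counter:
    """Count ProvisioningFailed events per PVC, keyed as "namespace/name".

    `event_list` is assumed to be the parsed JSON of a core/v1 Event list.
    """
    counts: Counter = Counter()
    for ev in event_list.get("items", []):
        obj = ev.get("involvedObject", {})
        if ev.get("reason") != "ProvisioningFailed":
            continue
        if obj.get("kind") != "PersistentVolumeClaim":
            continue
        key = f"{obj.get('namespace', 'default')}/{obj.get('name')}"
        # `count` may be missing or null on some events; treat that as 1.
        counts[key] += ev.get("count") or 1
    return counts
```

A claim with a steadily rising count suggests the provisioner is retrying against the same quota error rather than hitting a transient fault.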

Neural Engine Root Cause Analysis

The Kubernetes cluster has reached the maximum volume limit allowed by the underlying cloud provider or storage backend for the 'fast-ssd' StorageClass. The CSI provisioner cannot create new persistent volumes because the account/subscription has exceeded its volume quota, causing 8 PVCs to remain in Pending state. This is preventing StatefulSet pods from starting as they cannot mount required storage, creating a cascading failure affecting multiple workloads including the Elasticsearch cluster.
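To confirm which claims are affected, the Pending PVCs for the StorageClass can be listed from the API output. This is a sketch under the assumption that the input is the parsed JSON from `kubectl get pvc --all-namespaces -o json`; `pending_pvcs` is a hypothetical helper name.

```python
def pending_pvcs(pvc_list: dict, storage_class: str) -> list[str]:
    """Return sorted "namespace/name" for Pending PVCs of one StorageClass."""
    pending = []
    for pvc in pvc_list.get("items", []):
        # status.phase is Pending until a volume is provisioned and bound.
        if pvc.get("status", {}).get("phase") != "Pending":
            continue
        if pvc.get("spec", {}).get("storageClassName") != storage_class:
            continue
        meta = pvc.get("metadata", {})
        pending.append(f"{meta.get('namespace')}/{meta.get('name')}")
    return sorted(pending)
```

Calling `pending_pvcs(data, "fast-ssd")` before and after quota is freed gives a simple check that the list shrinks to empty as claims reach the Bound phase.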

Remediation Plan

1. Immediately check current volume usage against provider limits via cloud console/CLI
2. Identify and delete unused/orphaned volumes to free up quota
3. Request volume limit increase from cloud provider if legitimate need exists
4. Consider implementing volume cleanup automation and monitoring for quota thresholds
5. Verify PVCs transition to Bound state after quota is freed
6. Monitor StatefulSet rollout completion
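The usage check and orphan search in the plan above could be sketched as follows. The shape of `cloud_volumes` (dicts with `id` and `attached` keys) is an assumption standing in for whatever the provider's CLI or API returns; `pv_list` is assumed to be the parsed output of `kubectl get pv -o json`, where CSI-backed volumes carry the provider volume ID in `spec.csi.volumeHandle`. Both function names are illustrative.

```python
def quota_usage(volume_count: int, volume_limit: int) -> float:
    """Fraction of the account volume limit currently in use."""
    if volume_limit <= 0:
        raise ValueError("volume_limit must be positive")
    return volume_count / volume_limit


def orphan_candidates(cloud_volumes: list[dict], pv_list: dict) -> list[str]:
    """Cloud volume IDs that are detached and not referenced by any PV.

    These are candidates only; review each one before deleting it.
    """
    in_use = set()
    for pv in pv_list.get("items", []):
        handle = pv.get("spec", {}).get("csi", {}).get("volumeHandle")
        if handle:
            in_use.add(handle)
    return sorted(
        v["id"]
        for v in cloud_volumes
        if not v.get("attached", False) and v["id"] not in in_use
    )
```

An alert on `quota_usage(...)` crossing a chosen threshold, for example 0.8, would address the monitoring suggestion in step 4; the threshold value is a judgment call and not taken from the test.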

Tested: 2026-03-30 | Monitors: 2 | Incidents: 2 | Test ID: cmncjvch90507obqedxxfe381