The Elasticsearch cluster backing the ELK logging stack has run out of disk space. Elasticsearch has entered read-only mode, Logstash is backing up, and no new logs are being indexed. Security event logs, application logs, and audit trails are all being dropped.
Pattern: DISK_FULL
Severity: CRITICAL
Confidence: 95%
Remediation: Remote Hands
Test Results

| Metric | Expected | Actual |
| --- | --- | --- |
| Pattern Recognition | DISK_FULL | DISK_FULL |
| Severity Assessment | CRITICAL | CRITICAL |
| Incident Correlation | Yes | 22 linked |
| Cascade Escalation | N/A | No |
| Remediation | — | Remote Hands — Corax contacts on-site support via call, email, or API |
Scenario Conditions
3-node Elasticsearch cluster. All nodes at 95%+ disk. Cluster in read-only mode. Logstash pipeline backing up. Filebeat agents on 200 servers buffering locally. 15GB/hour log volume being dropped.
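The buffering figures above imply a deadline: with ingestion down, each of the 200 Filebeat agents must absorb its share of the 15GB/hour locally. A rough sketch of that arithmetic (the 5GB per-server spool size is an assumed placeholder, not a figure from the scenario):

```python
# Backlog arithmetic for the scenario above. Drop rate and server count
# come from the scenario; LOCAL_BUFFER_GB is an assumed placeholder.
DROP_RATE_GB_PER_HOUR = 15   # log volume currently being dropped
SERVERS = 200                # Filebeat agents buffering locally
LOCAL_BUFFER_GB = 5          # assumed spool space per source server

# Volume each server must absorb per hour while ingestion stays down
per_server_gb_per_hour = DROP_RATE_GB_PER_HOUR / SERVERS

# Hours until the assumed local buffer fills, i.e. when data loss begins
# at the source even though the agents are spooling
hours_until_loss = LOCAL_BUFFER_GB / per_server_gb_per_hour

print(f"{per_server_gb_per_hour:.3f} GB/h per server")
print(f"~{hours_until_loss:.0f} h until local buffers overflow")
```

Under these assumptions the agents buy roughly two to three days before source-side loss starts, which bounds how long the cluster remediation can take.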
Injected Error Messages (2)

1. Elasticsearch cluster RED — all 3 data nodes at 96% disk utilization, cluster entered read-only mode (index.blocks.read_only_allow_delete=true), no new documents being indexed, disk full on all data paths, 47 indices in RED state, disk space critical across all nodes: node-1: 1.8TB/1.9TB, node-2: 1.85TB/1.9TB, node-3: 1.87TB/1.9TB, no space left for new index creation, filesystem full preventing log ingestion
2. Logstash pipeline backed up — 2.3 million events in persistent queue (queue at 95% capacity), output to Elasticsearch failing with 'ClusterBlockException: index read-only', events being dropped at rate of 15GB/hour, Filebeat agents on 200 servers buffering locally (filling up on source servers), security event logs not being ingested, compliance audit trail gap growing, disk full on Elasticsearch preventing any log processing
Neural Engine Root Cause Analysis
The Elasticsearch cluster has reached critical disk capacity (96% utilization across all 3 data nodes), causing it to automatically enter read-only mode as a protective measure. With only ~30-100GB of free space remaining per node out of 1.9TB total, the cluster cannot perform normal operations such as indexing new documents or creating indices. The 47 indices in RED state and 11 correlated incidents indicate this disk exhaustion is causing cascading failures across dependent services that rely on log ingestion and search capabilities.
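The read-only mode described here is triggered by Elasticsearch's flood-stage disk watermark, which defaults to 95% (`cluster.routing.allocation.disk.watermark.flood_stage`). A small sketch checking the reported per-node figures against that default; note the raw figures round slightly differently than the alert's blanket "96% across all nodes":

```python
# Check the alert's per-node disk figures against the default flood-stage
# watermark (95%), which is what sets index.blocks.read_only_allow_delete.
# Node figures are taken from the injected alert above.
FLOOD_STAGE = 0.95  # default flood_stage watermark

nodes = {
    "node-1": (1.80, 1.9),  # (used TB, total TB)
    "node-2": (1.85, 1.9),
    "node-3": (1.87, 1.9),
}

for name, (used_tb, total_tb) in nodes.items():
    util = used_tb / total_tb
    flag = "OVER flood stage" if util >= FLOOD_STAGE else "below flood stage"
    print(f"{name}: {util:.1%} used ({total_tb - used_tb:.2f} TB free) -> {flag}")
```

By these numbers node-2 and node-3 are past the flood-stage watermark, which is enough: the block is applied per index as soon as any hosting node crosses it.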
Remediation Plan
1. Immediately delete old indices (snapshotting any that must be retained first) to reclaim disk space (target <80% utilization).
2. Remove the read-only block by setting 'index.blocks.read_only_allow_delete' to false (or null, which clears the setting entirely) via the Elasticsearch API.
3. Monitor cluster status until it returns to GREEN state.
4. Implement data retention policies to prevent recurrence.
5. Consider adding storage capacity or additional nodes for long-term scaling.
6. Set up proactive disk usage alerting at an 85% threshold.
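Steps 1-3 of the plan map onto specific Elasticsearch REST calls. A minimal sketch that lays the sequence out as data rather than sending it (the host is omitted and the index name is a placeholder assumption; the API paths themselves are the real Elasticsearch endpoints):

```python
import json

def read_only_block_settings() -> str:
    """Body for PUT /_all/_settings; null removes the flood-stage block
    (the plan's explicit 'false' also works)."""
    return json.dumps({"index.blocks.read_only_allow_delete": None})

# Ordered runbook: (HTTP method, API path, JSON body or None)
runbook = [
    # 1. Reclaim space: delete the oldest indices (placeholder name; pick
    #    real victims from GET /_cat/indices?s=creation.date).
    ("DELETE", "/logstash-2024.01.01", None),
    # 2. Clear the index-level read-only block cluster-wide.
    ("PUT", "/_all/_settings", read_only_block_settings()),
    # 3. Wait for recovery before re-enabling normal ingestion.
    ("GET", "/_cluster/health?wait_for_status=green&timeout=60s", None),
]

for method, path, body in runbook:
    print(method, path, body or "")
```

Keeping the calls as plain data makes the order explicit: space must be reclaimed before the block is cleared, otherwise Elasticsearch reapplies it on the next watermark check.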