server / disk_full: PASSED

Disk Full — Log Files Filling Production Database Server

The database server's data partition fills to 100% due to an unrotated slow query log. PostgreSQL can no longer write WAL files. All database writes fail. Application is read-only.

Pattern: DISK_FULL
Severity: CRITICAL
Confidence: 95%
Remediation: Remote Hands

Test Results

Metric               | Expected     | Actual
Pattern Recognition  | DISK_FULL    | DISK_FULL
Severity Assessment  | CRITICAL     | CRITICAL
Incident Correlation | Yes          | 29 linked
Cascade Escalation   | Yes          | Yes
Remediation          | Remote Hands | Corax contacts on-site support via call, email, or API

Scenario Conditions

PostgreSQL 15 on Ubuntu 22.04. 500GB data partition. Slow query log grew to 180GB overnight. No log rotation configured. 3 applications depend on this database.
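The root cause here is the absence of rotation on the slow query log. A standard logrotate policy would have capped its growth; a minimal sketch (the log path, size threshold, and retention count are illustrative assumptions, not from the scenario):

```
# /etc/logrotate.d/postgresql-slowquery  (illustrative path)
/data/pg/log/slow_query.log {
    size 1G          # rotate as soon as the log exceeds 1GB
    rotate 7         # keep seven compressed generations
    compress
    delaycompress    # leave the newest rotation uncompressed
    missingok
    notifempty
    copytruncate     # truncate in place so PostgreSQL keeps its open file handle
}
```

`copytruncate` matters for a log the server holds open: plain rotation would leave PostgreSQL writing to the renamed file, and the space would never be reclaimed.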

Injected Error Messages (3)

disk full on /data partition — no space left on device, PostgreSQL PANIC: could not write WAL file, disk space 100% used (500GB/500GB), filesystem full
Database connection errors — PostgreSQL rejecting writes due to disk full, application returning 500 errors on all write operations
Application write failures cascading from database disk full — no space left on device propagating through connection pool

Neural Engine Root Cause Analysis

The PostgreSQL database server has crashed due to complete disk space exhaustion on the /data partition (500GB/500GB used). PostgreSQL requires free disk space to write Write-Ahead Log (WAL) files for transaction durability, and when this fails, the database enters a PANIC state and shuts down to prevent data corruption. The 12 correlated incidents indicate this disk full condition is likely impacting multiple services that depend on this database or share the same underlying storage infrastructure.
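The exhaustion the analysis describes is directly observable from the shell. A minimal, hedged check (the `PART` variable is an assumption; on the scenario host it would be `/data`, and the 95% threshold is illustrative):

```shell
#!/bin/sh
# Report percent-used for a partition and flag near-exhaustion.
# Uses GNU coreutils df; PART defaults to / so the sketch runs anywhere.
PART=${PART:-/}
used=$(df --output=pcent "$PART" | tail -1 | tr -dc '0-9')
echo "usage on $PART: ${used}%"
if [ "$used" -ge 95 ]; then
    echo "CRITICAL: $PART is nearly or completely full"
fi
```

On the scenario host this would print 100% for /data, matching the 500GB/500GB figure above.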

Remediation Plan

1. Immediately check disk usage and identify the largest files and directories on the /data partition.
2. Archive or compress old PostgreSQL WAL files and logs, if safe to do so.
3. Check for large temporary files, core dumps, or application logs that can be cleaned up.
4. If possible, move non-critical data to alternate storage or expand the /data partition.
5. Once sufficient space is freed (a minimum of 10-15% free is recommended), restart the PostgreSQL service.
6. Verify database integrity and confirm that all dependent services recover.
7. Implement disk space monitoring and log rotation policies to prevent recurrence.
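Steps 1-3 can be sketched as shell commands. To keep this runnable anywhere, the demo operates on a throwaway directory standing in for /data; all paths and filenames are illustrative, and on a real host you would point `du`/`df` at the actual partition:

```shell
#!/bin/sh
set -eu
# Throwaway directory standing in for the /data partition.
demo=$(mktemp -d)
mkdir -p "$demo/pg_log"
head -c 1048576 /dev/zero > "$demo/pg_log/slow_query.log"  # stand-in for the 180GB log

# Step 1: find what is eating space.
# On the real host: df -h /data && du -xh --max-depth=2 /data | sort -rh | head
du -sk "$demo"/* | sort -rn | head

# Step 2/3: reclaim space from the oversized log. Truncating in place
# (rather than rm) keeps the inode that PostgreSQL still holds open.
: > "$demo/pg_log/slow_query.log"

# Confirm the space was reclaimed (should print 0).
wc -c < "$demo/pg_log/slow_query.log"
rm -rf "$demo"
```

After freeing space on the real host, step 5 would be `systemctl restart postgresql`, followed by checking the server log for clean recovery.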
Tested: 2026-03-30 | Monitors: 3 | Incidents: 3 | Test ID: cmncjemmj00qvobqe6op3qalu