PASSED | database / postgres_vacuum_bloat

PostgreSQL Vacuum Bloat — Transaction ID Wraparound Warning

PostgreSQL autovacuum has been unable to keep up with a high-write workload, and the database is approaching transaction ID wraparound. The autovacuum_freeze_max_age threshold has been reached, forcing aggressive anti-wraparound vacuums that consume nearly all available I/O. Database performance degrades severely as the aggressive vacuum competes with production queries.

Pattern: DATABASE_EVENT
Severity: CRITICAL
Confidence: 95%
Remediation: Remote Hands

Test Results

Metric               | Expected       | Actual
Pattern Recognition  | DATABASE_EVENT | DATABASE_EVENT
Severity Assessment  | CRITICAL       | CRITICAL
Incident Correlation | Yes            | 20 linked
Cascade Escalation   | N/A            | No
Remediation          | Remote Hands — Corax contacts on-site support via call, email, or API

Scenario Conditions

PostgreSQL 16. Table 'events' with 2 billion rows, 800GB. Autovacuum lagging behind writes. Transaction age: 1.95 billion (wraparound at 2 billion). Anti-wraparound vacuum running aggressively. I/O utilization: 98%. 150 concurrent connections.
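The headroom implied by those numbers can be sketched in a few lines (a minimal sketch: the 2-billion limit is the scenario's round figure, close to the practical 2^31 XID ceiling, and `xid_headroom` is an illustrative helper, not a PostgreSQL function):

```python
# Wraparound-headroom arithmetic from the scenario conditions.
WRAPAROUND_LIMIT = 2_000_000_000  # scenario's stated limit (~2^31 XIDs in practice)

def xid_headroom(current_age: int, limit: int = WRAPAROUND_LIMIT) -> int:
    """Transactions remaining before the database stops assigning new XIDs."""
    return max(limit - current_age, 0)

# The scenario's current age of 1.95 billion leaves 50 million transactions,
# matching the injected WARNING below.
print(xid_headroom(1_950_000_000))  # 50000000
```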

Injected Error Messages (2)

PostgreSQL transaction ID wraparound imminent — WARNING: database 'production' must be vacuumed within 50,000,000 transactions, current age: 1,950,000,000, anti-wraparound autovacuum running on table 'events' (800GB, 2B rows), I/O wait: 85%, autovacuum workers: 5/5 busy, replication lag spiking to 120 seconds due to I/O contention, slow query count tripled in last hour, database connection pool under heavy pressure
API response times critically degraded — p99 latency: 28 seconds (baseline: 200ms), PostgreSQL queries blocked behind anti-wraparound vacuum I/O, connection acquisition from pool taking 15 seconds, upstream gateway returning slow query responses, 30% of API requests failing with read/write contention, max connections nearly exhausted

Neural Engine Root Cause Analysis

The PostgreSQL database is approaching transaction ID wraparound, a critical condition in which the database nears the 2-billion transaction limit and triggers emergency autovacuum operations. The anti-wraparound autovacuum is consuming all available I/O resources (85% I/O wait) while processing the massive 'events' table (800GB, 2B rows), causing cascading performance degradation including replication lag, slow queries, and connection pool exhaustion. This is fundamentally a maintenance issue: regular vacuum operations were insufficient to prevent transaction ID buildup.

Remediation Plan

1. Immediately assess whether a manual VACUUM FREEZE can be run on smaller tables to buy time.
2. Consider temporarily pausing non-critical write operations to reduce transaction pressure.
3. Monitor autovacuum progress on the 'events' table and allow it to complete (this may take hours).
4. Shift read operations to standby replicas where possible to reduce load on the primary.
5. Once the crisis passes, tune autovacuum properly (increase workers, adjust thresholds) and consider partitioning the events table.
6. Establish monitoring of transaction age to prevent recurrence.
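The transaction-age monitoring in the last step could look roughly like this (a hedged sketch: the query uses the standard `pg_database.datfrozenxid` catalog column and `age()` function, while `needs_alert` and its 1.5-billion threshold are illustrative choices, not part of the scenario):

```python
# Query to find each database's oldest unfrozen XID age; run it periodically
# against the primary and feed the result into an alerting system.
XID_AGE_SQL = """
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
"""

# Example alert level: well below the 2-billion hard limit, leaving ordinary
# autovacuum time to freeze tuples before aggressive anti-wraparound mode starts.
ALERT_THRESHOLD = 1_500_000_000

def needs_alert(xid_age: int, threshold: int = ALERT_THRESHOLD) -> bool:
    """True when a database's oldest unfrozen XID age crosses the threshold."""
    return xid_age >= threshold

print(needs_alert(1_950_000_000))  # the scenario's age would fire the alert
```

Alerting this far below autovacuum_freeze_max_age (default 200 million) gives operators room to schedule off-peak manual vacuums instead of absorbing an emergency one.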
Tested: 2026-03-30 | Monitors: 2 | Incidents: 2 | Test ID: cmncjttjn04miobqefqhyr2vq