PASSED — server / db_connection_pool_exhausted

Database Connection Pool Exhausted

The application's database connection pool (PgBouncer) is exhausted after a slow query causes connections to pile up. New requests queue behind the pool, causing cascading timeouts. The application returns 503 errors while the database itself is healthy.

Pattern: DATABASE_EVENT
Severity: CRITICAL
Confidence: 85%
Remediation: Remote Hands

Test Results

Metric               | Expected       | Actual
Pattern Recognition  | DATABASE_EVENT | DATABASE_EVENT
Severity Assessment  | CRITICAL       | CRITICAL
Incident Correlation | Yes            | 23 linked
Cascade Escalation   | Yes            | Yes
Remediation          | Remote Hands   | Corax contacts on-site support via call, email, or API

Scenario Conditions

PgBouncer with a maximum of 100 server connections. PostgreSQL 16 (max_connections: 150). A slow query takes 30 s instead of the usual 200 ms. All 100 connections are stuck behind the slow query pattern, and the application thread pool is exhausting as well.
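The conditions above correspond to a PgBouncer setup along these lines. This is a hypothetical fragment, not the tested configuration: the parameter names are standard PgBouncer settings, but the database alias and the `max_client_conn` / `query_wait_timeout` values are illustrative assumptions.

```ini
; Hypothetical pgbouncer.ini fragment matching the scenario conditions.
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb   ; alias/host are assumptions

[pgbouncer]
pool_mode = transaction      ; matches "pool mode: transaction" in the errors
default_pool_size = 100      ; the 100 server connections that fill up
max_client_conn = 500        ; clients beyond the pool queue as cl_waiting
query_wait_timeout = 120     ; seconds a queued client waits before erroring
```

With `pool_mode = transaction`, a server connection is only returned to the pool between transactions, so a 30 s query pins its connection for the full 30 s.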

Injected Error Messages (3)

PgBouncer connection pool exhausted — 100/100 server connections active, 247 client connections queued, pool mode: transaction, wait timeout errors increasing, SHOW POOLS: cl_waiting=247 sv_active=100 sv_idle=0, slow query pattern detected on 78 connections
Application server returning 503 — database connection acquisition timeout after 30 seconds, HikariPool-1: connection is not available, request timeout, 500 requests queued, thread pool exhausted (200/200 threads blocked on DB)
Application server health check failing — /health endpoint returning 503 Service Unavailable, database connectivity check: FAILED, connection pool wait exceeds timeout, ALB marking target unhealthy, cascading to all API consumers
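The cascade in these three messages can be reproduced in miniature. The sketch below is not the tested system: it models the pool as a semaphore, with sizes and timeouts scaled down from the scenario's 100 connections and 30 s acquisition timeout. Slow "queries" claim every connection, so later requests exhaust their wait budget and take the 503 path.

```python
import threading
import time

POOL_SIZE = 5
ACQUIRE_TIMEOUT = 0.3  # stands in for the 30 s connection-acquisition timeout

pool = threading.BoundedSemaphore(POOL_SIZE)
results = []  # list.append is atomic under CPython, safe across threads

def handle_request(query_seconds: float) -> None:
    # A failed acquire is the "connection is not available" / 503 outcome.
    if not pool.acquire(timeout=ACQUIRE_TIMEOUT):
        results.append("timeout")
        return
    try:
        time.sleep(query_seconds)  # simulated query execution
        results.append("ok")
    finally:
        pool.release()

# Slow queries grab every connection first...
slow = [threading.Thread(target=handle_request, args=(1.5,)) for _ in range(POOL_SIZE)]
for t in slow:
    t.start()
time.sleep(0.1)  # let the slow queries claim the whole pool

# ...then normal 10 ms requests arrive and queue behind them.
fast = [threading.Thread(target=handle_request, args=(0.01,)) for _ in range(5)]
for t in fast:
    t.start()
for t in slow + fast:
    t.join()

print(results.count("ok"), results.count("timeout"))  # 5 ok, 5 timeout
```

Note that the database side of this model is perfectly healthy; every timeout happens while waiting for a pooled connection, which is exactly why the health check fails even though PostgreSQL is up.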

Neural Engine Root Cause Analysis

PgBouncer connection pool has reached maximum capacity (100/100 server connections) with 247 client connections queued, indicating the underlying PostgreSQL database is overwhelmed or unresponsive. Slow query patterns detected on 78 connections suggest database performance degradation, potentially due to long-running queries, resource contention, or database locks preventing connection release. The 10 correlated incidents suggest a widespread database or infrastructure issue affecting multiple services simultaneously.
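The slow-query diagnosis can be confirmed directly in `pg_stat_activity`. These are standard PostgreSQL catalog queries (`pg_blocking_pids` exists since PostgreSQL 9.6); the 5-second threshold is illustrative, not from the scenario.

```sql
-- Long-running statements (5 s threshold is illustrative)
SELECT pid, now() - query_start AS runtime, state, wait_event_type, query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '5 seconds'
ORDER BY runtime DESC;

-- Who is blocking whom
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```

If the second query returns rows, the pool is pinned by lock contention rather than by genuinely slow execution, which changes which backend is safe to terminate.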

Remediation Plan

1. Immediately check PostgreSQL database health and identify blocking/long-running queries using pg_stat_activity.
2. Kill problematic long-running queries if safe to do so.
3. Restart PgBouncer service to clear the connection queue if the database is healthy.
4. Monitor connection pool utilization and consider increasing max_client_conn or server_lifetime settings.
5. Investigate the root cause of database performance issues (disk I/O, memory, CPU, locks) and address underlying bottlenecks.
6. Review application connection handling patterns to prevent future pool exhaustion.
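Before tuning pool settings as in step 4, the collapse can be sanity-checked with Little's Law (busy connections = request rate × time per query). A quick sketch, using only figures from the scenario (100-connection pool, 200 ms normal latency, 30 s under the slow query pattern):

```python
# Back-of-envelope capacity check via Little's Law (L = lambda * W).

def sustainable_qps(pool_size: int, query_seconds: float) -> float:
    """Highest request rate the pool can absorb before clients queue."""
    return pool_size / query_seconds

print(sustainable_qps(100, 0.2))   # 500.0 requests/s at the normal 200 ms
print(sustainable_qps(100, 30.0))  # ~3.3 requests/s once queries take 30 s
```

A 150× latency regression cuts sustainable throughput 150×, so raising max_client_conn only deepens the queue; the latency regression itself (steps 1, 2, and 5) is the real fix.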
Tested: 2026-03-30 | Monitors: 3 | Incidents: 3 | Test ID: cmncjn2h702saobqe6fwj9muo