Nginx as a load balancer is timing out on all upstream connections. Backend servers are up but responding too slowly due to a traffic spike.
Pattern: LOAD_BALANCER_EVENT (expected: UPSTREAM_TIMEOUT)
Severity: HIGH
Confidence: 68%
Remediation: Auto-Heal
Test Results

| Metric | Expected | Actual | Result |
| --- | --- | --- | --- |
| Pattern Recognition | UPSTREAM_TIMEOUT | LOAD_BALANCER_EVENT | |
| Severity Assessment | CRITICAL | HIGH | |
| Incident Correlation | N/A | None | |
| Cascade Escalation | N/A | No | |
| Remediation | — | Auto-Heal — Corax resolves autonomously | |
Scenario Conditions
Nginx reverse proxy/LB. 3 upstream servers. proxy_read_timeout=30s. Backend avg response: 45s. 95% of requests timing out. Traffic 3x normal from marketing campaign.
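The conditions above correspond roughly to a configuration like the following sketch. The upstream host names and backend port are illustrative assumptions, not taken from the scenario:

```nginx
# Illustrative LB config matching the scenario; hosts/ports are hypothetical
upstream backend_pool {
    server app1.internal:8080;
    server app2.internal:8080;
    server app3.internal:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend_pool;
        proxy_read_timeout 30s;  # scenario value; backends average 45s, so reads time out
    }
}
```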
Injected Error Messages (1)
Nginx upstream timeout — 95% of requests to backend pool timing out, proxy_read_timeout 30s exceeded, backend avg response 45s, traffic 3x normal from campaign, 502/504 error rate 95%
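One quick way to confirm the reported 502/504 rate is to tally status codes in the access log. A minimal sketch, assuming the default combined log format; the sample lines below stand in for a real log file such as `/var/log/nginx/access.log`:

```shell
# Sample lines standing in for the real access log (path is hypothetical)
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
10.0.0.1 - - [10/Oct/2025:13:55:36 +0000] "GET /api HTTP/1.1" 504 0
10.0.0.2 - - [10/Oct/2025:13:55:37 +0000] "GET /api HTTP/1.1" 504 0
10.0.0.3 - - [10/Oct/2025:13:55:38 +0000] "GET /api HTTP/1.1" 200 512
EOF

# Count responses by status code; $9 is the status in the combined log format
awk '{codes[$9]++; total++}
     END {for (c in codes) printf "%s: %d (%.0f%%)\n", c, codes[c], 100*codes[c]/total}' "$LOG"
# e.g. "504: 2 (67%)" and "200: 1 (33%)" (line order may vary)
```

Running the same one-liner against the live log during the spike should show the 502/504 codes dominating.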
Neural Engine Root Cause Analysis
Load balancer event detected — one or more backend servers have failed health checks, a pool member is marked down, or upstream connections are timing out. When backends are unhealthy, the load balancer will stop sending traffic to them, potentially overloading remaining healthy servers or causing a complete service outage if all backends are down.
Remediation Plan
1. Check the load balancer dashboard for backend health status and identify which servers are failing health checks.
2. Verify the health check endpoint is responding correctly on the backend servers (check port, path, and expected response).
3. For upstream timeouts, check backend server resource utilization (CPU, memory, connections) and application logs.
4. If all backends are down, investigate the common dependency (database, shared storage, network) rather than individual servers.
5. Temporarily adjust health check thresholds or intervals if backends are flapping due to brief slowdowns.
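Steps 3 and 5 often end in a temporary configuration change while the traffic spike is absorbed. A hedged sketch, where the timeout and failure thresholds are illustrative values rather than recommendations, and the upstream hosts are hypothetical:

```nginx
upstream backend_pool {
    # Mark a server down only after repeated failures, and retry it again soon,
    # so backends that are briefly slow don't flap in and out of the pool
    server app1.internal:8080 max_fails=5 fail_timeout=10s;
    server app2.internal:8080 max_fails=5 fail_timeout=10s;
    server app3.internal:8080 max_fails=5 fail_timeout=10s;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend_pool;
        proxy_read_timeout 60s;            # temporarily above the 45s backend average
        proxy_next_upstream error timeout; # retry the next backend on a timeout
    }
}
```

Validate and apply with `nginx -t && nginx -s reload`, then revert the raised timeout once backend latency recovers, since a 60s read timeout ties up worker connections longer under load.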
Improvements Applied
Pattern classified as LOAD_BALANCER_EVENT (expected UPSTREAM_TIMEOUT)