PASSED | network / load_balancer

Nginx Load Balancer Upstream Timeout — Backend Overloaded

Nginx, acting as a load balancer, is timing out on 95% of upstream requests. The backend servers are up but respond too slowly because of a traffic spike.

Pattern: LOAD_BALANCER_EVENT (expected: UPSTREAM_TIMEOUT)
Severity: HIGH
Confidence: 68%
Remediation: Auto-Heal

Test Results

Metric                 Expected            Actual
Pattern Recognition    UPSTREAM_TIMEOUT    LOAD_BALANCER_EVENT
Severity Assessment    CRITICAL            HIGH
Incident Correlation   N/A                 None
Cascade Escalation     N/A                 No
Remediation            Auto-Heal: Corax resolves autonomously

Scenario Conditions

Nginx reverse proxy/LB. 3 upstream servers. proxy_read_timeout=30s. Backend avg response: 45s. 95% of requests timing out. Traffic 3x normal from marketing campaign.
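The mismatch between the 30s timeout and the 45s average response can be made concrete with a small sketch. It assumes each latency value is the time until the backend sends its first byte, since nginx applies proxy_read_timeout between two successive reads rather than to the whole response. The sample latencies are hypothetical, chosen only so that their mean matches the 45s average in the scenario; they are not measurements from this test.

```python
from statistics import mean
from typing import Sequence

# Values taken from the scenario conditions.
PROXY_READ_TIMEOUT_S = 30.0
BACKEND_AVG_RESPONSE_S = 45.0

# Hypothetical time-to-first-byte samples (seconds), for illustration only.
SAMPLE_LATENCIES_S = [28.0, 42.0, 46.0, 51.0, 58.0]


def timed_out_share(latencies_s: Sequence[float], timeout_s: float) -> float:
    """Return the fraction of requests whose latency exceeds the timeout."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    timed_out = sum(1 for latency in latencies_s if latency > timeout_s)
    return timed_out / len(latencies_s)


def timeout_below_average(timeout_s: float, avg_response_s: float) -> bool:
    """True when the configured timeout is shorter than the average response."""
    return timeout_s < avg_response_s
```

With the scenario values, timeout_below_average(PROXY_READ_TIMEOUT_S, BACKEND_AVG_RESPONSE_S) is true, which is consistent with most requests failing: once the average response exceeds the timeout, only the fastest requests can complete.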

Injected Error Messages (1)

Nginx upstream timeout — 95% of requests to backend pool timing out, proxy_read_timeout 30s exceeded, backend avg response 45s, traffic 3x normal from campaign, 502/504 error rate 95%
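The injected message carries enough detail to support the more specific UPSTREAM_TIMEOUT label. The following is a minimal sketch of a keyword-based classifier and field extractor, under the assumption that rules are tried from most specific to most generic. The rule list, function names, and field names are hypothetical and do not describe how Corax actually classifies events.

```python
import re
from typing import Dict

# The injected message, copied verbatim from the scenario.
MESSAGE = (
    "Nginx upstream timeout — 95% of requests to backend pool timing out, "
    "proxy_read_timeout 30s exceeded, backend avg response 45s, "
    "traffic 3x normal from campaign, 502/504 error rate 95%"
)

# Hypothetical rules, ordered from most specific to most generic so that a
# timeout message mentioning "backend pool" is not swallowed by the broader label.
RULES = [
    ("UPSTREAM_TIMEOUT",
     re.compile(r"upstream timeout|proxy_read_timeout|timing out", re.I)),
    ("LOAD_BALANCER_EVENT",
     re.compile(r"health check|pool member|backend pool|load balancer", re.I)),
]

# Hypothetical field names mapped to the patterns that capture their values.
FIELD_PATTERNS = {
    "proxy_read_timeout_s": re.compile(r"proxy_read_timeout\s+(\d+)s"),
    "backend_avg_response_s": re.compile(r"avg response\s+(\d+)s"),
    "traffic_multiplier": re.compile(r"traffic\s+(\d+)x"),
    "error_rate_pct": re.compile(r"error rate\s+(\d+)%"),
}


def classify(message: str) -> str:
    """Return the label of the first rule that matches, or UNKNOWN."""
    for label, pattern in RULES:
        if pattern.search(message):
            return label
    return "UNKNOWN"


def extract_fields(message: str) -> Dict[str, int]:
    """Pull the numeric details out of the message; missing fields are skipped."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(message)
        if match:
            fields[name] = int(match.group(1))
    return fields
```

Rule order matters in this sketch: the message also mentions "backend pool", so a generic-first ordering would yield LOAD_BALANCER_EVENT, the same label the test recorded.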

Neural Engine Root Cause Analysis

Load balancer event detected — one or more backend servers have failed health checks, a pool member is marked down, or upstream connections are timing out. When backends are unhealthy, the load balancer will stop sending traffic to them, potentially overloading remaining healthy servers or causing a complete service outage if all backends are down.
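The overload effect described above can be illustrated with simple arithmetic. This sketch assumes traffic is spread evenly across the healthy backends (as with plain round-robin balancing) and expresses the load on each remaining server as a multiple of its normal share. The function name is hypothetical.

```python
def per_server_load_factor(traffic_multiplier: float,
                           total_servers: int,
                           healthy_servers: int) -> float:
    """Load on each healthy server, relative to its normal per-server load.

    Assumes traffic is split evenly across healthy servers.
    """
    if total_servers <= 0:
        raise ValueError("total_servers must be positive")
    if not 0 <= healthy_servers <= total_servers:
        raise ValueError("healthy_servers must be between 0 and total_servers")
    if healthy_servers == 0:
        # No backend can take traffic: complete outage.
        return float("inf")
    return traffic_multiplier * total_servers / healthy_servers
```

For the 3 upstream servers and 3x traffic in this scenario, each server carries 3.0 times its normal load while all are healthy, 4.5 times if one is marked down, and 9.0 times if only one remains.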

Remediation Plan

1. Check the load balancer dashboard for backend health status and identify which servers are failing health checks.
2. Verify the health check endpoint is responding correctly on the backend servers (check port, path, and expected response).
3. For upstream timeouts, check backend server resource utilization (CPU, memory, connections) and application logs.
4. If all backends are down, investigate the common dependency (database, shared storage, network) rather than individual servers.
5. Temporarily adjust health check thresholds or intervals if backends are flapping due to brief slowdowns.
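Step 2 of the plan can be sketched as a small probe that requests a health check endpoint and records the status and latency. This is a minimal sketch using only the Python standard library. The ProbeResult type, the probe function, and the injectable opener parameter (included so the probe can be exercised without a network) are all hypothetical, as are any URLs passed to it.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    url: str
    healthy: bool
    status: Optional[int]   # HTTP status code, or None if no response arrived
    latency_s: float        # wall-clock time spent on the probe
    detail: str


def probe(url: str,
          timeout_s: float = 5.0,
          opener: Callable = urllib.request.urlopen) -> ProbeResult:
    """Request a health endpoint once and report status and latency."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as response:
            response.read()
            status = response.status
        healthy = 200 <= status < 300
        detail = "ok" if healthy else "unexpected status"
        return ProbeResult(url, healthy, status, time.monotonic() - start, detail)
    except urllib.error.HTTPError as exc:
        # The backend answered, but with an error status (for example 502 or 504).
        return ProbeResult(url, False, exc.code, time.monotonic() - start, "http error")
    except OSError as exc:
        # Covers connection failures and timeouts (URLError and TimeoutError
        # are both OSError subclasses).
        return ProbeResult(url, False, None, time.monotonic() - start,
                           f"unreachable or timed out: {exc}")
```

A call such as probe("http://backend-1.example/health", timeout_s=5.0) (a placeholder address) returns one result per backend. Comparing latency_s across the pool helps separate a single slow server from a slowdown shared by all of them, which points toward the common-dependency check in step 4.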

Improvements Applied

  • Pattern classified as LOAD_BALANCER_EVENT (expected UPSTREAM_TIMEOUT)
  • Severity: HIGH (expected CRITICAL)
Tested: 2026-04-02 | Monitors: 1 | Incidents: 1 | Test ID: cmnhnoopv0023lig7emxm1d28