Nginx as a load balancer is timing out on all upstream connections. Backend servers are up but responding too slowly due to a traffic spike.
Pattern: LOAD_BALANCER_EVENT (expected: UPSTREAM_TIMEOUT)
Severity: HIGH
Confidence: 68%
Remediation: Auto-Heal
Test Results

| Metric | Expected | Actual | Result |
| --- | --- | --- | --- |
| Pattern Recognition | UPSTREAM_TIMEOUT | LOAD_BALANCER_EVENT | |
| Severity Assessment | CRITICAL | HIGH | |
| Incident Correlation | N/A | None | |
| Cascade Escalation | N/A | No | |
| Remediation | — | Auto-Heal — Corax resolves autonomously | |
Scenario Conditions
Nginx reverse proxy/LB. 3 upstream servers. proxy_read_timeout=30s. Backend avg response: 45s. 95% of requests timing out. Traffic 3x normal from marketing campaign.
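The conditions above correspond roughly to a configuration like the following sketch. The upstream host names and backend port are illustrative assumptions, not taken from the scenario:

```nginx
# Illustrative LB config matching the scenario; hosts/ports are hypothetical
upstream backend_pool {
    server app1.internal:8080;
    server app2.internal:8080;
    server app3.internal:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend_pool;
        proxy_read_timeout 30s;  # scenario value; backends average 45s, so reads time out
    }
}
```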
Injected Error Messages (1)
Nginx upstream timeout — 95% of requests to backend pool timing out, proxy_read_timeout 30s exceeded, backend avg response 45s, traffic 3x normal from campaign, 502/504 error rate 95%
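One quick way to confirm the reported 502/504 rate is to tally status codes in the access log. A minimal sketch, assuming the default combined log format; the sample lines below stand in for a real log file such as `/var/log/nginx/access.log`:

```shell
# Sample lines standing in for the real access log (path is hypothetical)
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
10.0.0.1 - - [10/Oct/2025:13:55:36 +0000] "GET /api HTTP/1.1" 504 0
10.0.0.2 - - [10/Oct/2025:13:55:37 +0000] "GET /api HTTP/1.1" 504 0
10.0.0.3 - - [10/Oct/2025:13:55:38 +0000] "GET /api HTTP/1.1" 200 512
EOF

# Count responses by status code; $9 is the status in the combined log format
awk '{codes[$9]++; total++}
     END {for (c in codes) printf "%s: %d (%.0f%%)\n", c, codes[c], 100*codes[c]/total}' "$LOG"
# e.g. "504: 2 (67%)" and "200: 1 (33%)" (line order may vary)
```

Running the same one-liner against the live log during the spike should show the 502/504 codes dominating.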
Neural Engine Root Cause Analysis
Load balancer event detected — one or more backend servers have failed health checks, a pool member is marked down, or upstream connections are timing out. When backends are unhealthy, the load balancer will stop sending traffic to them, potentially overloading remaining healthy servers or causing a complete service outage if all backends are down.
Remediation Plan
1. Check the load balancer dashboard for backend health status and identify which servers are failing health checks.
2. Verify the health check endpoint is responding correctly on the backend servers (check port, path, and expected response).
3. For upstream timeouts, check backend server resource utilization (CPU, memory, connections) and application logs.
4. If all backends are down, investigate the common dependency (database, shared storage, network) rather than individual servers.
5. Temporarily adjust health check thresholds or intervals if backends are flapping due to brief slowdowns.
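Steps 3 and 5 often end in a temporary configuration change while the traffic spike is absorbed. A hedged sketch, where the timeout and failure thresholds are illustrative values rather than recommendations, and the upstream hosts are hypothetical:

```nginx
upstream backend_pool {
    # Mark a server down only after repeated failures, and retry it again soon,
    # so backends that are briefly slow don't flap in and out of the pool
    server app1.internal:8080 max_fails=5 fail_timeout=10s;
    server app2.internal:8080 max_fails=5 fail_timeout=10s;
    server app3.internal:8080 max_fails=5 fail_timeout=10s;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend_pool;
        proxy_read_timeout 60s;            # temporarily above the 45s backend average
        proxy_next_upstream error timeout; # retry the next backend on a timeout
    }
}
```

Validate and apply with `nginx -t && nginx -s reload`, then revert the raised timeout once backend latency recovers, since a 60s read timeout ties up worker connections longer under load.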
Improvements Applied
Pattern classified as LOAD_BALANCER_EVENT (expected UPSTREAM_TIMEOUT)