Back to All Scenarios
PASSEDcloud / gcp_lb_health_check_failure

GCP Load Balancer Health Check Failure — All Backends Unhealthy

A GCP HTTP(S) load balancer marks all backend instances as unhealthy after a firewall rule change blocks the health check probe source IP range (35.191.0.0/16). The load balancer returns HTTP errors to all clients despite all backend VMs being fully operational.

Pattern
FIREWALL_RULE_BLOCK
Severity
CRITICAL
Confidence
92%
Remediation
Auto-Heal

Test Results

MetricExpectedActualResult
Pattern RecognitionFIREWALL_RULE_BLOCKFIREWALL_RULE_BLOCK
Severity AssessmentCRITICALCRITICAL
Incident CorrelationYes18 linked
Cascade EscalationN/ANo
RemediationAuto-Heal — Corax resolves autonomously

Scenario Conditions

GCP HTTP(S) LB with 6 backend VMs in managed instance group. VPC firewall rule updated, accidentally removing health check source ranges. All backends marked unhealthy. LB returning errors globally.

Injected Error Messages (2)

health check failed on all 6 backend instances — load balancer 'prod-https-lb' reporting 0 healthy backends in instance group 'prod-ig-us-central1', all instances failing health check probes, health check 'hc-prod-8080' receiving no responses from backends, firewall rule 'allow-health-check' was modified 15 minutes ago removing source range 35.191.0.0/16, pool members all marked down, load balancer serving error pages to all clients
health check failure cascade — backend VM 10.128.0.50 is running and serving requests locally but health check probes are being blocked by VPC firewall, all 6 backends in identical state, health check source IP range (35.191.0.0/16 and 130.211.0.0/22) not permitted in current firewall rules, pool member down on all backends, backends are healthy but load balancer cannot verify

Neural Engine Root Cause Analysis

The firewall rule 'allow-health-check' was modified 15 minutes ago, removing the critical source range 35.191.0.0/16 which contains Google Cloud's health check IP addresses. This modification is blocking GCP load balancer health check probes from reaching the backend instances on port 8080, causing all 6 instances to be marked as unhealthy and the load balancer to serve error pages. The timing correlation between the firewall rule change and the incident onset, combined with the FIREWALL_RULE_BLOCK pattern detection at 80% confidence, strongly indicates this configuration change as the root cause.

Remediation Plan

1. Immediately restore the firewall rule 'allow-health-check' to include source range 35.191.0.0/16 for port 8080 access to backend instances. 2. Verify the firewall rule modification was applied successfully via GCP Console or gcloud CLI. 3. Monitor health check status for 'hc-prod-8080' to confirm probes are reaching backends. 4. Wait 2-3 minutes for health checks to pass and instances to be marked healthy in instance group 'prod-ig-us-central1'. 5. Validate load balancer 'prod-https-lb' reports healthy backends and https://app.company.com is accessible. 6. Review change management process to prevent unauthorized firewall modifications.
Tested: 2026-03-30Monitors: 2 | Incidents: 2Test ID: cmnckcrfd08irobqe4s5h3tcx