Back to All Scenarios
PASSEDcloud / gcp_gke

GCP: GKE Node Auto-Repair Triggered — Node Unresponsive

GKE detected an unhealthy node and triggered auto-repair, which involves draining and recreating the node. Disruption to pods during repair.

Pattern
UNKNOWN
Expected: GCP_GKE_REPAIR
Severity
MEDIUM
Confidence
68%
Remediation
Auto-Heal

Test Results

MetricExpectedActualResult
Pattern RecognitionGCP_GKE_REPAIRUNKNOWN
Severity AssessmentHIGHMEDIUM
Incident CorrelationN/ANone
Cascade EscalationN/ANo
RemediationAuto-Heal — Corax resolves autonomously

Scenario Conditions

GCP GKE cluster 'prod'. Node 'gke-prod-pool-abc123' unresponsive for 10 minutes. Auto-repair triggered. 20 pods being evicted. PodDisruptionBudgets may be violated.

Injected Error Messages (1)

GKE auto-repair — node 'gke-prod-pool-abc123' marked unhealthy, auto-repair draining node, 20 pods being evicted, PDB violations possible, temporary capacity reduction

Neural Engine Root Cause Analysis

Unrecognized error pattern — this may be a new type of failure not yet cataloged.

Remediation Plan

Trigger autonomous healing to diagnose and fix. Manual investigation may be needed if healing cannot resolve.

Improvements Applied

  • Pattern unrecognized — consider adding keywords for GCP_GKE_REPAIR
  • Severity: MEDIUM (expected HIGH)
Tested: 2026-04-02Monitors: 1 | Incidents: 1Test ID: cmnhnr8vc09brlig7wka26zce