Back to All Scenarios
PASSEDcloud / azure_aks

Azure: AKS Node Pool Scaling Failed — Quota Exceeded

AKS cluster autoscaler cannot add nodes because the Azure subscription hit its vCPU quota limit. Pods pending with no capacity.

Pattern
AZURE_CLOUD
Expected: AZURE_QUOTA_EXCEEDED
Severity
HIGH
Confidence
68%
Remediation
Auto-Heal

Test Results

MetricExpectedActualResult
Pattern RecognitionAZURE_QUOTA_EXCEEDEDAZURE_CLOUD
Severity AssessmentHIGHHIGH
Incident CorrelationN/ANone
Cascade EscalationN/ANo
RemediationAuto-Heal — Corax resolves autonomously

Scenario Conditions

Azure AKS cluster 'prod'. Node pool needs to scale from 5 to 8 nodes (Standard_D4s_v3). Subscription vCPU quota: 80 used / 80 limit. 12 pods pending.

Injected Error Messages (1)

AKS scaling failure — cluster 'prod' autoscaler cannot add nodes, Azure subscription vCPU quota exceeded (80/80), 12 pods in Pending state, no capacity for new workloads

Neural Engine Root Cause Analysis

Azure cloud infrastructure event detected — an Azure resource may be failing, an App Service is unhealthy, Azure AD authentication is disrupted, or a Service Bus queue is backed up. Azure outages can cascade across dependent services and affect both cloud-hosted applications and hybrid on-premises integrations relying on Azure AD.

Remediation Plan

1. Check Azure Service Health (status.azure.com) for any active incidents in your region. 2. Review Azure Monitor alerts and resource health for the affected service. 3. For App Service issues, check the Kudu console for application logs and restart the app if needed. 4. For Azure AD issues, verify conditional access policies and check the Azure AD sign-in logs for failure reasons. 5. For Service Bus, check dead-letter queues and verify the sending/receiving applications are connected and processing messages.

Improvements Applied

  • Pattern classified as AZURE_CLOUD (expected AZURE_QUOTA_EXCEEDED)
Tested: 2026-04-02Monitors: 1 | Incidents: 1Test ID: cmnhnr8vc09bolig7scqsx6j9