Azure DevOps Pipeline Failure — Build Agent Pool Exhausted
All Azure DevOps self-hosted build agents are stuck on hung builds, preventing any new CI/CD pipelines from running. The agent pool shows 0 available agents. Development velocity drops to zero as no code can be built, tested, or deployed.
Pattern: AZURE_CLOUD
Severity: CRITICAL
Confidence: 85%
Remediation: Remote Hands
Test Results
Metric               | Expected    | Actual                                                             | Result
Pattern Recognition  | AZURE_CLOUD | AZURE_CLOUD                                                        |
Severity Assessment  | CRITICAL    | CRITICAL                                                           |
Incident Correlation | Yes         | 18 linked                                                          |
Cascade Escalation   | N/A         | No                                                                 |
Remediation          | -           | Remote Hands (Corax contacts on-site support via call, email, or API) |
Scenario Conditions
The pool contains 15 self-hosted Azure DevOps agents. All of them are stuck on builds that hung when a shared NFS mount became unresponsive. The agent timeout is set to 6 hours, and no auto-recovery is configured.
Injected Error Messages (2)
azure devops agent pool 'Production-Agents' has 0 available agents — all 15 agents in BUSY state, oldest running job started 5 hours ago, builds queued: 47, agents appear hung on NFS-dependent build step, azure devops pipeline queue growing, no capacity to run any new builds or deployments
azure devops production deployment pipeline blocked — job queued for 3 hours with no available agents, critical hotfix for production security vulnerability cannot be deployed, all azure pipeline agents unresponsive, development team completely blocked on build and deploy operations
Neural Engine Root Cause Analysis
All 15 Azure DevOps build agents in the Production-Agents pool are hung in BUSY state, and the oldest running job started 5 hours ago. The agents appear to be stuck on NFS-dependent build steps, which points to a connectivity or performance problem on the shared NFS storage that prevents build jobs from completing. Because no agent is free, no new build can start: 47 builds are queued and the CI/CD pipeline is completely stalled.
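The exhaustion condition described here (zero idle agents, every job long-running) can be sketched as a small check. This is an illustration only: `AgentSnapshot`, `summarize_pool`, and the 2-hour `stuck_after` threshold are hypothetical names and values, not part of Corax or the Azure DevOps API, and the agent start times in the example are invented to match the scenario's "oldest job started 5 hours ago".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class AgentSnapshot:
    """Hypothetical, simplified view of one agent in the pool."""
    name: str
    job_started: Optional[datetime]  # None means the agent is idle


def summarize_pool(agents: List[AgentSnapshot], now: datetime,
                   stuck_after: timedelta) -> dict:
    """Count idle agents and list agents whose job has run for stuck_after or longer."""
    idle = [a.name for a in agents if a.job_started is None]
    stuck = [
        a.name for a in agents
        if a.job_started is not None and now - a.job_started >= stuck_after
    ]
    return {
        "available": len(idle),
        "stuck": stuck,
        # Exhausted: the pool has agents, but none of them is idle.
        "exhausted": bool(agents) and not idle,
    }


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # made-up timestamp
    # 15 busy agents; the oldest job started 5 hours ago, the rest slightly later.
    agents = [
        AgentSnapshot(f"agent-{i:02d}", now - timedelta(hours=5) + timedelta(minutes=i))
        for i in range(15)
    ]
    print(summarize_pool(agents, now, stuck_after=timedelta(hours=2)))
```

In practice the snapshots would be filled from whatever source reports agent state. The threshold should sit well below the 6-hour agent timeout so that a hung pool is flagged long before jobs time out on their own.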
Remediation Plan
1. Immediately restart all hung Azure DevOps agents to release them from BUSY state
2. Investigate NFS mount connectivity and performance on agent machines
3. Check NFS server health, disk space, and network connectivity
4. Clear any stale NFS locks or hung processes
5. Verify build artifacts and dependencies are accessible via NFS
6. Monitor agent pool recovery and build queue processing
7. Implement NFS health checks in build pipelines to prevent future hangs
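Step 7 could take the form of a pre-build probe that fails fast instead of hanging. The sketch below is one possible approach, not a prescribed implementation: it assumes a POSIX agent with the `stat` command available, and the function name `mount_is_responsive`, the 5-second default, and the path `/mnt/build-cache` are hypothetical. The probe runs in a child process because a `stat` call against an unresponsive NFS mount can block indefinitely, and a blocked child can be abandoned while the caller moves on.

```python
import subprocess
import sys


def mount_is_responsive(path: str, timeout_s: float = 5.0) -> bool:
    """Return True if `stat path` exits successfully within timeout_s seconds."""
    proc = subprocess.Popen(
        ["stat", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit code 0 means stat reached the filesystem and got an answer.
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best-effort kill. Do not wait on the child again: a process blocked
        # on a hung NFS mount may not die until the mount recovers.
        proc.kill()
        return False


if __name__ == "__main__":
    nfs_path = "/mnt/build-cache"  # hypothetical mount point, adjust per agent
    if not mount_is_responsive(nfs_path):
        print(f"NFS mount {nfs_path} is missing or unresponsive; aborting build step")
        sys.exit(1)
```

Run as the first step of an NFS-dependent job, a non-zero exit fails the job within seconds and frees the agent, rather than holding it in BUSY state until the 6-hour timeout.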