The Docker daemon on a production host becomes unresponsive due to a deadlock in the containerd shim layer. All container operations (start, stop, exec, logs) hang indefinitely. Running containers continue to operate but cannot be managed. New deployments and health checks fail.
Pattern:     CONTAINER_EVENT
Severity:    CRITICAL
Confidence:  85%
Remediation: Remote Hands
Test Results

| Metric               | Expected        | Actual                                                                 | Result |
|----------------------|-----------------|------------------------------------------------------------------------|--------|
| Pattern Recognition  | CONTAINER_EVENT | CONTAINER_EVENT                                                        |        |
| Severity Assessment  | CRITICAL        | CRITICAL                                                               |        |
| Incident Correlation | Yes             | 21 linked                                                              |        |
| Cascade Escalation   | N/A             | No                                                                     |        |
| Remediation          | —               | Remote Hands (Corax contacts on-site support via call, email, or API)  |        |
Scenario Conditions
Docker Engine 24.0 on Ubuntu 22.04. containerd 1.7 shim deadlock. 28 running containers on the host. Docker API returning no responses. Kubernetes kubelet marking node as NotReady due to container runtime not responding. 3 critical services running on this host.
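Before touching the Docker daemon, the kubelet side of these conditions can be confirmed from the cluster. A minimal sketch, assuming the node name k8s-worker-05 from the scenario; the verdict logic is factored into a function so it can be exercised without a live cluster:

```shell
# On a live cluster (not run here), read the node's Ready condition:
#   kubectl get node k8s-worker-05 \
#     -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'

node_verdict() {
  # Maps a Ready-condition status string to a human-readable verdict.
  # "Unknown" is what the kubelet reports when the container runtime
  # stops responding, as in this scenario.
  case "$1" in
    True) echo "node Ready" ;;
    *)    echo "node NotReady (status: $1)" ;;
  esac
}

node_verdict "Unknown"
```

Checking the node first distinguishes a runtime-only failure (pods still Running, node NotReady) from a full host failure.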
Injected Error Messages (2)

1. Docker daemon unresponsive — docker.sock not responding to API calls; 'docker ps' hanging indefinitely; containerd shim process in D state (uninterruptible sleep); 28 running containers orphaned from the management plane; container runtime interface (CRI) returning deadline exceeded on all operations; Docker Engine PID 1847 consuming 100% of a single core in a futex spin lock.

2. Kubernetes node k8s-worker-05 NotReady — kubelet reporting the container runtime is down ('rpc error: code = DeadlineExceeded desc = context deadline exceeded'); node taint applied: node.kubernetes.io/not-ready:NoSchedule; existing pods running but unmanageable; pod eviction timer started (5 minute tolerance); 28 pods at risk of rescheduling to other nodes.
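The D-state symptom in the first message can be checked directly from /proc on the host. A sketch, not tied to any particular PID; the stat-parsing helper splits after the final ')' so command names containing spaces or parentheses don't throw off the field count:

```shell
# Quick check that the API is actually hung (run on the affected host):
#   timeout 5 docker ps >/dev/null 2>&1 || echo "Docker API unresponsive"

state_of_stat_line() {
  # Extracts the process-state field from one /proc/<pid>/stat line.
  # Everything up to the last ") " is the pid and comm; the next
  # space-separated token is the state (R, S, D, Z, ...).
  echo "${1##*) }" | cut -d' ' -f1
}

find_d_state() {
  # Lists /proc entries whose task is in uninterruptible sleep ("D"),
  # the state reported for the containerd shim in this incident.
  for stat in /proc/[0-9]*/stat; do
    line=$(cat "$stat" 2>/dev/null) || continue
    [ "$(state_of_stat_line "$line")" = "D" ] && echo "${stat%/stat}"
  done
}

# find_d_state   # run on the affected host; D-state tasks ignore signals
```

A process stuck in D state cannot be killed, even with SIGKILL, which is why the remediation below starts with a daemon restart rather than targeting the shim directly.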
Neural Engine Root Cause Analysis
The Docker daemon on docker-prod-05 is experiencing a critical deadlock condition, evidenced by PID 1847 consuming 100% CPU in a futex spin lock and containerd shim processes stuck in uninterruptible sleep (D state). This suggests either a kernel-level I/O deadlock affecting the container runtime or severe resource contention preventing normal daemon operations. The docker.sock API unresponsiveness and hanging 'docker ps' commands indicate the daemon's control plane is completely frozen, orphaning 28 running containers from management.
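The analysis above leaves two candidate causes open. One cheap discriminator is /proc/<pid>/wchan: a task genuinely spinning in userspace reads 0 there, while a task stuck in a kernel I/O wait names the kernel function it sleeps in. A sketch; PID 1847 in the comments is the one reported above, and the classifier takes the wchan value as an argument so the logic runs anywhere:

```shell
classify_wait() {
  # Argument: contents of /proc/<pid>/wchan for the stuck task.
  # "0" or empty => task is runnable/running, consistent with the
  # reported 100% CPU futex spin; any other value is the kernel
  # symbol the task is blocked in (e.g. an I/O wait path).
  case "$1" in
    0|"") echo "userspace spin suspected" ;;
    *)    echo "kernel wait in: $1" ;;
  esac
}

# On the affected host (the stack file needs root):
#   classify_wait "$(cat /proc/1847/wchan)"
#   cat /proc/1847/stack
classify_wait "io_schedule"
```

If the shim's wchan names an I/O function, the kernel-deadlock hypothesis wins and a daemon restart alone may not clear it; a userspace spin points at the daemon itself.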
Remediation Plan
1. Attempt a graceful Docker daemon restart via systemctl stop/start docker.
2. If the graceful restart fails, force-kill the Docker daemon (kill -9), then clean up stale containerd processes.
3. Check system resources (disk space, memory, file descriptors) and kernel logs for I/O errors.
4. Restart the Docker daemon and verify container recovery.
5. If containers do not recover automatically, assess which critical services need a manual restart.
6. Monitor for recurrence and investigate underlying storage/kernel issues.
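Steps 1, 2, and 4 of the plan can be sketched as one script. The 60s and 10s timeouts and the pkill patterns are assumptions, not tested values; the decision helper is split out so the fallback logic is checkable without a Docker host:

```shell
next_step() {
  # Maps the exit status of the graceful-restart attempt to the next
  # action. 124 is what timeout(1) returns when systemctl itself hangs,
  # which is likely here since the daemon is deadlocked.
  case "$1" in
    0) echo "verify containers" ;;
    *) echo "force kill and clean shims" ;;
  esac
}

restart_docker() {
  # Step 1: graceful restart, bounded so a hung daemon can't stall us.
  timeout 60 systemctl restart docker
  if [ "$(next_step $?)" = "verify containers" ]; then
    echo "graceful restart ok"
  else
    # Step 2: forced kill, then cleanup of stale shim processes.
    pkill -9 -x dockerd || true
    pkill -9 -f containerd-shim || true
    systemctl start docker
  fi
  # Step 4: verify the API answers again before declaring recovery.
  timeout 10 docker ps >/dev/null 2>&1 && echo "runtime recovered"
}

# restart_docker   # run only on the affected host, with on-call approval
```

Killing shims detaches their containers from the management plane, so this path trades container continuity for restoring control; with 3 critical services on the host, that trade should be an explicit decision, not a default.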