PASSED | server / docker_registry_unreachable

Docker Registry Unreachable — No Image Pulls

The private Docker registry (Harbor) becomes unreachable due to a TLS certificate renewal failure. Every Kubernetes pod that needs to pull or re-pull an image fails with ImagePullBackOff. Containers that are already running are unaffected, but new deployments and pod restarts fail.
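A minimal sketch of how the affected pods could be listed, assuming the output of `kubectl get pods --all-namespaces -o json` has already been parsed into a dictionary. The function name and the sample pod names are hypothetical; the field path `status.containerStatuses[].state.waiting.reason` is the standard pod status layout. Init containers are not inspected here.

```python
# Reasons the kubelet reports while an image pull is failing or backing off.
PULL_FAILURE_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def pods_with_pull_failures(pod_list: dict) -> list:
    """Return (namespace, pod name, reason) for each container stuck on an image pull."""
    failures = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for container in statuses:
            # "waiting" is absent (or None) for running and terminated containers.
            waiting = container.get("state", {}).get("waiting") or {}
            reason = waiting.get("reason")
            if reason in PULL_FAILURE_REASONS:
                failures.append((meta.get("namespace", ""), meta.get("name", ""), reason))
    return failures


# Hypothetical sample shaped like `kubectl get pods -o json` output.
SAMPLE = {
    "items": [
        {
            "metadata": {"namespace": "prod", "name": "frontend-prod-abc"},
            "status": {"containerStatuses": [
                {"state": {"waiting": {"reason": "ImagePullBackOff"}}}
            ]},
        },
        {
            "metadata": {"namespace": "prod", "name": "api-prod-xyz"},
            "status": {"containerStatuses": [
                {"state": {"running": {"startedAt": "2026-03-30T08:00:00Z"}}}
            ]},
        },
    ]
}

if __name__ == "__main__":
    for namespace, name, reason in pods_with_pull_failures(SAMPLE):
        print(f"{namespace}/{name}: {reason}")
```

In this sample only the first pod is reported; the running pod is skipped, which matches the scenario: existing containers keep working until they need to be rescheduled.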

Pattern: CONTAINER_EVENT
Severity: CRITICAL
Confidence: 95%
Remediation: Remote Hands

Test Results

Metric | Expected | Actual | Result
Pattern Recognition | CONTAINER_EVENT | CONTAINER_EVENT |
Severity Assessment | CRITICAL | CRITICAL |
Incident Correlation | Yes | 30 linked |
Cascade Escalation | Yes | Yes |

Remediation: Remote Hands (Corax contacts on-site support via call, email, or API)

Scenario Conditions

Harbor private registry at registry.corp.local:5000. Self-signed TLS cert expired. Kubernetes cluster with 12 nodes. CRI-O container runtime. ImagePullPolicy: Always on critical deployments. 3 deployments need rolling update.

Injected Error Messages (3)

Docker registry unreachable — Harbor registry at registry.corp.local:5000 returning TLS handshake failure, certificate expired 2 hours ago, x509: certificate has expired, all image pull operations failing across Kubernetes cluster
Kubernetes ImagePullBackOff on frontend-prod — failed to pull image registry.corp.local:5000/frontend:v2.4.1, x509: certificate has expired or is not yet valid, pod stuck in ImagePullBackOff, rolling update halted at 1/3 replicas
Kubernetes pod restart failing on api-prod — CRI-O unable to pull image from private registry, ErrImagePull: x509 certificate signed by unknown authority, existing pods running but cannot be rescheduled if evicted, deployment rollout stuck
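The three messages above carry two different x509 errors: an expired certificate and an untrusted issuer. A rough sketch of how a triage script might tell them apart is shown below. The function name and the category labels are hypothetical and are not part of the test harness; the matching is plain substring search on the error text.

```python
def classify_pull_error(message: str) -> str:
    """Bucket an image-pull error message by its apparent TLS cause."""
    text = message.lower()
    if "x509" not in text:
        return "not-tls"
    # Go's TLS stack words this as "certificate has expired or is not yet valid",
    # so this bucket can also include certificates that are not valid yet.
    if "certificate has expired" in text:
        return "cert-expired"
    if "unknown authority" in text:
        return "untrusted-issuer"
    return "tls-other"


if __name__ == "__main__":
    samples = [
        "x509: certificate has expired",
        "x509: certificate has expired or is not yet valid",
        "ErrImagePull: x509 certificate signed by unknown authority",
    ]
    for sample in samples:
        print(classify_pull_error(sample), "<-", sample)
```

Separating the two causes matters for remediation: an expired certificate needs renewal on the registry, while an untrusted issuer usually points at the CA bundle on the nodes.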

Neural Engine Root Cause Analysis

The Harbor registry at registry.corp.local:5000 is failing TLS handshakes because its certificate expired 2 hours ago. Clients reject the certificate with an x509 validation error, so every image pull across the Kubernetes cluster fails, producing a cascade of 12 correlated incidents. Until the certificate is replaced, the registry is unreachable for all container operations.
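One way to confirm this diagnosis from a cluster node is to request the registry's catalog endpoint and see whether the failure is at the TLS layer or elsewhere. The sketch below uses only the Python standard library; the function names and the `cafile` parameter (a path to the CA bundle that signed the registry certificate) are assumptions made for illustration. An HTTP 401 is treated as reachable, since a registry that requires authentication still completes the TLS handshake.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def catalog_url(registry: str) -> str:
    """Build the Registry v2 catalog URL for a host:port string."""
    return f"https://{registry}/v2/_catalog"


def probe_registry(registry: str, cafile: Optional[str] = None, timeout: float = 5.0) -> str:
    """Return a short description of whether the registry answers over TLS."""
    context = ssl.create_default_context(cafile=cafile)
    try:
        with urllib.request.urlopen(catalog_url(registry), timeout=timeout, context=context) as resp:
            return f"reachable (HTTP {resp.status})"
    except urllib.error.HTTPError as err:
        # The TLS handshake succeeded; the server answered with an error status such as 401.
        return f"reachable (HTTP {err.code})"
    except urllib.error.URLError as err:
        if isinstance(err.reason, ssl.SSLCertVerificationError):
            return f"tls verification failed: {err.reason.verify_message}"
        return f"unreachable: {err.reason}"


if __name__ == "__main__":
    print(probe_registry("registry.corp.local:5000"))
```

Against a registry with an expired certificate this should report a TLS verification failure rather than a generic connection error, which helps separate this incident from a network outage.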

Remediation Plan

1. Immediately renew or replace the expired SSL/TLS certificate for registry.corp.local:5000.
2. If using Let's Encrypt or automated certificate management, check why auto-renewal failed and fix the underlying issue.
3. Restart the Harbor registry service to load the new certificate.
4. Verify connectivity by testing the /v2/_catalog endpoint.
5. Monitor Kubernetes cluster for successful image pulls and resolution of cascaded incidents.
6. Implement certificate expiration monitoring with alerts 30+ days before expiration to prevent future occurrences.
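The expiration monitoring in step 6 could be sketched as follows. This is an illustrative outline, not the monitoring used in the test: the function names, the 5-second timeout, and the `cafile` parameter are assumptions. `fetch_not_after` relies on a verified handshake, so it only works while the certificate is still valid and its issuer is trusted (for a self-signed certificate, pass its CA file); once the certificate has expired, the handshake itself raises `ssl.SSLCertVerificationError`.

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Optional

WARN_DAYS = 30  # alert threshold taken from step 6


def days_remaining(not_after: str, now: datetime) -> float:
    """Days until expiry; negative once expired. `now` must be timezone-aware."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    return (expires_at - now.timestamp()) / 86400


def expiry_status(not_after: str, now: datetime, warn_days: int = WARN_DAYS) -> str:
    """Map the remaining lifetime to OK, WARN, or EXPIRED."""
    days = days_remaining(not_after, now)
    if days < 0:
        return "EXPIRED"
    if days <= warn_days:
        return "WARN"
    return "OK"


def fetch_not_after(host: str, port: int, cafile: Optional[str] = None) -> str:
    """Read the notAfter field of the server certificate over a verified TLS connection."""
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


if __name__ == "__main__":
    not_after = fetch_not_after("registry.corp.local", 5000)
    print(expiry_status(not_after, datetime.now(timezone.utc)))
```

Keeping the date arithmetic separate from the network call makes the threshold logic easy to check offline, and a scheduler can run the whole script daily and alert on anything other than OK.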
Tested: 2026-03-30 | Monitors: 3 | Incidents: 3 | Test ID: cmncjmlnj02p8obqeylbq41y6