The private Docker registry (Harbor) becomes unreachable after its TLS certificate fails to renew and expires. Every Kubernetes pod that needs to pull or re-pull an image fails with ImagePullBackOff. Containers that are already running are unaffected, but new deployments and pod restarts fail.
Pattern: CONTAINER_EVENT
Severity: CRITICAL
Confidence: 95%
Remediation: Remote Hands
Test Results
| Metric | Expected | Actual | Result |
|---|---|---|---|
| Pattern Recognition | CONTAINER_EVENT | CONTAINER_EVENT | |
| Severity Assessment | CRITICAL | CRITICAL | |
| Incident Correlation | Yes | 30 linked | |
| Cascade Escalation | Yes | Yes | |
| Remediation | - | Remote Hands: Corax contacts on-site support via call, email, or API | |
Scenario Conditions
Harbor private registry at registry.corp.local:5000. Self-signed TLS cert expired. Kubernetes cluster with 12 nodes. CRI-O container runtime. ImagePullPolicy: Always on critical deployments. 3 deployments need rolling update.
Injected Error Messages (3)
Docker registry unreachable — Harbor registry at registry.corp.local:5000 returning TLS handshake failure, certificate expired 2 hours ago, x509: certificate has expired, all image pull operations failing across Kubernetes cluster
Kubernetes ImagePullBackOff on frontend-prod — failed to pull image registry.corp.local:5000/frontend:v2.4.1, x509: certificate has expired or is not yet valid, pod stuck in ImagePullBackOff, rolling update halted at 1/3 replicas
Kubernetes pod restart failing on api-prod — CRI-O unable to pull image from private registry, ErrImagePull: x509 certificate signed by unknown authority, existing pods running but cannot be rescheduled if evicted, deployment rollout stuck
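The three injected messages carry two different x509 error strings: an expiry error and an unknown-authority error. As a rough illustration of how such messages could be grouped by cause, here is a minimal Python sketch. The function name and the cause labels are hypothetical and are not taken from Corax; only the matched x509 phrases come from the messages above.

```python
import re

# Hypothetical cause labels, for illustration only.
# The unknown-authority pattern is listed first so it is checked first.
X509_CAUSES = [
    (re.compile(r"x509:? certificate signed by unknown authority"), "untrusted_issuer"),
    (re.compile(r"x509:? certificate has expired"), "expired_certificate"),
]


def classify_pull_error(message: str) -> str:
    """Return a coarse cause label for an image pull error message."""
    for pattern, label in X509_CAUSES:
        if pattern.search(message):
            return label
    return "unknown"


if __name__ == "__main__":
    samples = [
        "x509: certificate has expired",
        "x509: certificate has expired or is not yet valid",
        "ErrImagePull: x509 certificate signed by unknown authority",
    ]
    for sample in samples:
        print(classify_pull_error(sample), "<-", sample)
```

Under this sketch the registry and frontend-prod messages fall under the expiry label, while the api-prod message falls under the untrusted-issuer label.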
Neural Engine Root Cause Analysis
The Harbor registry at registry.corp.local:5000 fails the TLS handshake because its certificate expired 2 hours ago. Clients reject the expired certificate with an x509 validation error, so every image pull across the Kubernetes cluster fails, producing a cascade of 12 correlated incidents. Until the certificate is replaced, the registry is unreachable for all container operations.
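One way to confirm this diagnosis from a workstation is to attempt a verified TLS handshake and look at the verification error. The sketch below uses only the Python standard library. The helper names are hypothetical, the code table covers only a few OpenSSL verification codes, and which code is reported for a certificate that is both self-signed and expired depends on the local trust store, so treat the output as a hint rather than a verdict.

```python
import socket
import ssl

# A few OpenSSL X509 verification codes; not an exhaustive table.
VERIFY_CODES = {
    10: "certificate has expired",
    18: "self-signed certificate",
    19: "self-signed certificate in chain",
    20: "issuer not in local trust store",
}


def describe_verify_error(code: int) -> str:
    """Translate an OpenSSL verification code into a short description."""
    return VERIFY_CODES.get(code, f"other verification error (code {code})")


def probe_tls(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake and summarize the outcome.

    Network failures (DNS, refused connection, timeout) are not caught
    here and propagate as OSError.
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "handshake ok"
    except ssl.SSLCertVerificationError as exc:
        return describe_verify_error(exc.verify_code)


if __name__ == "__main__":
    # Host and port are taken from the scenario conditions.
    print(probe_tls("registry.corp.local", 5000))
```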
Remediation Plan
1. Immediately renew or replace the expired SSL/TLS certificate for registry.corp.local:5000.
2. If using Let's Encrypt or automated certificate management, check why auto-renewal failed and fix the underlying issue.
3. Restart the Harbor registry service to load the new certificate.
4. Verify connectivity by testing the /v2/_catalog endpoint.
5. Monitor the Kubernetes cluster for successful image pulls and resolution of cascaded incidents.
6. Implement certificate expiration monitoring with alerts 30+ days before expiration to prevent future occurrences.
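The expiration alert in step 6 comes down to a date comparison. A minimal sketch, assuming the certificate's notAfter value is already available as a string in the format OpenSSL prints (for example "Jan 21 00:00:00 2025 GMT"): the function names are hypothetical, and the 30-day window is the threshold named in the plan.

```python
import ssl
from datetime import datetime, timedelta, timezone

WARN_WINDOW = timedelta(days=30)  # alert threshold from step 6


def time_until_expiry(not_after: str, now: datetime) -> timedelta:
    """Time remaining before the certificate expires (negative if expired)."""
    expires_at = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return expires_at - now


def should_alert(not_after: str, now: datetime,
                 window: timedelta = WARN_WINDOW) -> bool:
    """True when the certificate is expired or expires within the window."""
    return time_until_expiry(not_after, now) <= window


if __name__ == "__main__":
    # Example date chosen for illustration; pass a timezone-aware datetime.
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(should_alert("Jan 21 00:00:00 2025 GMT", now))
    print(should_alert("Mar  1 00:00:00 2025 GMT", now))
```

A check like this could run on a schedule against each registry certificate, so that an alert fires well before pulls start failing.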