PASSEDcloud / k8s_network_policy_blocking

K8s Network Policy Blocking Inter-Pod Communication

A newly applied Kubernetes NetworkPolicy with an overly restrictive ingress rule blocks all traffic between the API pods and the database pods. The policy was intended to restrict external access but inadvertently blocks intra-cluster communication, causing a complete application outage.

Pattern

CONTAINER_EVENT

Severity

CRITICAL

Confidence

95%

Remediation

Auto-Heal

Test Results

Metric	Expected	Actual
Pattern Recognition	CONTAINER_EVENT	CONTAINER_EVENT
Severity Assessment	CRITICAL	CRITICAL
Incident Correlation	Yes	22 linked
Cascade Escalation	N/A	No
Remediation	—	Auto-Heal — Corax resolves autonomously

Scenario Conditions

K8s namespace with NetworkPolicy enabled (Calico CNI). New policy applied with empty ingress rule (deny all). API pods cannot reach database pods. Application completely non-functional. Policy applied to 'backend' namespace affecting 15 pods.

Injected Error Messages (2)

kubelet pod 'backend-api-6f8d9c7b-wq4z' in namespace 'backend' reporting connection failures — all outbound connections to database pods being dropped, Calico network policy 'restrict-backend-ingress' applied 10 minutes ago blocking all intra-namespace traffic, pod notready due to failed liveness probe (database dependency check), all 8 API replicas affected, container restart initiated but connectivity still blocked by network policy

database pod 'postgres-primary-0' receiving zero connections — expected 8 connections from API pods, kubelet reporting pod healthy but no inbound traffic, Calico network policy dropping all ingress from API pod CIDR, connection pool on API side reporting failures to all database endpoints within cluster, pod eviction risk increasing

Neural Engine Root Cause Analysis

The root cause is a misconfigured Calico network policy 'restrict-backend-ingress' that was applied 10 minutes ago and is blocking all intra-namespace traffic in the 'backend' namespace. This policy is preventing the backend-api pods from connecting to database pods, causing liveness probe failures and making all 8 API replicas unavailable. The timing correlation between the network policy application and the incident onset confirms this is a network policy misconfiguration rather than an application or infrastructure issue.

Remediation Plan

1. Immediately review and modify the Calico network policy 'restrict-backend-ingress' to allow necessary intra-namespace communication between backend-api and database pods. 2. If the policy was meant to restrict external ingress only, update the policy to allow internal pod-to-pod communication while maintaining security boundaries. 3. Wait for pod readiness probes to pass as database connectivity is restored. 4. Verify all 8 API replicas return to healthy state. 5. Test end-to-end API functionality to confirm full service restoration.

Tested: 2026-03-30Monitors: 2 | Incidents: 2Test ID: cmnckdnob08qjobqe9jgdcjps