PASSED | server / container_oomkilled

Container OOMKilled Repeatedly

A Java-based microservice container is being repeatedly OOMKilled because the JVM heap (-Xmx) is set to 512MB but the container memory limit is also 512MB, leaving no room for JVM metaspace, thread stacks, and native memory. The pod restarts every 3-5 minutes.

Pattern: CONTAINER_EVENT
Severity: CRITICAL
Confidence: 95%
Remediation: Auto-Heal

Test Results

Metric                Expected         Actual           Result
Pattern Recognition   CONTAINER_EVENT  CONTAINER_EVENT
Severity Assessment   CRITICAL         CRITICAL
Incident Correlation  Yes              9 linked
Cascade Escalation    N/A              No
Remediation           Auto-Heal: Corax resolves autonomously

Scenario Conditions

Kubernetes pod with Java 17 container. Memory limit: 512Mi. JVM -Xmx512m (matches limit exactly). Container OOMKilled 47 times in 4 hours. CrashLoopBackOff with increasing backoff. ReplicaSet has 1 replica.

Injected Error Messages (1)

Container OOMKilled repeatedly — pod 'order-service-7b9f4c6d8-x2k4j' killed by OOM (exit code 137), reason: OOMKilled, 47 restarts in 4 hours, container memory limit: 512Mi, JVM heap: 512MB leaving 0 for metaspace/threads, kernel oom_score_adj: 1000, CrashLoopBackOff backoff: 5m0s
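
Two figures in the message above can be sanity-checked with a little arithmetic. The sketch below assumes a POSIX host and the common convention that an exit code above 128 means "128 plus the signal number"; the function names are hypothetical and only illustrate the calculation.

```python
import signal


def decode_exit_code(exit_code: int) -> str:
    """Describe an exit code using the 128 + signal-number convention."""
    if exit_code > 128:
        number = exit_code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"terminated by {name} (signal {number})"
    return f"exited with status {exit_code}"


def mean_restart_interval_minutes(restarts: int, window_hours: float) -> float:
    """Average minutes between restarts over the observation window."""
    return window_hours * 60 / restarts


print(decode_exit_code(137))
print(round(mean_restart_interval_minutes(47, 4), 1))
```

Exit code 137 therefore corresponds to SIGKILL, which is what the kernel OOM killer sends, and 47 restarts in 4 hours averages out to about one restart every 5 minutes.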

Neural Engine Root Cause Analysis

The order-service pod is being Out-of-Memory (OOM) killed because of a memory misconfiguration. The container has a 512Mi memory limit and the JVM heap is also capped at 512MB, leaving no headroom for metaspace, thread stacks, and other native JVM memory, so total process memory exceeds the limit as the heap fills and the kernel kills the container. The 47 restarts in 4 hours and the CrashLoopBackOff state point to a persistent configuration issue rather than a transient condition.
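
The headroom argument can be made concrete with a rough budget check. This is a minimal sketch: the limit and heap sizes come from the scenario, but the non-heap figures (metaspace, thread count, stack size, other native memory) are illustrative assumptions, not measurements from the order-service pod.

```python
from dataclasses import dataclass


@dataclass
class NonHeapEstimate:
    """Assumed non-heap memory needs of a JVM process, in MiB."""
    metaspace_mib: int
    thread_count: int
    stack_mib_per_thread: int
    other_native_mib: int

    def total_mib(self) -> int:
        stacks = self.thread_count * self.stack_mib_per_thread
        return self.metaspace_mib + stacks + self.other_native_mib


def headroom_mib(limit_mib: int, max_heap_mib: int) -> int:
    """Memory left inside the container limit once the heap is full."""
    return limit_mib - max_heap_mib


def fits(limit_mib: int, max_heap_mib: int, estimate: NonHeapEstimate) -> bool:
    """True if the assumed non-heap usage fits in the remaining headroom."""
    return headroom_mib(limit_mib, max_heap_mib) >= estimate.total_mib()


# Hypothetical figures chosen only to illustrate the calculation.
assumed = NonHeapEstimate(
    metaspace_mib=100, thread_count=50, stack_mib_per_thread=1, other_native_mib=50
)

print(headroom_mib(512, 512), fits(512, 512, assumed))  # scenario as configured
print(headroom_mib(768, 512), fits(768, 512, assumed))  # same heap, larger limit
```

With the scenario's settings the headroom is zero, so any non-heap usage at all pushes the process over the limit once the heap fills.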

Remediation Plan

1. Immediately increase the container memory limit to 768Mi-1Gi to accommodate JVM overhead.
2. Keep the JVM heap at or below roughly 70% of the container limit (e.g., a 512MB heap is about 67% of a 768Mi container).
3. Add JVM flags for better memory management: -XX:+UseContainerSupport -XX:MaxRAMPercentage=70. UseContainerSupport is already enabled by default on Java 17, and an explicit -Xmx overrides MaxRAMPercentage, so remove -Xmx512m if the heap should scale with the container limit.
4. Monitor memory usage post-deployment and adjust limits based on actual consumption patterns.
5. Implement memory monitoring alerts to prevent future OOM conditions.
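
As a rough check on the sizing in the plan, the sketch below estimates the heap ceiling that a 70% MaxRAMPercentage would give for the two proposed limits. It assumes the JVM takes the container limit as its available RAM and ignores the JVM's internal rounding, so the results are approximate. The helper names are hypothetical. JAVA_TOOL_OPTIONS is a standard environment variable the JVM reads at startup, and passing the flag through it is one possible delivery mechanism, not something the scenario specifies.

```python
def heap_mib_from_percentage(limit_mib: float, max_ram_percentage: float) -> float:
    """Approximate max heap (MiB) for a given container limit and percentage."""
    return limit_mib * max_ram_percentage / 100.0


def java_tool_options(max_ram_percentage: float) -> str:
    """Flag string that could be supplied via the JAVA_TOOL_OPTIONS variable."""
    return f"-XX:MaxRAMPercentage={max_ram_percentage:.1f}"


for limit_mib in (768, 1024):
    heap = heap_mib_from_percentage(limit_mib, 70)
    non_heap = limit_mib - heap
    print(f"limit={limit_mib}Mi heap~{heap:.1f}MiB non-heap~{non_heap:.1f}MiB")

print(java_tool_options(70))
```

Under these assumptions a 768Mi limit leaves roughly 230 MiB outside the heap and a 1Gi limit roughly 307 MiB. Whether that is enough depends on the service's actual metaspace, thread, and native memory usage, which is why the plan follows up with monitoring.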
Tested: 2026-03-30 | Monitors: 1 | Incidents: 1 | Test ID: cmncjn07o02s8obqeclruvg3p