PASSED: server / linux_kernel_panic_oom

Kernel Panic — OOM Killer Invoked

A Linux production server experiences kernel panic after the OOM killer is invoked repeatedly, killing critical processes including the database and web server. The system becomes unresponsive after the OOM killer targets the init process.

Pattern: MEMORY_EXHAUSTION
Severity: CRITICAL
Confidence: 92%
Remediation: Remote Hands

Test Results

Metric                  Expected            Actual
Pattern Recognition     MEMORY_EXHAUSTION   MEMORY_EXHAUSTION
Severity Assessment     CRITICAL            CRITICAL
Incident Correlation    Yes                 10 linked
Cascade Escalation      N/A                 No
Remediation             Remote Hands        Corax contacts on-site support via call, email, or API

Scenario Conditions

Ubuntu 22.04 LTS production server. 64GB RAM fully consumed; a Java application memory leak accounts for 58GB. The OOM killer is invoked multiple times, killing PostgreSQL, then Nginx, and finally systemd. Kernel panic follows.

Injected Error Messages (2)

Linux server prod-web-03 kernel panic after oom kill cascade — oom kill invoked on PostgreSQL (pid 1842, 2.1GB), then nginx (pid 1205, 512MB), then systemd-journald (pid 445), kernel panic: not syncing: attempted to kill init, system unresponsive, memory exhausted by java process pid 3201 consuming 58GB RSS
PostgreSQL terminated by oom kill on prod-web-03 — out of memory: kill process 1842 (postgres) score 89, server memory fully exhausted, all database connections dropped, oom kill freed 2.1GB but system continued to spiral

Neural Engine Root Cause Analysis

The server prod-web-03 experienced a catastrophic memory exhaustion event caused by a Java process (PID 3201) consuming 58GB of RSS memory. This triggered an Out-of-Memory (OOM) killer cascade that terminated critical system processes including PostgreSQL, nginx, and systemd-journald, ultimately leading to a kernel panic when the system attempted to kill init. The Java process appears to have a memory leak or misconfiguration that allowed it to consume excessive memory resources, starving other critical services.
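A condition like this is detectable well before the panic by sampling per-process resident memory from /proc. The sketch below parses the VmRSS field of a /proc/<pid>/status snapshot and reports what fraction of total RAM one process holds; the PID, 58GB figure, and 64GB host size come from the report above, while the sample text and helper names are illustrative assumptions.

```python
import re

def parse_vmrss_kib(status_text: str) -> int:
    """Extract VmRSS (resident set size) in KiB from /proc/<pid>/status text."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    if m is None:
        raise ValueError("VmRSS line not found")
    return int(m.group(1))

def rss_fraction(vmrss_kib: int, memtotal_kib: int) -> float:
    """Fraction of total RAM held resident by a single process."""
    return vmrss_kib / memtotal_kib

# Figures from the incident: java PID 3201 at 58GB RSS on a 64GB host.
# (In production, read the text from /proc/3201/status instead.)
sample_status = "Name:\tjava\nVmRSS:\t60817408 kB\n"   # 58 GiB expressed in KiB
memtotal_kib = 64 * 1024 * 1024                         # 64 GiB host

rss_kib = parse_vmrss_kib(sample_status)
frac = rss_fraction(rss_kib, memtotal_kib)
print(f"java RSS: {rss_kib} kB ({frac:.0%} of RAM)")    # ~91% — far past any sane alert threshold
```

Polling this every few seconds and alerting at, say, 80% of MemTotal would have flagged the leak long before the OOM killer reached systemd-journald.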

Remediation Plan

1. Immediately restart the prod-web-03 server to restore basic functionality.
2. Once online, identify and terminate any runaway Java processes consuming excessive memory.
3. Investigate the Java application configuration for memory leaks, heap size limits, and garbage collection settings.
4. Review system memory allocation and consider adding swap space or increasing physical RAM.
5. Implement memory monitoring and alerting to prevent future OOM conditions.
6. Configure OOM killer priorities to protect critical system processes.
7. Review the 6 correlated incidents to determine if this is part of a broader infrastructure issue.
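Step 6 maps to a specific Linux knob: writing to /proc/<pid>/oom_score_adj, where -1000 exempts a process from the OOM killer and +1000 makes it the preferred victim. A minimal sketch follows; the adjustment values and which services to protect are illustrative assumptions, and writing negative values requires root (CAP_SYS_RESOURCE).

```python
# oom_score_adj ranges from -1000 (never kill) to +1000 (kill first).
PROTECT = -1000    # e.g. the postgres postmaster, sshd
SACRIFICE = 500    # e.g. the leaky JVM — let it die before the database does

def oom_adj_path(pid: int) -> str:
    """Path of the kernel knob controlling OOM-killer preference for a PID."""
    return f"/proc/{pid}/oom_score_adj"

def set_oom_score_adj(pid: int, adj: int) -> None:
    """Write the adjustment; needs root for negative values."""
    if not -1000 <= adj <= 1000:
        raise ValueError("oom_score_adj must be in [-1000, 1000]")
    with open(oom_adj_path(pid), "w") as f:
        f.write(str(adj))

# Example (run as root): protect PostgreSQL, deprioritize the Java process.
# set_oom_score_adj(1842, PROTECT)    # postgres PID from the incident logs
# set_oom_score_adj(3201, SACRIFICE)  # java PID from the incident logs
```

For services managed by systemd, the same effect is available declaratively via OOMScoreAdjust= in the unit file, which survives restarts where a one-off write to /proc does not.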
Tested: 2026-03-30 | Monitors: 2 | Incidents: 2 | Test ID: cmncjypu605s0obqeqkfweez7