AWS: CloudWatch Log Group Retention Causing Cost Spike
A CloudWatch log group with 'Never Expire' retention has accumulated 5 TB of logs; the monthly cost jumped from $50 to $2,500.
Pattern: AWS_CLOUD (expected: AWS_COST_ANOMALY)
Severity: HIGH
Confidence: 68%
Remediation: Auto-Heal
Test Results

Metric                 Expected           Actual                                    Result
Pattern Recognition    AWS_COST_ANOMALY   AWS_CLOUD                                 Fail
Severity Assessment    HIGH               HIGH                                      Pass
Incident Correlation   N/A                None                                      n/a
Cascade Escalation     N/A                No                                        n/a
Remediation            —                  Auto-Heal — Corax resolves autonomously   n/a
Scenario Conditions
AWS CloudWatch Log Group '/aws/lambda/data-processor'. Retention: Never Expire. Size: 5TB. Monthly cost: $2,500. Lambda invoked 1M times/day with verbose logging.
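The scenario's numbers can be sanity-checked with back-of-envelope arithmetic, assuming us-east-1 list prices of roughly $0.50/GB for log ingestion and $0.03/GB-month for storage (these rates are an assumption; verify against current AWS pricing):

```python
# Back-of-envelope check of the scenario's $2,500/month figure.
# Assumed us-east-1 list prices (verify against current AWS pricing):
INGEST_PER_GB = 0.50          # CloudWatch Logs ingestion, USD per GB
STORAGE_PER_GB_MONTH = 0.03   # log storage, USD per GB-month

stored_gb = 5 * 1024                              # 5 TB accumulated
storage_cost = stored_gb * STORAGE_PER_GB_MONTH   # ~ $154/month

# The remainder of the bill is dominated by ingestion: 1M invocations/day
# with verbose logging. Solve for the implied daily log volume:
remaining = 2500 - storage_cost
daily_ingest_gb = remaining / INGEST_PER_GB / 30
per_invocation_kb = daily_ingest_gb * 1024 * 1024 / 1_000_000

print(f"storage: ${storage_cost:.0f}/mo, "
      f"ingest: ~{daily_ingest_gb:.0f} GB/day, "
      f"~{per_invocation_kb:.0f} KB per invocation")
```

Under these assumed rates, storing 5 TB explains only about $150 of the bill; the rest implies roughly 156 GB/day of ingestion, i.e. on the order of 160 KB of logs per invocation, which is consistent with verbose Lambda logging at 1M invocations/day.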
Injected Error Messages (1)
AWS CloudWatch cost spike — log group '/aws/lambda/data-processor' at 5TB with no retention policy, monthly cost $2,500 (was $50), verbose Lambda logging, retention set to 'Never Expire'
Neural Engine Root Cause Analysis
AWS cloud infrastructure event detected — an EC2 instance may be unreachable or in a stopped state, an RDS database is experiencing issues, a load balancer has unhealthy targets, or a Lambda function is failing. AWS service disruptions can cascade across dependent resources and affect application availability.
Remediation Plan
1. Check the AWS Health Dashboard and Personal Health Dashboard for any active service events.
2. For EC2 issues, check instance status checks (system and instance), review CloudWatch metrics, and check VPC security group rules.
3. For RDS, verify database instance status, check storage and connection limits, and review slow query logs.
4. For ELB issues, check target group health checks and verify backend instances are responding.
5. For Lambda, review CloudWatch Logs for invocation errors and check IAM permissions and VPC connectivity.
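The plan above covers generic AWS_CLOUD checks; for the cost anomaly this scenario actually describes, a useful diagnostic step is finding large log groups with no retention policy. A sketch assuming boto3's CloudWatch Logs API, with the client passed in so the function can be exercised without AWS credentials (the helper name and size threshold are illustrative):

```python
def find_unbounded_log_groups(logs_client, min_gb: float = 10.0):
    """Return (name, size_gb) for log groups with no retention policy
    holding at least `min_gb` of data, largest first."""
    offenders = []
    for page in logs_client.get_paginator("describe_log_groups").paginate():
        for group in page["logGroups"]:
            size_gb = group.get("storedBytes", 0) / 1024 ** 3
            # CloudWatch omits 'retentionInDays' when retention is 'Never Expire'
            if "retentionInDays" not in group and size_gb >= min_gb:
                offenders.append((group["logGroupName"], round(size_gb, 1)))
    return sorted(offenders, key=lambda g: -g[1])

# Usage (requires AWS credentials):
#   import boto3
#   print(find_unbounded_log_groups(boto3.client("logs")))
```

In this scenario the sweep would surface '/aws/lambda/data-processor' at roughly 5120 GB.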
Improvements Applied
Pattern misclassified as AWS_CLOUD (expected AWS_COST_ANOMALY)
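Given the expected AWS_COST_ANOMALY classification, the Auto-Heal action would presumably cap the log group's retention so old events start expiring. A minimal sketch assuming boto3's `put_retention_policy` call, with the client injected for testability (the helper name and return shape are hypothetical):

```python
def auto_heal_retention(logs_client, log_group: str, days: int = 30):
    """Hypothetical Auto-Heal step: cap retention so CloudWatch
    expires old events instead of storing them forever.

    `days` must be one of CloudWatch's allowed retention values
    (1, 3, 5, 7, 14, 30, 60, 90, ..., 3653).
    """
    logs_client.put_retention_policy(logGroupName=log_group,
                                     retentionInDays=days)
    return {"logGroupName": log_group, "retentionInDays": days}

# Usage (requires AWS credentials):
#   import boto3
#   auto_heal_retention(boto3.client("logs"), "/aws/lambda/data-processor")
```

Capping retention stops storage growth; the ingestion side of the bill would still need the Lambda's verbose logging reduced.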