PASSED | cloud / azure_sql

Azure: Azure SQL Geo-Replication Lag — Failover Group Delayed

Azure SQL geo-replication is lagging by 15 minutes due to high transaction volume. The failover group RPO guarantee is being violated.

Pattern: DATABASE_EVENT (expected: AZURE_REPLICATION_LAG)
Severity: HIGH
Confidence: 68%
Remediation: Auto-Heal

Test Results

Metric               | Expected              | Actual         | Result
Pattern Recognition  | AZURE_REPLICATION_LAG | DATABASE_EVENT |
Severity Assessment  | HIGH                  | HIGH           |
Incident Correlation | N/A                   | None           |
Cascade Escalation   | N/A                   | No             |
Remediation          | Auto-Heal (Corax resolves autonomously)

Scenario Conditions

Azure SQL Database with geo-replication to a secondary region. Transaction log shipping lagging by 15 minutes. Replication health: WARNING. RPO target: 5 seconds.
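The size of the violation is simple arithmetic: 15 minutes is 900 seconds, so the secondary trails the primary by 180 times the 5-second RPO target. A minimal Python sketch of that check, where `RpoCheck` is a hypothetical helper name and not part of Azure or Corax:

```python
from dataclasses import dataclass


@dataclass
class RpoCheck:
    """Compares observed geo-replication lag against an RPO target (both in seconds)."""
    target_sec: float
    observed_lag_sec: float

    @property
    def violated(self) -> bool:
        # The RPO is breached when the secondary trails the primary by more than the target.
        return self.observed_lag_sec > self.target_sec

    @property
    def overshoot_factor(self) -> float:
        # How many times larger the observed lag is than the allowed data-loss window.
        return self.observed_lag_sec / self.target_sec


# Values from the scenario: RPO target 5 s, observed lag 15 min.
check = RpoCheck(target_sec=5, observed_lag_sec=15 * 60)
print(check.violated, check.overshoot_factor)  # True 180.0
```

Any failover triggered while `violated` is true would lose up to `observed_lag_sec` of committed transactions, which is what the injected message means by "failover would lose data".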

Injected Error Messages (1)

Azure SQL geo-replication lag — primary-to-secondary replication 15 minutes behind, transaction volume exceeding replication throughput, RPO violated (target: 5s, actual: 15min), failover would lose data

Neural Engine Root Cause Analysis

Database infrastructure event detected. Possible causes: the connection pool may be exhausted, preventing new connections; replication lag is growing between the primary and a replica; deadlocks are occurring between competing transactions; or slow queries are degrading overall database performance. Because applications and services depend on the database, these issues cascade to all of them.

Remediation Plan

1. For connection pool exhaustion, check current connections with 'SHOW PROCESSLIST' or 'pg_stat_activity' and identify idle or stuck connections.
2. For replication lag, check replica I/O and SQL thread status and identify long-running transactions on the primary.
3. For deadlocks, review the deadlock graph (InnoDB: 'SHOW ENGINE INNODB STATUS', Postgres: check pg_locks) and optimize transaction ordering.
4. For slow queries, enable and review the slow query log, add missing indexes, and optimize query plans with EXPLAIN.
5. Consider scaling read replicas or implementing connection pooling (PgBouncer/ProxySQL) if connection limits are consistently hit.
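The plan above targets MySQL/PostgreSQL replicas. For Azure SQL geo-replication, the primary exposes link health through the `sys.dm_geo_replication_link_status` dynamic management view, whose `replication_lag_sec` column reports lag in seconds. The sketch below assumes rows have already been fetched from that view as dicts (the connection and driver are left out); `links_violating_rpo`, the server name, and the database name are hypothetical.

```python
# Azure SQL DMV that reports geo-replication link health; run it in the primary database.
# Connecting and executing the query is out of scope for this sketch.
GEO_LAG_QUERY = """
SELECT partner_server, partner_database, role_desc,
       replication_state_desc, replication_lag_sec
FROM sys.dm_geo_replication_link_status;
"""


def links_violating_rpo(rows, rpo_target_sec):
    """Return (partner, lag_sec) for each link whose lag exceeds the RPO target.

    `rows` is an iterable of dicts keyed by the columns selected above.
    links_violating_rpo is a hypothetical helper, not an Azure API.
    """
    violations = []
    for row in rows:
        lag = row["replication_lag_sec"]
        partner = f'{row["partner_server"]}/{row["partner_database"]}'
        # An unknown lag cannot be shown to meet the RPO, so report it too.
        if lag is None or lag > rpo_target_sec:
            violations.append((partner, lag))
    return violations


# Hypothetical row shaped like the scenario: 15 minutes (900 s) of lag.
sample = [{
    "partner_server": "sql-secondary",
    "partner_database": "appdb",
    "role_desc": "PRIMARY",
    "replication_state_desc": "CATCH_UP",
    "replication_lag_sec": 900,
}]
print(links_violating_rpo(sample, rpo_target_sec=5))  # [('sql-secondary/appdb', 900)]
```

Keeping the threshold check separate from the query makes it easy to reuse the same RPO target that the failover group is expected to honor.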

Improvements Applied

  • Pattern classified as DATABASE_EVENT (expected AZURE_REPLICATION_LAG)
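The injected message names both the provider (Azure SQL) and the mechanism (geo-replication lag), so a specific pattern would be expected to win over the generic DATABASE_EVENT. One way to get that behavior is a most-specific-first rule ordering. This report does not describe how Corax actually classifies events; the `RULES` table and `classify` function below are hypothetical and only illustrate the ordering idea.

```python
# Hypothetical rules, ordered most-specific first; the first rule whose
# keywords all appear in the message wins.
RULES = [
    ("AZURE_REPLICATION_LAG", ("azure sql", "geo-replication", "lag")),
    ("DATABASE_EVENT", ("replication",)),
]


def classify(message: str, default: str = "UNKNOWN") -> str:
    """Return the first pattern whose keywords all occur in the message (case-insensitive)."""
    text = message.lower()
    for pattern, keywords in RULES:
        if all(keyword in text for keyword in keywords):
            return pattern
    return default


print(classify("Azure SQL geo-replication lag: replication 15 minutes behind"))
# AZURE_REPLICATION_LAG
```

A generic message such as "replication lag on primary" still falls through to DATABASE_EVENT, so the specific rule narrows classification without dropping coverage.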
Tested: 2026-04-02 | Monitors: 1 | Incidents: 1 | Test ID: cmnhnr8vc09bplig7xpfksu2q