An HPE Nimble AF60 all-flash array fails to replicate to the DR site for 12 hours after a WAN circuit issue causes the replication stream to abort repeatedly, creating a growing RPO gap.
Pattern: UNKNOWN
Severity: CRITICAL
Confidence: 85%
Remediation: Remote Hands
Test Results
Metric | Expected | Actual | Result
Pattern Recognition | UNKNOWN | UNKNOWN |
Severity Assessment | CRITICAL | CRITICAL |
Incident Correlation | Yes | 20 linked |
Cascade Escalation | N/A | No |
Remediation | - | Remote Hands: Corax contacts on-site support via call, email, or API |
Scenario Conditions
HPE Nimble AF60 primary at HQ. HPE Nimble AF40 replica at DR site. WAN circuit between sites experiencing 30% packet loss. Replication stream aborting every 15 minutes. RPO gap: 12 hours and growing. 50TB of production data.
Injected Error Messages (2)
hpe Nimble AF60 replication failure — replication to DR partner nimble-af40-dr has been failing for 12 hours, hpe Nimble replication stream aborting with 'network transmission error' every 15 minutes, RPO gap: 12 hours (SLA: 1 hour), 50TB of production data not replicated to disaster recovery site, hpe Nimble InfoSight predicting WAN link as root cause
hpe Nimble storage DR replication critically behind — last successful snapshot replication: 12 hours ago, hpe Nimble replication backlog: 2.1TB of changed blocks pending transfer, WAN circuit to DR site experiencing 30% packet loss causing TCP retransmission storms on replication port, hpe Nimble replication schedule: every 15 minutes (all recent attempts failed)
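The backlog figure in the second message allows a rough catch-up estimate. The sketch below is illustrative only: it assumes the 2.1 TB of changed blocks accumulated evenly over the 12-hour gap, uses decimal units (1 TB = 10^12 bytes), and takes the usable link rate as a hypothetical input. The function names and the 1 Gbit/s example are not part of the scenario.

```python
from typing import Optional


def change_rate_mbps(backlog_tb: float, gap_hours: float) -> float:
    """Average rate at which changed data accumulated, in Mbit/s."""
    backlog_bits = backlog_tb * 1e12 * 8
    return backlog_bits / (gap_hours * 3600) / 1e6


def catch_up_hours(backlog_tb: float, gap_hours: float,
                   link_mbps: float) -> Optional[float]:
    """Hours needed to drain the backlog while new changes keep arriving.

    Returns None when the link cannot even keep pace with the change rate,
    in which case the RPO gap keeps growing.
    """
    surplus_mbps = link_mbps - change_rate_mbps(backlog_tb, gap_hours)
    if surplus_mbps <= 0:
        return None
    backlog_bits = backlog_tb * 1e12 * 8
    return backlog_bits / (surplus_mbps * 1e6) / 3600


# Scenario values: 2.1 TB backlog after 12 hours.
# The 1000 Mbit/s usable rate is an assumed figure, not from the scenario.
rate = change_rate_mbps(2.1, 12)
hours = catch_up_hours(2.1, 12, 1000)
```

Under these assumptions the change rate works out to roughly 389 Mbit/s, so any usable replication throughput below that figure cannot close the gap no matter how long the resync runs.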
Neural Engine Root Cause Analysis
HPE Nimble AF60 replication to DR partner nimble-af40-dr is failing due to WAN link issues, as predicted by HPE Nimble InfoSight. The replication stream is consistently aborting with 'network transmission error' every 15 minutes for 12 hours, creating a critical 12-hour RPO gap that exceeds the 1-hour SLA. This appears to be a network connectivity or bandwidth issue between the primary and DR sites affecting the WAN link used for replication traffic.
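The SLA comparison in this analysis can be written out as a small check. This is a minimal sketch using only the figures quoted above (12-hour gap, 1-hour SLA, 15-minute schedule); the function name `rpo_status` and the shape of the returned dictionary are hypothetical.

```python
from datetime import timedelta


def rpo_status(gap: timedelta, sla: timedelta, schedule: timedelta) -> dict:
    """Summarise how far replication has fallen behind its RPO target."""
    return {
        "sla_breached": gap > sla,          # True once the gap exceeds the SLA
        "sla_multiple": gap / sla,          # gap expressed in multiples of the SLA
        "missed_cycles": gap // schedule,   # scheduled runs that fit in the gap
    }


status = rpo_status(
    gap=timedelta(hours=12),
    sla=timedelta(hours=1),
    schedule=timedelta(minutes=15),
)
```

With the scenario values, the gap is 12 times the SLA and spans 48 scheduled replication attempts.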
Remediation Plan
1. Immediately verify WAN link status and connectivity between primary site (10.50.4.100) and DR site nimble-af40-dr.
2. Check network bandwidth utilization and latency on the replication path.
3. Verify firewall rules and network ACLs for Nimble replication traffic.
4. Test basic connectivity (ping, traceroute) between storage arrays.
5. Review network infrastructure logs for errors, drops, or congestion.
6. If WAN link is degraded, engage network team to restore connectivity or implement temporary bandwidth prioritization for replication traffic.
7. Once network is stable, manually trigger replication resync to close the RPO gap.
8. Monitor replication status until normal 1-hour RPO is restored.
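The connectivity test in step 4 could be complemented by a repeated TCP connect probe, which exercises the same transport the replication stream depends on. The following is a rough sketch, not a vendor tool: `probe_tcp` is a hypothetical helper, and the port must be supplied by the operator because the scenario does not state which port replication uses.

```python
import socket
import time
from typing import List, Tuple


def probe_tcp(host: str, port: int, attempts: int = 5,
              timeout: float = 2.0) -> Tuple[int, List[float]]:
    """Attempt several TCP connects to host:port.

    Returns the number of successful connects and the connect time in
    seconds for each success. Failures (refused, timed out, unreachable)
    are counted by omission.
    """
    successes = 0
    timings: List[float] = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                successes += 1
                timings.append(time.monotonic() - start)
        except OSError:
            # Covers timeouts, refused connections, and name resolution errors.
            pass
    return successes, timings


# Example call (REPLICATION_PORT is a placeholder the operator must supply):
#   ok, times = probe_tcp("nimble-af40-dr", REPLICATION_PORT, attempts=20)
#   print(f"{ok}/20 connects succeeded")
```

A success count well below the number of attempts, or widely varying connect times, would be consistent with the packet loss described in the scenario, though it does not by itself identify where on the path the loss occurs.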