A ZFS storage pool enters a degraded state after a drive failure in a RAIDZ2 vdev. The pool remains operational but with reduced redundancy. A second drive in the same vdev is showing SMART warnings, indicating an elevated risk of failure.
Pattern: PERFORMANCE_DEGRADATION
Severity: CRITICAL
Confidence: 95%
Remediation: Remote Hands
Test Results
Metric | Expected | Actual | Result
Pattern Recognition | PERFORMANCE_DEGRADATION | PERFORMANCE_DEGRADATION |
Severity Assessment | CRITICAL | CRITICAL |
Incident Correlation | Yes | 5 linked |
Cascade Escalation | N/A | No |
Remediation | - | Remote Hands: Corax contacts on-site support via call, email, or API |
Scenario Conditions
ZFS pool 'datapool' with 3x RAIDZ2 vdevs (6 drives each). Drive sd-e3 failed in vdev-1. Drive sd-e5 in same vdev showing SMART reallocated sector warnings. Pool degraded but operational. Resilver operation in progress on hot spare.
Injected Error Messages (1)
ZFS pool 'datapool' degraded — drive sd-e3 in vdev-1 FAULTED with too many errors, pool running in degraded state with reduced redundancy, resilver in progress on hot spare: 23% complete, WARNING: drive sd-e5 in same vdev showing 847 reallocated sectors in SMART data, second failure would result in data loss for vdev-1, zpool status showing DEGRADED state
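The fields that matter in the injected message (pool, faulted drive, resilver progress, at-risk drive) can be pulled out with a few regular expressions. The following is a minimal sketch, not the parser Corax uses: the `PoolAlert` class, the `parse_alert` function, and the patterns are all hypothetical and are fitted only to the wording of this one message.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Excerpt of the injected message above (\u2014 is the dash used in the original).
ALERT = (
    "ZFS pool 'datapool' degraded \u2014 drive sd-e3 in vdev-1 FAULTED with too many errors, "
    "pool running in degraded state with reduced redundancy, "
    "resilver in progress on hot spare: 23% complete, "
    "WARNING: drive sd-e5 in same vdev showing 847 reallocated sectors in SMART data"
)


@dataclass
class PoolAlert:
    """Hypothetical container for the fields extracted from the alert text."""
    pool: Optional[str]
    faulted_drive: Optional[str]
    vdev: Optional[str]
    resilver_percent: Optional[int]
    at_risk_drive: Optional[str]
    reallocated_sectors: Optional[int]


def parse_alert(message: str) -> PoolAlert:
    """Extract key fields; any field whose pattern does not match stays None."""
    pool = re.search(r"ZFS pool '([^']+)'", message)
    faulted = re.search(r"drive (\S+) in (vdev-\d+) FAULTED", message)
    resilver = re.search(r"resilver in progress[^:]*: (\d+)% complete", message)
    at_risk = re.search(
        r"drive (\S+) in same vdev showing (\d+) reallocated sectors", message
    )
    return PoolAlert(
        pool=pool.group(1) if pool else None,
        faulted_drive=faulted.group(1) if faulted else None,
        vdev=faulted.group(2) if faulted else None,
        resilver_percent=int(resilver.group(1)) if resilver else None,
        at_risk_drive=at_risk.group(1) if at_risk else None,
        reallocated_sectors=int(at_risk.group(2)) if at_risk else None,
    )


if __name__ == "__main__":
    print(parse_alert(ALERT))
```

Leaving unmatched fields as `None` rather than raising keeps the sketch tolerant of messages that omit the resilver or SMART portion.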
Neural Engine Root Cause Analysis
The ZFS storage server has experienced a hardware drive failure: drive sd-e3 in vdev-1 has FAULTED due to excessive errors, putting the ZFS pool 'datapool' into a degraded state. A resilver onto a hot spare is in progress (23% complete), but there is a critical secondary risk: drive sd-e5 in the same vdev shows 847 reallocated sectors, a strong indicator of impending failure. RAIDZ2 tolerates two failed drives per vdev, so if sd-e5 fails before the resilver completes, vdev-1 would keep running with no remaining redundancy. Any further drive failure or unrecoverable read error in vdev-1 would then cause data loss, and because the pool depends on every vdev, losing vdev-1 would mean losing the entire pool.
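The redundancy margin at stake here is simple arithmetic: a RAIDZ vdev with parity level p survives up to p failed drives, so RAIDZ2 survives two. A minimal sketch of that calculation follows; the function names and the risk labels are hypothetical and are not ZFS pool states.

```python
# Parity drives per RAIDZ level: the number of drive failures a vdev can absorb.
PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def remaining_fault_tolerance(vdev_type: str, failed_drives: int) -> int:
    """Return how many more drives may fail before data is lost.

    A negative result means more drives have failed than parity can cover.
    """
    return PARITY[vdev_type] - failed_drives


def risk_label(vdev_type: str, failed_drives: int) -> str:
    """Map the margin to an illustrative label (hypothetical, not a ZFS state)."""
    margin = remaining_fault_tolerance(vdev_type, failed_drives)
    if margin < 0:
        return "DATA_LOSS"
    if failed_drives == 0:
        return "HEALTHY"
    if margin == 0:
        return "NO_REDUNDANCY"
    return "REDUCED_REDUNDANCY"


if __name__ == "__main__":
    # vdev-1 today: sd-e3 failed, resilver not yet complete.
    print(risk_label("raidz2", 1))
    # If sd-e5 also fails before the resilver finishes.
    print(risk_label("raidz2", 2))
    # Any third failure in the same vdev.
    print(risk_label("raidz2", 3))
```

Until the resilver onto the hot spare finishes, sd-e3 still counts as a failed member of vdev-1 for this calculation.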
Remediation Plan
1. Immediately monitor the resilver progress and ensure it completes successfully.
2. Once resilver is complete, physically replace the faulted drive sd-e3.
3. Add the new drive back to the pool to restore redundancy.
4. Urgently replace drive sd-e5 due to high reallocated sector count indicating imminent failure.
5. Monitor SMART data on all remaining drives in the pool.
6. Consider adding additional hot spares to prevent future single points of failure.
7. Implement proactive SMART monitoring alerts to catch drive degradation earlier.
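The proactive SMART monitoring in the plan above could start from a check like the one below. This is a sketch under stated assumptions: it parses the ATA attribute table in the layout printed by `smartctl -A`, the sample text is illustrative rather than captured from the drives in this scenario, and the threshold of 100 sectors is an arbitrary placeholder, not a vendor or Corax recommendation. SAS and NVMe devices report health data in a different format that this sketch does not handle.

```python
from typing import Optional

# Illustrative sample in the layout of `smartctl -A` for an ATA drive.
# These values are made up for the example, except the raw count of 847.
SAMPLE_SMART_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   093   093   010    Pre-fail  Always       -       847
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       21540
"""


def reallocated_sectors(smart_output: str) -> Optional[int]:
    """Return the raw Reallocated_Sector_Ct value, or None if it is absent."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the raw value is the 10th.
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def should_alert(count: Optional[int], threshold: int = 100) -> bool:
    """Flag a drive once its count reaches the (assumed) threshold."""
    return count is not None and count >= threshold


if __name__ == "__main__":
    count = reallocated_sectors(SAMPLE_SMART_OUTPUT)
    print(count, should_alert(count))
```

In practice the trend matters as much as the absolute count, so a monitoring job would likely record each reading and alert on growth between readings as well.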