We test Corax against real-world infrastructure failures across every vendor, platform, and scenario. Browse the results below.
The internal package repository mirror becomes corrupted after a failed rsync, causing all package installation and update operations across the Linux fleet to fail. Servers cannot install security patches or new application dependencies.
An automated SSH key rotation process replaces host keys on 20 servers but fails to update the known_hosts files on the Ansible control node and monitoring servers. Host key verification fails, and all SSH-based automation, configuration management, and monitoring break simultaneously.
The cron daemon on a critical infrastructure server enters a crash loop because of a corrupted crontab file. All scheduled jobs, including log rotation, backup scripts, and health checks, stop executing, and the issue goes undetected for 48 hours.
SELinux enforcing mode blocks a newly deployed application from binding to its configured port and accessing its data directory. The application fails to start with permission denied errors, and the audit log fills with AVC denial messages.
A network switch failure makes the NFS server unreachable, leaving the NFS mount on a production application server stale. All processes that access the mount hang in uninterruptible sleep (D state), and the application server becomes partially unresponsive.
A Linux server runs out of inodes on the root filesystem despite having 40% of its disk space free. Millions of small session files created by a PHP application have consumed every available inode. No new files can be created, so the application fails and log rotation breaks.
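This condition is easy to catch if free inodes are monitored alongside free disk space, not instead of it. The sketch below is one way to do that, assuming a POSIX host (it uses `os.statvfs`). The 5% alert threshold and the helper names are hypothetical and are not part of Corax.

```python
import os
from dataclasses import dataclass


@dataclass
class FsUsage:
    """Percent free for data blocks and for inodes on one filesystem."""
    disk_free_pct: float
    inode_free_pct: float


def usage_from_counts(total_blocks: int, avail_blocks: int,
                      total_inodes: int, avail_inodes: int) -> FsUsage:
    # Some filesystems (e.g. Btrfs) report 0 total inodes; treat that as 100% free.
    disk = 100.0 * avail_blocks / total_blocks if total_blocks else 100.0
    inodes = 100.0 * avail_inodes / total_inodes if total_inodes else 100.0
    return FsUsage(disk, inodes)


def check_path(path: str = "/") -> FsUsage:
    """Read live counts for the filesystem holding `path` (POSIX only)."""
    st = os.statvfs(path)
    return usage_from_counts(st.f_blocks, st.f_bavail, st.f_files, st.f_favail)


def is_inode_exhausted(usage: FsUsage, threshold_pct: float = 5.0) -> bool:
    """True when inodes are nearly gone, however much disk space is free."""
    return usage.inode_free_pct < threshold_pct


if __name__ == "__main__":
    # 40% of blocks free but zero inodes left, as in the scenario above.
    u = usage_from_counts(1000, 400, 1_000_000, 0)
    print(u, is_inode_exhausted(u))
```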
A misconfigured systemd unit file creates a circular dependency between three services, causing them to enter a restart loop. Each service depends on another in the cycle, and systemd cannot satisfy all dependencies simultaneously.
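A cycle like this can be found before deployment by attempting a topological sort of the dependencies. The sketch below uses Python's standard `graphlib` module. The unit names are hypothetical, and the code works on a hand-built dependency map rather than parsing real unit files.

```python
from typing import Optional

from graphlib import CycleError, TopologicalSorter


def start_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a valid start order; deps maps each unit to the units it needs first."""
    return list(TopologicalSorter(deps).static_order())


def find_cycle(deps: dict[str, set[str]]) -> Optional[list[str]]:
    """Return the dependency cycle (first node repeated at the end), or None."""
    try:
        start_order(deps)
    except CycleError as err:
        # graphlib stores the detected cycle as the second element of args.
        return err.args[1]
    return None


if __name__ == "__main__":
    # Hypothetical three-service loop: a needs b, b needs c, c needs a.
    units = {
        "app-a.service": {"app-b.service"},
        "app-b.service": {"app-c.service"},
        "app-c.service": {"app-a.service"},
    }
    print(find_cycle(units))
```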
Repeated OOM killer invocations on a Linux production server kill critical processes, including the database and web server. When memory remains exhausted and no killable processes are left, the kernel panics and the system becomes unresponsive.
A critical scheduled task that triggers a chain of dependent tasks fails silently. The initial task (database export) fails because its credentials have expired, and the downstream tasks (report generation, file transfer, client notification) then fail in sequence.
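One way to stop a chain like this from failing silently is to record a status for every step and to skip anything downstream of a failure instead of running it. The step names below mirror the scenario, but `run_chain` and `CredentialsExpired` are hypothetical illustrations, not a real scheduler API.

```python
from typing import Callable


class CredentialsExpired(Exception):
    """Hypothetical error raised by the export step."""


def run_chain(steps: list[tuple[str, Callable[[], None]]]) -> dict[str, str]:
    """Run steps in order; after the first failure, mark every later step skipped.

    Returns a status per step so the failure is reported instead of swallowed.
    """
    status: dict[str, str] = {}
    failed = False
    for name, fn in steps:
        if failed:
            status[name] = "skipped (upstream failure)"
            continue
        try:
            fn()
            status[name] = "ok"
        except Exception as exc:  # record the error type rather than hiding it
            status[name] = f"failed: {type(exc).__name__}"
            failed = True
    return status


if __name__ == "__main__":
    def export_db() -> None:
        raise CredentialsExpired("service account password expired")

    def noop() -> None:
        pass

    chain = [("database export", export_db), ("report generation", noop),
             ("file transfer", noop), ("client notification", noop)]
    for step, result in run_chain(chain).items():
        print(f"{step}: {result}")
```

Skipping the downstream steps, rather than letting each one fail on missing input, leaves a single root-cause failure in the report.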
The WSUS server has failed to synchronize with Microsoft Update for 14 days because of a corrupt content database. All managed workstations and servers are missing critical security patches, creating a significant vulnerability window.
The Windows DNS server's conditional forwarder for a partner domain stops resolving after the partner changes its DNS server IP addresses, leaving the forwarder pointing at the old addresses. All lookups for the partner domain fail, breaking the federated application integration.
The Local Security Authority Subsystem Service (LSASS) on a domain controller develops a memory leak after a security update, consuming increasing amounts of RAM until the server becomes unresponsive. Authentication requests fail as memory pressure increases.
The RDP certificate on a critical terminal server has expired, preventing all Remote Desktop connections. Users receive certificate warnings, and Group Policy that enforces Network Level Authentication (NLA) rejects their connections.
A Windows Defender signature update incorrectly identifies a critical production DLL as malware and quarantines it. The affected application fails to start, impacting all users of the ERP system.
The IIS application pool for a critical internal web application enters a crash loop after a .NET runtime update. The worker process (w3wp.exe) crashes within seconds of starting, and IIS rapid-fail protection disables the pool after 5 crashes in 5 minutes.
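Rapid-fail behavior can be modeled as a sliding window over crash timestamps. This is a simplified sketch, not IIS's implementation. It assumes the pool is disabled as soon as 5 crashes fall within any 5-minute window, matching the description above.

```python
from collections import deque


class RapidFailMonitor:
    """Sliding-window model of rapid-fail protection (assumed semantics)."""

    def __init__(self, max_failures: int = 5, window_s: float = 300.0):
        self.max_failures = max_failures
        self.window_s = window_s
        self._crashes = deque()  # crash timestamps in seconds, oldest first
        self.disabled = False

    def record_crash(self, t: float) -> bool:
        """Record a crash at time t; return True if the pool is now disabled."""
        self._crashes.append(t)
        # Drop crashes that have slid out of the window.
        while self._crashes and t - self._crashes[0] > self.window_s:
            self._crashes.popleft()
        if len(self._crashes) >= self.max_failures:
            self.disabled = True
        return self.disabled


if __name__ == "__main__":
    # A worker that dies every 10 seconds trips protection on the 5th crash.
    monitor = RapidFailMonitor()
    print([monitor.record_crash(t) for t in (0, 10, 20, 30, 40)])
```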
A 3-node Windows Server Failover Cluster loses quorum when two nodes fail simultaneously because of a shared storage controller issue. The remaining node cannot form a majority alone, so all clustered services go offline, including SQL Server Always On Availability Groups.
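The arithmetic behind this scenario is a strict majority of votes. The sketch below assumes plain node majority with one vote per node. It ignores witnesses and the dynamic quorum features of Windows Server Failover Clustering.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of the configured votes."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, votes_up: int) -> bool:
    """True if the surviving voters form a strict majority."""
    return votes_up >= votes_needed(total_votes)


if __name__ == "__main__":
    # 3-node cluster, one vote per node, two nodes lost together.
    print(votes_needed(3))   # 2
    print(has_quorum(3, 1))  # False: the last node cannot form quorum alone
```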
A production Windows Server suffers repeated Blue Screen of Death (BSOD) crashes with the IRQL_NOT_LESS_OR_EQUAL stop code, each generating a memory dump. A faulty network driver update causes the server to crash every 10 to 15 minutes after boot.
A Windows update (KB5034441) fails to install on a production file server and leaves it in a reboot loop, restarting repeatedly as it tries to apply the update. The server never reaches a healthy state, and all SMB shares are offline.
The NTP server goes offline and server clocks begin drifting. After 3 hours, the domain controller's clock is 7 minutes ahead of the workstations. Kerberos authentication fails because the default maximum tolerated clock skew is 5 minutes. Users cannot log in, access file shares, or use any domain-authenticated service.
A poorly optimized batch job triggers a deadlock storm in SQL Server: 150 or more deadlocks occur within 10 minutes as the batch process and OLTP transactions compete for locks on the same tables. Application transactions are chosen as deadlock victims, causing user-facing errors.
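The Kerberos check comes down to comparing the absolute clock difference with the tolerance. The sketch below assumes the 5-minute default. The second helper also assumes a constant drift rate, which the scenario does not state. Under that assumption, drifting 7 minutes over 3 hours would cross the limit after roughly 2.1 hours.

```python
from datetime import datetime, timedelta, timezone

# Default Kerberos tolerance; administrators can change it through policy.
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(client: datetime, server: datetime,
                         max_skew: timedelta = MAX_SKEW) -> bool:
    """True if the two clocks are close enough for Kerberos authentication."""
    return abs(client - server) <= max_skew


def hours_until_skew_exceeded(drift: timedelta, elapsed_hours: float,
                              max_skew: timedelta = MAX_SKEW) -> float:
    """Hours until max_skew is reached, assuming a constant drift rate."""
    rate_s_per_hour = drift.total_seconds() / elapsed_hours
    return max_skew.total_seconds() / rate_s_per_hour


if __name__ == "__main__":
    workstation = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    dc = workstation + timedelta(minutes=7)
    print(within_kerberos_skew(workstation, dc))  # False: 7 min > 5 min
    print(round(hours_until_skew_exceeded(timedelta(minutes=7), 3), 2))  # 2.14
```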
Every scenario is tested against Corax's Neural Engine in a production environment with AI-powered root cause analysis.
Tests run continuously as new infrastructure patterns are added.