July 19, 2024 will be remembered as the day the world collectively held its breath, waiting to see the home screen of their Windows machines.
An unfortunate technical conflict between two titans of technology, CrowdStrike and Microsoft Windows, unintentionally sent machines worldwide into a reboot loop, taking down many of the globe's crucial operations with it. Planes were grounded, businesses paralysed, markets frozen, and frontline duties disrupted.
Historically, only targeted cyberattacks have shown the capacity to cause disruption on such a scale. This outage is unique in that it was born not from malice but from a well-intentioned security measure. While we hope CrowdStrike recovers gracefully, it is notable that our digital infrastructure, while undeniably powerful, remains fragile enough to crumble under the weight of a single faulty file.
What caused the outage?
A recent update to CrowdStrike's Falcon Sensor conflicted with Microsoft Windows, causing machines to hit a 'blue screen of death' (BSOD) or reboot endlessly.
Initially suspected to be a Microsoft issue, the root cause was identified as a corrupted configuration file for CrowdStrike's Falcon Sensor, delivered in an update released at 04:09 UTC. Three components are of interest here:
- The Falcon Sensor is the core agent installed on endpoints. It collects telemetry data for CrowdStrike and performs other crucial tasks.
- The Falcon Driver is a kernel-mode driver that operates at the heart of the OS, with elevated privileges for performing security operations. It loads during the pre-OS initialization stage, so it is among the first components to be initialized and can protect the system from the very start.
- The configuration file (referred to as the channel file by CrowdStrike) acts as an intermediary channel for data exchange between the sensor and the driver. It outlines the sensor's rules and actions for monitoring system activity.
For instance, when the sensor identifies malicious activity, it uses the configuration file to request a closer investigation by the driver. The driver then intercepts the necessary system-level operations and reports back to the sensor, which analyzes the data and decides how to respond.
Instead of pushing a full software update each time, CrowdStrike updates this configuration file multiple times a day with newly emerging threat identifiers. Unfortunately, one such update contained a faulty file that triggered a logic error, leading to a memory allocation failure and a subsequent driver crash with a PAGE_FAULT_IN_NONPAGED_AREA error. Windows detected this and initiated a BSOD to prevent further damage to the system. Since the update was deployed via CrowdStrike's cloud infrastructure, it was automatically installed on a large number of Windows machines before the issue was realised.
We empathize with the affected organizations, their customers and employees. We are doing our best to help in any way we can.
BSOD resolver tool
Recognizing the strain on IT teams, we've created a tool that automatically applies CrowdStrike's workaround, removes the problematic update, and restarts your system.
Who can use it?
Anyone can use it! It works for both Endpoint Central customers and non-customers.
How will it help you?
It is designed to work even when Windows Recovery Environment or Safe Mode is not functioning on a machine. It supports machines both with and without BitLocker enabled.
Note: Rest assured, this tool has undergone extensive testing. We've taken great care to not add any extra burden on you.
How does it work?
Step 1: Create the recovery media
Prerequisite: Ensure the Windows ADK is installed on your machine before generating the recovery media. (Existing Endpoint Central OSD users already have it.)
- Unzip the bsod-resolver.zip file.
- Run the GeneratePEImage.bat file with administrator privileges. This creates the recovery media in both ISO and USB formats inside the images folder.
- To copy the ISO file onto a USB drive, go to images\crowdstrike\media and run USBCopy.bat with administrative privileges. For VMs, you can use the ISO directly. Refer to the how-to videos below.
Step 2: Fix the BSOD using the recovery media
- Boot your machine using the recovery media.
- If the OS partition is BitLocker-protected, have your recovery key ready. Otherwise, skip this step. (See how you can retrieve the BitLocker key directly using Endpoint Central later in this document.)
- Enter the recovery key (if needed) and click "Confirm" to fix the issue.
- Remove the recovery media and boot your machine normally.
The above step will work even if Windows Recovery Environment or Safe Mode is not functioning. If you are already in Windows Recovery Environment:
- Copy the recovery tool to a USB drive.
- Extract the kit on the USB drive.
- Connect the USB drive to your computer.
- Open a command prompt in Recovery Environment.
- Navigate to the folder extracted from bsod-resolver.zip on the USB drive.
- Run the 'recoveryEnv.bat' file to start the repair.
Please refer to the how-to videos below:
If you have any questions about this tool, please reach us at endpointcentral-support@manageengine.com.
CrowdStrike's official workaround
- Boot Windows into Safe Mode or the Windows Recovery Environment.
- Navigate to the C:\Windows\System32\drivers\CrowdStrike directory.
- Locate the file matching “C-00000291*.sys”, and delete it.
- Boot the host normally.
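The locate-and-delete steps above can be previewed before anything is removed. The following is a minimal sketch, not CrowdStrike's or Endpoint Central's code: the function name Remove-ChannelFile291 and its -DriverDir parameter are hypothetical, introduced here only for illustration. It relies on PowerShell's standard -WhatIf support so a dry run lists the matching files without deleting them.

```powershell
# Hypothetical helper: finds channel files matching C-00000291*.sys and deletes them.
# Run with -WhatIf first to see what would be removed without touching anything.
function Remove-ChannelFile291 {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        # Directory named in the workaround; overridable so the function can be tried elsewhere
        [string]$DriverDir = 'C:\Windows\System32\drivers\CrowdStrike'
    )

    $pattern = Join-Path -Path $DriverDir -ChildPath 'C-00000291*.sys'
    $files = @(Get-ChildItem -Path $pattern -File -ErrorAction SilentlyContinue)

    foreach ($file in $files) {
        $deleted = $false
        # ShouldProcess returns $false under -WhatIf, so the delete is skipped in a dry run
        if ($PSCmdlet.ShouldProcess($file.FullName, 'Delete channel file')) {
            Remove-Item -LiteralPath $file.FullName -Force -ErrorAction Stop
            $deleted = $true
        }
        # Emit one record per matched file so the caller can log or review the outcome
        [pscustomobject]@{ Path = $file.FullName; Deleted = $deleted }
    }
}
```

Running `Remove-ChannelFile291 -WhatIf` from an elevated prompt in Safe Mode lists the matches. Running it again without -WhatIf performs the deletion.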
The challenge: Devices that are encrypted with BitLocker require a recovery key to enter Safe Mode. The key can be retrieved only if it has already been stored locally. Without it, the CrowdStrike workaround cannot be applied. Given our shared customer base with CrowdStrike, here are some remedial measures you can perform via the Endpoint Central console.
Get recovery key directly from Endpoint Central
Endpoint Central allows you to retrieve the key directly from the console, after which you can follow the CrowdStrike workaround.
- Log in to the Endpoint Central console.
- Go to Inventory > Computers > Select required machine > Security Tab > BitLocker.
- Click "Available" under C Drive in the 'Recovery Key Status' tab to get the Recovery Key.
Deploy a PowerShell script from Endpoint Central
The script below deletes the problematic channel file, clears the Safe Mode boot flag, and restarts the machine in normal mode. However, if BitLocker is enabled, you'll still need to have the recovery key on hand.
# CrowdStrikeFix.ps1
# Deletes the problematic CrowdStrike channel file causing BSODs, then reverts Safe Mode boot.
$filePath = "C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys"
$files = Get-ChildItem -Path $filePath -ErrorAction SilentlyContinue
foreach ($file in $files) {
    try {
        # -ErrorAction Stop makes a failed delete a terminating error so the catch block runs
        Remove-Item -Path $file.FullName -Force -ErrorAction Stop
        Write-Output "Deleted: $($file.FullName)"
    } catch {
        Write-Output "Failed to delete: $($file.FullName) ($($_.Exception.Message))"
    }
}
# Revert Safe Mode boot after the fix.
# '{current}' is quoted so PowerShell does not parse it as a script block.
bcdedit /deletevalue '{current}' safeboot
Restart-Computer -Force
*Script provided by CrowdStrike.
How to prevent BSODs?
No one is immune to tech glitches, because they will continue to exist as long as technology exists. However, it is possible to identify and address underlying issues before they escalate into full-blown BSOD incidents.
- Continuously monitor spikes in hardware performance indicators, application crashes and potential conflicts between different elements.
- Closely scrutinise patches of operating systems, drivers, and third-party applications. Test them for weeks before rolling out.
- Perform a pre-deployment check to catch compatibility and stability issues before each update.
- Configure automated reversion in case of a critical failure.
- Set up alerts for unusual activities or signs of system failure.
- Diversify OSes in your network to protect against failures affecting a single OS.
- Schedule regular maintenance tasks such as disk defragmentation, system cleanup, and registry checks.
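Pre-deployment checks and automated reversion from the list above are often combined into a ring-based (staged) rollout. The following is a minimal sketch under stated assumptions, not an Endpoint Central feature or API: the function Invoke-StagedRollout and its -Deploy, -HealthCheck, and -Rollback script blocks are hypothetical placeholders for whatever your deployment tooling provides. An update goes to one ring of machines at a time, and if any machine in a ring fails its health check, every machine updated so far is reverted and the rollout stops.

```powershell
# Hypothetical sketch of a ring-based rollout with automatic reversion.
function Invoke-StagedRollout {
    param(
        [Parameter(Mandatory)] [object[]]$Rings,          # array of rings; each ring is an array of host names
        [Parameter(Mandatory)] [scriptblock]$Deploy,      # placeholder: installs the update on one host
        [Parameter(Mandatory)] [scriptblock]$HealthCheck, # placeholder: returns $true if the host is healthy
        [Parameter(Mandatory)] [scriptblock]$Rollback     # placeholder: reverts the update on one host
    )

    $updated = [System.Collections.Generic.List[string]]::new()

    foreach ($ring in $Rings) {
        # Deploy to every host in the current ring and remember it for a possible rollback
        foreach ($h in $ring) {
            & $Deploy $h | Out-Null
            $updated.Add($h)
        }

        # Check the ring before moving on to a wider audience
        $unhealthy = @($ring | Where-Object { -not (& $HealthCheck $_) })
        if ($unhealthy.Count -gt 0) {
            # Revert every host touched so far, then stop the rollout
            foreach ($h in $updated) { & $Rollback $h | Out-Null }
            return [pscustomobject]@{
                Succeeded   = $false
                FailedHosts = $unhealthy
                RolledBack  = $updated.ToArray()
            }
        }
    }

    return [pscustomobject]@{ Succeeded = $true; FailedHosts = @(); RolledBack = @() }
}
```

Starting with a small canary ring limits the blast radius: later rings never receive an update that has already failed on earlier machines.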
You can achieve these through Endpoint Central, which brings together endpoint lifecycle management, AI-assisted threat detection & remediation, and experience monitoring in a single platform. This enables better data flow, streamlined issue identification, and coordinated remediation - all from the same place.
Beware of phishing attempts
Unfortunately, this incident is being exploited by cybercriminals for launching social engineering and phishing attacks with fake emails or messages offering support or updates related to the outage. Please exercise caution.
The takeaway: just as they prepare for cybersecurity threats, organizations must also prepare for productivity disruptions. Having a realistic and robust incident response plan can help minimize the impact of unexpected outages.