CrowdStrike has issued the final Root Cause Analysis (RCA) report outlining the events that led to the worldwide IT outage on July 19, 2024. The primary cause of the problem was a discrepancy in input parameters between the Windows interprocess communication (IPC) template type and the integration code in the Falcon software.
The trouble originated in February when developers incorporated new detection capabilities into Falcon. This update aimed to enhance the software’s ability to identify and block novel exploitation attempts using named pipes and other Windows IPC mechanisms.
Abuse of these mechanisms can be a sign that a system has been compromised.
Following standard development and testing procedures, the new functionality was integrated as a ‘template type’ in sensor version 7.11 of Falcon. Template types in Falcon are generalised software routines designed to detect specific types of suspicious activity. They are then customised through ‘template instances’, which define how the template identifies particular threats.
According to CrowdStrike, the architecture is designed to allow dynamic configuration of the templates’ runtime behaviour using these instances. Since March, several template instances leveraging the new IPC template type have been deployed from the cloud to Falcon installations globally.
These updates were stored in channel file 291 and were intended to be parsed by Falcon to instruct the software on threat detection.
The root of the problem lay in a mismatch between the number of input parameters defined by the IPC template type and those provided by the integration code.
The IPC template type required 21 input parameters, but the code only supplied 20. This discrepancy went unnoticed through multiple layers of validation and testing, including sensor release testing and initial deployments.
The critical failure occurred on July 19 when two new IPC template instances were deployed. One of them compared against the 21st parameter, which the integration code never supplied. This caused the content interpreter, which runs in Windows kernel mode, to read memory beyond the end of its input array, crashing the system.
“Sensors that received the new version of Channel File 291 carrying the problematic content were exposed to a latent out-of-bounds read issue in the Content Interpreter. At the next IPC notification from the operating system, the new IPC Template Instances were evaluated, specifying a comparison against the 21st input value,” CrowdStrike explained. “The Content Interpreter expected only 20 values. Therefore, the attempt to access the 21st value produced an out-of-bounds memory read beyond the end of the input data array and resulted in a system crash.”
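The mismatch described above can be modelled in a few lines of Python. This is an illustrative sketch only, not CrowdStrike's code: the function `evaluate_instance`, the dictionary layout for a template instance and the sample values are all assumptions made for the example. Python raises an `IndexError` where unchecked kernel-mode code would instead read invalid memory.

```python
# Illustrative model only; names and data layout are assumptions,
# not CrowdStrike's implementation.

DEFINED_FIELDS = 21   # inputs the IPC template type declares
SUPPLIED_FIELDS = 20  # inputs the integration code actually provides


def evaluate_instance(criteria, inputs):
    """Return True if every (field index -> expected value) pair matches.

    No bounds check is performed, mirroring the latent defect.
    """
    return all(inputs[index] == expected for index, expected in criteria.items())


inputs = [f"value{i}" for i in range(SUPPLIED_FIELDS)]

# A hypothetical instance that only inspects the first 20 fields evaluates normally.
earlier_instance = {0: "value0", 19: "value19"}
earlier_result = evaluate_instance(earlier_instance, inputs)

# A hypothetical instance that compares against the 21st field (index 20)
# reads past the end of the input array.
july_instance = {20: "anything"}
try:
    evaluate_instance(july_instance, inputs)
    july_outcome = "ok"
except IndexError:
    # Python raises here; unchecked kernel-mode code reads invalid memory instead.
    july_outcome = "out-of-bounds"
```

In this model, the defect stays hidden for as long as no instance touches index 20, which is consistent with the account that earlier instances were deployed without incident.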
CrowdStrike has since updated its sensor content compiler to ensure proper parameter matching for future template types. Additionally, runtime bounds checking has been added to prevent out-of-bounds memory reads. These fixes are being retrofitted into all affected Windows sensor versions, with a hotfix scheduled for general release by August 9.
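The two safeguards can be sketched as follows. This is a hypothetical Python illustration rather than the actual fix: `validate_template_type`, `safe_match` and `ParameterMismatchError` are invented names, and treating an out-of-range field as a non-match is an assumption about how a bounds check might behave.

```python
# Hypothetical sketch of the two safeguards; not CrowdStrike's code.


class ParameterMismatchError(Exception):
    """Raised at build time when declared and supplied input counts differ."""


def validate_template_type(declared_inputs, supplied_inputs):
    """Compiler-style check: reject a template type whose counts disagree."""
    if declared_inputs != supplied_inputs:
        raise ParameterMismatchError(
            f"template declares {declared_inputs} inputs, "
            f"integration code supplies {supplied_inputs}"
        )


def safe_match(inputs, index, expected):
    """Runtime check: an out-of-range field is a non-match and is never read."""
    if not 0 <= index < len(inputs):
        return False
    return inputs[index] == expected
```

Under these assumptions, `validate_template_type(21, 20)` would fail before any content shipped, and `safe_match` would return `False` for index 20 on a 20-element array instead of reading past its end. The two checks are complementary: the first catches the mismatch at build time, the second limits the damage if one slips through.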
The company is also enhancing its internal testing processes to prevent flawed updates from reaching customers. Going forward, all template instances will be deployed in phases to minimise the impact of any future errors.
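One common way to implement a phased deployment is to assign each host to a stable bucket and widen the set of eligible buckets over time. The Python sketch below shows that general technique. The report summary here does not describe CrowdStrike's mechanism, so the bucketing scheme, the function names `rollout_bucket` and `in_phase`, and the host identifiers are all assumptions.

```python
import hashlib


def rollout_bucket(host_id):
    """Map a host identifier to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def in_phase(host_id, percent):
    """Return True if the host falls within the first `percent` buckets."""
    return rollout_bucket(host_id) < percent


# Hypothetical fleet: a small early phase, then a wider one.
hosts = [f"host-{n}" for n in range(1000)]
canary = [h for h in hosts if in_phase(h, 1)]
wider = [h for h in hosts if in_phase(h, 25)]
```

Because the bucket depends only on the host identifier, every host in an early phase stays included as the percentage grows, so a faulty update seen in the first phase can be halted before it reaches the rest of the fleet.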