Crowd-Striked: Lessons Learned and Best Practices for Future Prevention

Categories: Digital Transformation, Project Management, Security, Automotive, Communications, Consumer and Retail, Financial Services, Healthcare, Manufacturing and Industrial, Media, Technology

Incident Summary

On July 19, 2024, CrowdStrike released a content configuration update for its Falcon sensor on Windows. The update caused widespread instability, with affected Windows systems crashing to the "Blue Screen of Death" (BSOD). The issue was traced to a channel file matching “C-00000291*.sys” that was included in the update and caused system crashes upon deployment. “Channel files” are part of the behavioral protection mechanisms used by the Falcon sensor and are updated several times a day in response to new tactics, techniques, and procedures (TTPs) discovered by CrowdStrike.

The flaw had a significant impact on global operations and affected critical infrastructure. CrowdStrike acted quickly to identify and revert the problematic update, but recovery still required extensive manual remediation across numerous systems.

Key Takeaways

This incident underscores the importance of rigorous pre-release testing and deployment protocols. It also shows the critical role that disaster recovery plans and efficient communication protocols play in managing and mitigating widespread disruptions.

The incident carried significant security risks, including the potential for vulnerabilities to be exploited during periods of system instability and the heightened risk that comes with operational disruption of critical infrastructure. Addressing these risks requires an extensive and proactive mitigation approach.

Best Practices

Best practices to mitigate unnecessary risks and avoid potential outages to critical infrastructure in the future include: 

Comprehensive Testing: Testing updates thoroughly in varied environments before deployment helps catch defects before they reach production systems. Testing should cover not just functionality, but also performance and stress behavior across different Windows configurations and versions.

Staged Rollouts: Gradually deploying updates in stages allows for monitoring and addressing issues before they affect a wider user base. This approach can help identify and rectify problems early in smaller, controlled groups.

Rollback Mechanisms: Implementing automated and efficient rollback mechanisms can quickly revert changes if issues are detected. This helps minimize downtime and disruption.

Monitoring and Analytics: Continuously monitoring the performance and behavior of updates through analytics can provide early warnings of potential issues, allowing for quicker intervention.

Communication and Transparency: Prompt and transparent communication with customers about potential issues and ongoing fixes helps manage the impact and maintains trust. CrowdStrike’s provision of remediation steps and communication about the issue were essential, but earlier and more proactive communication might have lessened the impact.

Disaster Recovery Plans: Having robust and well-practiced disaster recovery plans ensures that there are clear, effective procedures to follow in the event of a widespread issue. This includes having backups, failover systems, and clear communication channels.
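
To make the staged rollout, monitoring, and rollback practices above more concrete, the following is a minimal Python sketch of how they can fit together. It is an illustration under stated assumptions, not a description of how CrowdStrike or any specific product deploys updates. Every name in it (Stage, RolloutResult, run_staged_rollout, and the deploy, is_healthy, and rollback callbacks) is hypothetical, and the 1% default failure threshold is an arbitrary example value.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Stage:
    """A named group of hosts that receives the update together (hypothetical)."""
    name: str
    hosts: List[str]


@dataclass
class RolloutResult:
    completed_stages: List[str] = field(default_factory=list)
    rolled_back_hosts: List[str] = field(default_factory=list)
    halted_at: Optional[str] = None  # name of the stage that tripped the threshold


def run_staged_rollout(
    stages: List[Stage],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    max_failure_rate: float = 0.01,  # assumed example threshold
) -> RolloutResult:
    """Deploy stage by stage; halt and roll back if a stage looks unhealthy."""
    result = RolloutResult()
    updated: List[str] = []

    for stage in stages:
        # Staged rollout: only this stage's hosts receive the update now.
        for host in stage.hosts:
            deploy(host)
            updated.append(host)

        # Monitoring: measure the share of unhealthy hosts in this stage.
        failures = [h for h in stage.hosts if not is_healthy(h)]
        failure_rate = len(failures) / len(stage.hosts) if stage.hosts else 0.0

        if failure_rate > max_failure_rate:
            # Rollback: revert every host updated so far, newest first,
            # and stop before any later stage is touched.
            for host in reversed(updated):
                rollback(host)
            result.rolled_back_hosts = list(reversed(updated))
            result.halted_at = stage.name
            return result

        result.completed_stages.append(stage.name)

    return result


# Simulated usage: host "e3" becomes unhealthy once it runs the new update.
state = {}
stages = [
    Stage("canary", ["c1", "c2"]),
    Stage("early", ["e1", "e2", "e3", "e4"]),
    Stage("broad", [f"b{i}" for i in range(10)]),
]
result = run_staged_rollout(
    stages,
    deploy=lambda h: state.__setitem__(h, "new"),
    is_healthy=lambda h: not (h == "e3" and state.get(h) == "new"),
    rollback=lambda h: state.__setitem__(h, "old"),
    max_failure_rate=0.1,
)
print(result.halted_at)  # early
```

In this simulation the "broad" stage is never touched, because the failure in the "early" stage halts the rollout first. Note that the sketch assumes a failed host can still be reached to roll it back. As this incident showed, hosts that crash may need manual remediation instead, which is one more reason to keep the first stages small.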

Conclusion

Implementing these best practices, backed by GlobalLogic’s proven track record, can help organizations mitigate risks associated with software updates and ensure smoother, safer deployments. By focusing on comprehensive testing, staged rollouts, robust rollback mechanisms, continuous monitoring, effective communication, and well-developed disaster recovery plans, organizations can significantly reduce the likelihood of disruptions and enhance their overall risk management strategy.

Author

Kulbhushan Bhardwaj

VP Engineering and Global Security Practice Head
