Crowd-Striked: Lessons Learned and Best Practices for Future Prevention

Insight categories: Digital Transformation, Project Management, Security, Automotive, Communications, Consumer and Retail, Financial Services, Healthcare, Manufacturing and Industrial, Media, Technology

Incident Summary

On July 19, 2024, CrowdStrike released a content configuration update for its Windows sensor that caused widespread system crashes, leaving affected Windows machines on the “Blue Screen of Death” (BSOD). The issue was traced to a channel file matching “C-00000291*.sys” that was included in the update. “Channel files” are part of the behavioral protection mechanisms used by the Falcon sensor and are updated several times a day in response to new tactics, techniques, and procedures (TTPs) discovered by CrowdStrike.

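This flaw had a significant impact on global operations and disrupted critical infrastructure. CrowdStrike acted quickly to identify and revert the problematic update, but because affected machines crashed during startup, many could not pick up the corrected file on their own. Restoring them required extensive manual remediation across numerous systems, typically booting each machine into Safe Mode or the Windows Recovery Environment and deleting the faulty channel file.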

Key Takeaways

This incident underscores the importance of rigorous pre-release testing and deployment protocols. It also shows the critical role that disaster recovery plans and efficient communication protocols play in managing and mitigating widespread disruptions.

Periods of system instability create significant security risks, including the potential for attackers to exploit vulnerabilities, while the operational disruption of critical infrastructure carries further risks of its own. Both call for an extensive and proactive mitigation approach.

Best Practices

Best practices to mitigate unnecessary risks and avoid potential outages to critical infrastructure in the future include: 

Comprehensive Testing: Thoroughly testing every update in varied environments before deployment, including content and configuration updates rather than only code changes, can catch defects before they reach customers. Testing should cover not just functionality but also performance and stress testing across different Windows configurations and versions.

Staged Rollouts: Gradually deploying updates in stages lets teams monitor and address issues before they affect a wider user base. Problems can then be identified and fixed early, within smaller, controlled groups.

Rollback Mechanisms: Implementing automated and efficient rollback mechanisms can quickly revert changes if issues are detected. This helps minimize downtime and disruption.

Monitoring and Analytics: Continuously monitoring the performance and behavior of updates through analytics can provide early warnings of potential issues, allowing for quicker intervention.

Communication and Transparency: Prompt and transparent communication with customers about potential issues and ongoing fixes helps manage the impact and maintain trust. CrowdStrike’s published remediation steps and updates on the issue were essential, but earlier and more proactive communication might have lessened the impact.

Disaster Recovery Plans: Having robust and well-practiced disaster recovery plans ensures that there are clear, effective procedures to follow in the event of a widespread issue. This includes having backups, failover systems, and clear communication channels.
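Several of the practices above can be combined in a single release pipeline. The following Python sketch is a minimal, hypothetical illustration, not CrowdStrike’s or GlobalLogic’s tooling: it validates an update against an assumed format (pre-release testing), deploys it ring by ring (staged rollout), checks host health after each ring (monitoring), and reverts every host touched so far if a ring’s failure rate exceeds a threshold (rollback). The names `Ring`, `staged_rollout`, `validate_update`, and `SimulatedFleet`, the `expected_fields`/`rules` update format, and the 1% failure threshold are all assumptions made for illustration.

```python
"""Hypothetical staged-rollout controller: validate, deploy ring by ring,
watch health, and roll back everything deployed if a ring looks unhealthy."""
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ring:
    name: str
    hosts: List[str]


@dataclass
class RolloutResult:
    completed_rings: List[str] = field(default_factory=list)
    rolled_back: bool = False
    reason: str = ""


def validate_update(update: Dict) -> List[str]:
    """Pre-release gate on an assumed update format: every rule must carry
    exactly the number of fields the consuming code expects."""
    expected = update["expected_fields"]
    return [
        f"rule {i}: {len(rule)} fields, expected {expected}"
        for i, rule in enumerate(update["rules"])
        if len(rule) != expected
    ]


def staged_rollout(
    update: Dict,
    rings: List[Ring],
    deploy: Callable[[str, Dict], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    max_failure_rate: float = 0.01,  # assumed threshold, tune per fleet
) -> RolloutResult:
    result = RolloutResult()
    errors = validate_update(update)
    if errors:  # Block the release before any host receives it.
        result.reason = "validation failed: " + "; ".join(errors)
        return result

    deployed: List[str] = []
    for ring in rings:
        for host in ring.hosts:
            deploy(host, update)
            deployed.append(host)
        # Health gate: count unhealthy hosts in the ring just deployed.
        failures = sum(1 for host in ring.hosts if not is_healthy(host))
        if ring.hosts and failures / len(ring.hosts) > max_failure_rate:
            # Revert every host touched so far, newest first.
            for host in reversed(deployed):
                rollback(host)
            result.rolled_back = True
            result.reason = f"{ring.name}: {failures}/{len(ring.hosts)} hosts unhealthy"
            return result
        result.completed_rings.append(ring.name)
    return result


class SimulatedFleet:
    """Toy fleet for the demo; versions and the health rule are made up."""

    def __init__(self, hosts: List[str]) -> None:
        self.current = {host: "v1" for host in hosts}
        self.previous: Dict[str, str] = {}

    def deploy(self, host: str, update: Dict) -> None:
        self.previous[host] = self.current[host]
        self.current[host] = update["version"]

    def rollback(self, host: str) -> None:
        self.current[host] = self.previous.pop(host)

    def is_healthy(self, host: str) -> bool:
        # Pretend any version tagged "crash" takes the host down.
        return "crash" not in self.current[host]


if __name__ == "__main__":
    rings = [Ring("canary", ["c1", "c2"]),
             Ring("early", ["e1", "e2", "e3"]),
             Ring("broad", ["b1", "b2", "b3", "b4"])]
    fleet = SimulatedFleet([h for ring in rings for h in ring.hosts])
    for version in ("v2", "v3-crash"):
        update = {"version": version, "expected_fields": 3, "rules": [["a", "b", "c"]]}
        print(version, staged_rollout(update, rings, fleet.deploy,
                                      fleet.is_healthy, fleet.rollback))
```

In the demo, the healthy update reaches all three rings, while the crashing one is stopped at the canary ring and reverted before the broader fleet sees it. A production pipeline would add a soak period between rings, drive `is_healthy` from real telemetry such as crash reports, and persist rollout state. This sketch also assumes hosts stay reachable for rollback; machines stuck in a crash loop, as in this incident, often cannot receive a reverted update remotely, which is why limiting how many hosts the first ring touches matters as much as having a rollback path.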

Conclusion

Implementing these best practices, backed by GlobalLogic’s proven track record, can help organizations mitigate risks associated with software updates and ensure smoother, safer deployments. By focusing on comprehensive testing, staged rollouts, robust rollback mechanisms, continuous monitoring, effective communication, and well-developed disaster recovery plans, organizations can significantly reduce the likelihood of disruptions and enhance their overall risk management strategy.

Author

Kulbhushan Bhardwaj

VP Engineering and Global Security Practice Head
