Crowd-Striked: Lessons Learned and Best Practices for Future Prevention

Categories: Digital Transformation, Project Management, Security, Automotive, Communications, Consumer and Retail, Financial Services, Healthcare, Manufacturing and Industrial, Media, Technology

Incident Summary

On July 19, 2024, CrowdStrike released an update to its Falcon platform that caused widespread instability on Windows systems, which crashed with the “Blue Screen of Death” (BSOD). The issue was traced to a channel file named “C-00000291*.sys” included in the update, which caused systems to crash once it was deployed. “Channel files” are part of the behavioral protection mechanisms used by the Falcon sensor, and are updated several times a day in response to new tactics, techniques, and procedures (TTPs) discovered by CrowdStrike.

This flaw had a significant impact on global operations, affecting critical infrastructure such as airports, hospitals, and news outlets. Although CrowdStrike acted quickly to identify and revert the problematic update, the disruption required extensive manual remediation across numerous systems.

Key Takeaways

This incident first and foremost underscores the importance of rigorous pre-release testing and deployment protocols. Moreover, the critical role of disaster recovery plans and efficient communication protocols is evident in managing and mitigating such widespread disruptions.

This incident also highlights significant security risks, including the potential for vulnerability exploitation during periods of system instability, as well as increased risks associated with the operational disruption of critical infrastructures like airports and hospitals.

Best Practices

We will not discuss the technical remediation plan for the incident here. Instead, we draw on GlobalLogic’s extensive experience and expertise in software development and deployment to suggest some best practices companies should consider to reduce risk and avoid such incidents in the future:

Comprehensive Testing: Thoroughly testing updates in varied environments before deployment can catch defects before they reach customers. Testing should include not just functionality, but also performance and stress testing on different Windows configurations and versions.

Staged Rollouts: Gradually deploying updates in stages allows for monitoring and addressing issues before they affect a wider user base. This approach can help identify and rectify problems early in smaller, controlled groups.

Rollback Mechanisms: Implementing automated and efficient rollback mechanisms can quickly revert changes if issues are detected. This helps minimize downtime and disruption.

Monitoring and Analytics: Continuously monitoring the performance and behavior of updates through analytics can provide early warnings of potential issues, allowing for quicker intervention.

Communication and Transparency: Prompt and transparent communication with customers about potential issues and ongoing fixes helps manage the impact and maintains trust. CrowdStrike’s provision of remediation steps and communication about the issue were essential, but earlier and more proactive communication might have lessened the impact.

Disaster Recovery Plans: Having robust and well-practiced disaster recovery plans ensures that there are clear, effective procedures to follow in the event of a widespread issue. This includes having backups, failover systems, and clear communication channels.
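
To make the staged rollout, monitoring, and rollback practices above more concrete, the following is a minimal Python sketch of how they can fit together. It is an illustration under stated assumptions, not a description of how CrowdStrike or any particular vendor deploys updates. The stage percentages, the crash-rate threshold, and the callables apply_update, is_healthy, and rollback are all hypothetical placeholders that an organization would replace with its own deployment tooling and telemetry.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Iterable

# Hypothetical cumulative stage sizes, as a percentage of all hosts.
STAGE_PERCENTS = (1, 10, 50, 100)
# Hypothetical gate: halt if more than 2% of a stage's hosts are unhealthy.
MAX_UNHEALTHY_PERCENT = 2


def bucket(host_id: str) -> int:
    """Map a host ID to a stable integer so cohort order never reshuffles."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


@dataclass
class RolloutResult:
    completed: bool
    updated: list[str] = field(default_factory=list)
    rolled_back: list[str] = field(default_factory=list)


def staged_rollout(
    hosts: Iterable[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> RolloutResult:
    """Deploy stage by stage; revert every updated host if a stage fails its gate."""
    ordered = sorted(hosts, key=bucket)
    updated: list[str] = []
    for percent in STAGE_PERCENTS:
        # Integer ceiling of percent% of the fleet, with at least one host.
        target = max(1, -(-percent * len(ordered) // 100))
        cohort = ordered[len(updated):target]
        if not cohort:
            continue
        for host in cohort:
            apply_update(host)
        updated.extend(cohort)
        # Monitoring gate: a host that fails to check in after the update
        # should be reported as unhealthy by is_healthy.
        unhealthy = sum(1 for host in cohort if not is_healthy(host))
        if unhealthy * 100 > MAX_UNHEALTHY_PERCENT * len(cohort):
            for host in updated:
                rollback(host)
            return RolloutResult(completed=False, rolled_back=updated)
    return RolloutResult(completed=True, updated=updated)
```

With a fleet of 200 hosts and these assumed stage sizes, a faulty update would reach only the first two hosts before the health gate halts the rollout and reverts them. Note that an automated rollback of this kind only helps if the affected hosts can still receive it; a fault that prevents systems from booting, as in this incident, still requires the manual recovery procedures covered by a disaster recovery plan.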

Conclusion

Implementing these best practices, backed by GlobalLogic’s proven track record, can help organizations mitigate risks associated with software updates and ensure smoother, safer deployments. By focusing on comprehensive testing, staged rollouts, robust rollback mechanisms, continuous monitoring, effective communication, and well-developed disaster recovery plans, organizations can significantly reduce the likelihood of disruptions and enhance their overall risk management strategy.

Author

Kulbhushan Bhardwaj

VP Engineering and Global Security Practice Head
