Lessons Learned from the Global IT Outage

Reflecting on recent events, Cosmic CEO Julie Hawker highlights the critical need for robust cybersecurity measures and effective contingency planning. "The recent global IT outage has shown us the importance of understanding our technology stacks and having solid recovery plans in place. It’s a wake-up call for all businesses to review and strengthen their systems to mitigate the risks of future incidents."

On Friday, 19th July 2024, a major IT outage struck across the world, crippling many key systems and bringing air travel, GP surgeries, retail, banking, broadcasting, and numerous other sectors to a standstill for hours. To understand the scale of the disruption, consider that 167 flights departing the UK and 171 incoming flights were cancelled on that day alone. In the United States, Delta Air Lines reportedly cancelled more flights in the five days following the outage than it did in the whole of 2018 and 2019 combined. In the UK, doctors' surgeries were unable to access patient records, affecting appointments, prescription renewals, and patient referrals. The impacts were widespread and severe.

A week on, the fallout continues with additional work needed to rectify issues and restore systems to full operation. While some systems were easier to reset with basic support, others required a synchronised and network-wide recovery plan. This extensive and ongoing effort highlights the magnitude of what has been described as the largest IT outage in history, costing Fortune 500 companies alone more than $5 billion in direct losses, according to an insurer's analysis of the incident.

At the heart of this global crisis was CrowdStrike, a leading provider of cybersecurity solutions to large and complex organisations. The failure was triggered by a faulty update to CrowdStrike’s Falcon security software, which is designed to protect devices, including those running Microsoft Windows, from malicious attacks. Although the exact cause remains unclear at the time of writing, the fault only affected Windows machines and was not the result of a security incident or cyber-attack. CrowdStrike has deployed a fix, but applying it requires each affected device to be manually rebooted into safe mode, posing significant challenges for IT departments.

For a world-leading cybersecurity provider to cause such a massive outage is a major irony and a wake-up call for all businesses. It underscores the need for companies to review their systems and ensure they understand which solutions and platforms are central to their technology stacks and how recovery can be achieved in the event of failure. Contingency planning in the digital age is as crucial as ever, and effective planning can mitigate the risks of business losses.

In an unexpected twist to an already dramatic series of events, CrowdStrike’s offer of a $10 Uber Eats voucher as an apology reportedly backfired. So many people tried to redeem the vouchers that Uber flagged the codes as fraud, and some recipients found that theirs no longer worked.

Lessons for Organisations

  1. Cybersecurity Plans and Systems: Prioritise robust cybersecurity measures, and regularly test and update the software you rely on to protect your systems.
  2. Software Updates: Ensure clarity about the processes associated with software updates. Run tests on a single device if there is any doubt or concern. Your IT team should have positive control over updates, knowing exactly what they contain and when they are applied. For instance, when patching Windows, it is crucial to understand what Microsoft is patching to identify potential issues. While automatic patching is convenient, the ability to turn it off when required is also important. In the case of CrowdStrike, the update was reportedly full of "garbage data," highlighting the need for rigorous testing.
  3. Staff Education: Ensure all staff receive regular updates on systems and how to respond to outages. Simulating tech outages can be highly valuable to ensure staff know how to respond and understand the "dos" and "don’ts" in such situations.
  4. Data Backups: Regular, remote backups across all systems are essential. Ensure you have a reliable backup plan that lets you mitigate the impact of outages. Additionally, conduct regular restoration tests to confirm that backups can be restored completely and efficiently. Many large organisations fail to do this adequately, resulting in delays and complications when restoration is urgently needed.
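
As an illustration of the restoration tests in point 4, the short Python sketch below compares a restored copy of a folder against the original, file by file, using SHA-256 checksums. It is a minimal example rather than a complete backup tool: the function names (sha256_of, verify_restore) are hypothetical, and it assumes both folders can be read from the same machine.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """List files that are missing from, or differ in, the restored copy.

    Hypothetical helper for illustration only: an empty list means every
    file in source_dir was found in restored_dir with identical contents.
    """
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue  # skip directories
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(restored_file):
            problems.append(f"mismatch: {relative.as_posix()}")
    return problems
```

Running a check like this after a scheduled test restore gives a simple pass or fail answer: an empty list means the restored files match, while any entry points to a file that needs investigating before the backup can be trusted.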

Input from Jonathan Allard, CISM-qualified IT specialist at Cosmic, reinforces these points. He notes that the affected product was CrowdStrike's Falcon platform, and highlights both the apparent absence of adequate testing by CrowdStrike and the lack of any facility for customers to test updates themselves or to roll them out in stages (cascading, iterative patching). This is astonishing given the critical industries involved.
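
To make the idea of cascading, iterative patching concrete, here is a minimal Python sketch of a staged rollout. Devices are grouped into "rings" (for example, one test PC first, then a small office group, then everything else), and the rollout stops as soon as a ring shows more failures than the agreed limit. The names used here (staged_rollout, RolloutResult, apply_update) are hypothetical and do not describe how CrowdStrike or any particular patching product works; apply_update stands in for whatever your own tooling does to update one device and report success.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RolloutResult:
    updated: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    halted_at_ring: Optional[int] = None  # index of the ring that stopped the rollout


def staged_rollout(
    rings: list[list[str]],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Apply an update ring by ring, halting when a ring exceeds the failure limit.

    apply_update is a hypothetical callback: it receives a device name and
    returns True if the device is healthy after the update.
    """
    result = RolloutResult()
    for index, ring in enumerate(rings):
        ring_failures = 0
        for device in ring:
            if apply_update(device):
                result.updated.append(device)
            else:
                result.failed.append(device)
                ring_failures += 1
        # Stop before the next (larger) ring if this one failed too often.
        if ring and ring_failures / len(ring) > max_failure_rate:
            result.halted_at_ring = index
            break
    return result
```

With the default limit of zero, a single failed device in an early ring halts the rollout, so a faulty update reaches only the first few machines rather than the whole organisation.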

This incident serves as a crucial reminder for organisations to strengthen their cybersecurity frameworks, improve their update processes, and ensure their staff are well-prepared to handle unexpected outages. Regular testing and thorough contingency planning are key to mitigating the risks and impacts of IT failures in our increasingly digital world.


If you wish to discuss any concerns you have around cybersecurity and building your system resilience, then speak to the team at Cosmic. We are Cyber Essentials Plus certified and have a CISM-qualified technician who can offer advice and support to your organisation.