Tech outages are overwhelmingly down to small errors such as software bugs and configuration issues rather than outside attacks, yet companies continue to make the same mistakes, research by Website Planet has shown.
Researchers at the platform analysed 184 major tech failures dating back to 1961 to unpick root causes and common oversights, and tot up the financial cost.
Software bugs and logic errors were the biggest cause of incidents, accounting for 38%, while configuration and deployment errors were behind 16%.
Just under one in ten were caused by data and database errors, with infrastructure and hardware failures accounting for 8%, and resource exhaustion causing 7%.
This left security breaches and attacks causing 18% of failures, while denial of service accounted for 2.2%.
Bugs and logic errors have accounted for a cumulative $65bn in losses across those incidents where losses were made public, the researchers found. Configuration and deployment errors caused $32.15bn of losses, with security breaches accounting for $29.44bn.
The researchers tracked just three outages in the 1960s, two in the 1970s, six in the 1980s and 10 in the 1990s. The 2010s saw 61 major outages, while halfway through the 2020s, we’re already at 82 major outages.
It’s arguable that the chances of a cyberattack causing a major outage in the 60s, and even in the 90s, were minimal. Security breaches became a factor in the 1980s, accounting for half of the outages tracked in that decade.
But even in the 2020s, security breaches and attacks have accounted for 23% of outages. The majority are still down to software bugs and logic errors (29%) and configuration and deployment errors (24%). In other words, the tech might have changed, but the reasons it screws up remain the same.
The researchers pointed out that the rise in security issues “reflects not just an increase in threats, but a broader range of vulnerabilities, like exposed APIs and misconfigured cloud settings.”
Ultimately, modern outages might look complex, but the root causes are overwhelmingly “simple, preventable mistakes”, the researchers said.
“After analysing the last few decades of high-impact failures, we can say that companies have improved at responding to outages, but not necessarily at preventing them.”
The key to avoiding these sorts of outages is not creating better software, they suggested, but building stronger processes.

 
