
During 2025, several major stories dominated the enterprise IT space, including increasing interest in AI agents, growing disillusionment with generative AI, and ERP vendor SAP's ongoing efforts to move customers to its cloud-based S/4HANA platform.
Like every year, 2025 also saw its share of major enterprise IT disasters, with service outages, failed deployments, and lawsuits targeting vendors. We recently published a list of AI disasters, so we’ll mostly avoid replaying them here, and we’ve largely ignored data breaches because a list of major cyberattacks from the year would number in the hundreds.
[ For past IT mishaps of note, see our biggest IT failure roundups from 2024, 2023, and 2021. ]
Just say please
In July, US cleaning product vendor Clorox filed a $380 million lawsuit against Cognizant, accusing the IT services provider’s helpdesk staff of handing over network passwords to cybercriminals who called and asked for them.
The 2023 scheme was a simple one, Clorox alleges. “Cognizant was not duped by any elaborate ploy or sophisticated hacking techniques,” according to the lawsuit. “The cybercriminal just called the Cognizant Service Desk, asked for credentials to access Clorox’s network, and Cognizant handed the credentials right over.”
Transcripts filed in the lawsuit show helpdesk workers providing passwords to callers who allegedly didn’t provide employee identification numbers, manager names, or any other verification.
The attack was attributed to Scattered Spider, a cybercrime group that typically uses more sophisticated methods to target victims.
ERP gone bad
Zimmer Biomet, a medical device company, filed a $172 million lawsuit against Deloitte in September, accusing the IT consulting company of failing to deliver promised results in a large-scale SAP S/4HANA deployment.
The lawsuit accuses Deloitte of overstating its capabilities, busting the project budget, and pushing the ERP to go live in July 2024 before it was fully baked. The ERP system lacked major functionality through the third quarter of 2024, the lawsuit alleges, with Zimmer Biomet unable to use it to ship or receive products, prepare invoices, or generate basic sales reporting.
The mistakes cost more than $70 million to remediate, the lawsuit alleges, and Zimmer Biomet also wants to recover more than $100 million in payments it made to Deloitte. The consulting firm has called the allegations “meritless.”
[ For more ERP mishaps, see “18 famous ERP disasters, dustups, and disappointments” ]
Exploding batteries: Another thing to worry about
Many IT disasters can be traced back to buggy software or bad internal processes, but sometimes, the risk is more elemental.
In September, a massive fire at the National Information Resources Service (NIRS) government data center in South Korea resulted in the loss of 858TB of government data stored there.
The lost data was used by about 125,000 civil servants working on more than 160 public-facing services. One major problem was that NIRS didn't have a backup system, with officials saying the sheer volume of data made it impractical to replicate elsewhere.
Many government services, including tax filings and emergency services, were disrupted, with less than 18% of affected systems restored within a week.
The fire reportedly started during routine maintenance involving the relocation of lithium-ion batteries. About 40 minutes after workers disconnected the batteries, there was an explosion, leading to a fire that raged for about 22 hours and involved close to 200 firefighters. Dousing the fire and limiting the damage was especially difficult because the batteries were located near servers.
In November, the director of the data center was relieved of duty after being accused of negligence.
The incident shows the importance of backups, notes Jana Sedivy, vice president of customer experience and product at network solutions provider InkBridge Networks.
“The takeaway is a reminder that ‘cloud storage’ just means ‘someone else’s computers,’” she says. “Cloud storage is good, but it’s a good idea for your backups to have backups. If those backups are not connected to your network, even better.”
Essential cloud services disappear, part 1
Multiple Google cloud services, including Gmail, Docs, Drive, Maps, and Gemini, went down during a massive outage in June. The outage was triggered by an earlier policy change to Google Service Control, a control plane service that provides functionality for managed services, with a null-pointer crash loop breaking APIs across several products.
The incident lasted more than seven hours and affected Google Cloud services across several regions, including North America, Europe, the Far East, and Africa. The outage also hit several web-based products that depend on Google, including Spotify, Snapchat, and Discord, as well as several Cloudflare services.
Even though Google’s Site Reliability Engineering team began to triage the incident within two minutes, the overall fix took much longer. Some regions began to see recovery within 40 minutes of the incident, but recovery in large regions took longer.
“Within some of our larger regions, as Service Control tasks restarted, it created a herd effect on the underlying infrastructure it depends on, overloading the infrastructure,” Google engineers wrote.
Google promised to modularize Service Control’s architecture and isolate potential problems. The company also pledged to do better going forward.
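The "herd effect" Google's engineers describe is a classic thundering-herd problem: when thousands of tasks restart and retry at once, they can overwhelm the very infrastructure they depend on. A common mitigation, shown in the minimal Python sketch below, is to spread retries out with exponential backoff and random jitter. This is illustrative only, not Google's actual fix, and the retry limits are arbitrary.

```python
import random
import time

def call_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Retry a flaky dependency with exponential backoff plus jitter.

    Spreading retries out randomly keeps a fleet of restarted tasks from
    hammering a recovering backend at the same instant (the "herd effect"
    described in Google's postmortem).
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

The same idea applies at startup: staggering when restarted tasks reconnect to a dependency prevents a recovering service from being knocked over again by its own clients.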
Essential cloud services disappear, part 2
In late October, Amazon Web Services' US-EAST-1 region was hit with a significant outage lasting about three hours in the early morning. The problem was related to DNS resolution of the DynamoDB API endpoint in the region, causing increased error rates, higher latency, and new instance launch failures across multiple AWS services.
While AWS released a detailed postmortem report, some observers were uncomfortable with the company’s assurances that it had fixed the problem going forward. Some experts worry about the growing dependence on hyperscalers when their services are cobbled together from technologies created decades ago.
Other observers note that AWS didn’t explain exactly why the outage happened.
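For application teams on the receiving end of such an outage, one modest defense is to bound how long a failing regional endpoint can stall requests. The sketch below uses boto3 as the client library; the timeout and retry values are illustrative assumptions, not recommendations. The point is that tight timeouts and adaptive retries let DNS or endpoint failures surface quickly rather than tying up application threads.

```python
import boto3
from botocore.config import Config

# Illustrative values only: fail fast when a regional endpoint is unhealthy
# instead of letting requests hang and pile up.
resilient_config = Config(
    connect_timeout=2,          # seconds to establish a connection
    read_timeout=5,             # seconds to wait for a response
    retries={
        "max_attempts": 4,      # total attempts, including the first
        "mode": "adaptive",     # client-side rate limiting on errors
    },
)

dynamodb = boto3.client(
    "dynamodb",
    region_name="us-east-1",
    config=resilient_config,
)
```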
The AWS outage demonstrates the need for IT leaders to diversify their cloud use, says Jake Madders, cofounder and director of Hyve Managed Hosting.
“The AWS incident is a stark reminder that even the largest and most reliable cloud providers can experience significant outages, but these risks can be mitigated,” he says. “Diversifying across multiple cloud providers and geographic regions is essential to ensure redundancy and enable seamless failover when disruption occurs.”
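What "seamless failover" looks like varies by stack, but at its simplest it means the application knows about more than one endpoint and health-checks its way to a working one. Here is a minimal Python sketch of that idea; the endpoint URLs are hypothetical placeholders, and real deployments would typically push this logic into DNS, a load balancer, or a service mesh rather than application code.

```python
import urllib.request

# Hypothetical endpoints for the same service hosted in different
# regions or with different providers, in order of preference.
ENDPOINTS = [
    "https://api.primary-region.example.com/health",
    "https://api.secondary-region.example.com/health",
    "https://api.other-provider.example.com/health",
]

def pick_healthy_endpoint(timeout=2):
    """Return the first endpoint that answers its health check."""
    for url in ENDPOINTS:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except OSError:
            continue  # unreachable or timed out; try the next one
    raise RuntimeError("No healthy endpoint available")
```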
Essential cloud services disappear, part 3 (and 4?)
Not to be left out, Microsoft’s Azure cloud services experienced two outages in 2025.
In late July, services in Microsoft’s Azure East US region were disrupted, with customers experiencing allocation failures when trying to create or update virtual machines. The problem? A lack of capacity, with a surge in demand outstripping Microsoft’s computing resources.
Microsoft reported the problem resolved by Aug. 5, but some users complained of ongoing problems days later.
Then, in late October, Azure went down again, this time affecting Microsoft 365, Xbox, and Minecraft, as well as websites run by Costco, Starbucks, and other businesses.
Microsoft blamed an inadvertent configuration change for triggering the problem. Nearly 20,000 Microsoft 365 customers had reported issues to the company, and a handful were still affected more than 10 hours after initial reports of the outage.
Essential internet services disappear
Unlike the companies above, Cloudflare isn't a cloud hyperscaler, but it does provide essential internet infrastructure functions, including content delivery network services, DDoS mitigation, and domain registration.
It also isn't immune to outages of its own, some of them connected to cloud hyperscaler problems, as seen above. A Nov. 18 outage, caused by a latent bug triggered by a routine configuration change, led to problems at several major websites, including Spotify, X, and ChatGPT. The bug degraded the company's network and other services broadly for about two hours.
Cloudflare CTO Dane Knecht apologized for the outage and said the company is taking steps to make sure it doesn’t happen again. He acknowledged that the outage “caused real pain.”

