When the world woke up on Friday, July 19th, pigs seemed to be flying. For the second time in two years, planes were grounded, IT systems were down, and Fortune 500 corporations couldn't access their data. All they could access was the infamous Microsoft "blue screen of death," and people were panicking. What is now considered one of the largest IT outages in history was triggered by a faulty software update from the security company CrowdStrike, affecting millions of systems around the world, including those of a few of our customers.
The Problem: Loss of Microsoft Azure Access
At Centre, we pride ourselves on providing and managing the best solutions that the IT industry has to offer. Even more than that, we promise those solutions come with the best incident response to get you back online whenever an outage occurs. Unfortunately, due to the unexpected and far-reaching CrowdStrike outage, one of our quality solutions, Microsoft Azure, lost connectivity, and our team had to act fast to get customers back online. The only problem was that this outage stemmed from a worldwide issue, one that would take more than just our technicians to fix.
Nevertheless, our team went to work.
In total, six of our customers lost access to their cloud-based Microsoft Azure desktop systems. They experienced issues with multiple Azure services, including failed service management operations and reduced service availability.
Impacts to Business Operations
As the outage unfolded, our customers' Microsoft products were the first to be affected. This part of the outage directly impacted their Microsoft 365 and Azure desktops.
Customers with the faulty agent installed began to hit the Blue Screen of Death (BSOD), leaving them unable to access not just their data, but anything at all. For one customer, the entire domain server went down, leaving their hardware useless as well. Another customer had all of their servers affected, with an across-the-board BSOD that had to be cleared. Additional customer issues included datacenter outages, which impacted business continuity and led to lost revenue and reduced productivity. At the root, customers couldn't operate without a timely fix.
Cloud solutions like Azure are excellent at two things: data management and data backups. But when your infrastructure goes down, what do you do? Who do you call? Do you have a backup plan in place? Who is managing your system? If you don't know how to answer those questions, you're sunk if another outage ever occurs.
The Solution
- Microsoft products were affected first, with a direct impact on Microsoft 365 and Azure. We identified the issue and immediately (within 5 minutes of the first report) started communicating with the affected customers.
- When one customer's entire domain server went down, Centre immediately began a restore to an uncompromised version of the server and had systems restored within 30 minutes of the auto-generated ticket.
- Once the major outage was identified (before it was announced by CrowdStrike), we initiated our "After Hours Break Glass" protocol, which emailed and texted key CA resources to join an emergency bridge. While on the bridge, we identified all outage issues and began working through them.
- The issues were broken into 3 categories: the 365 outage, CrowdStrike, and a datacenter outage. Each outage was managed separately until all customers were verified fully operational.
- Using automation and proven procedures, we were able to identify the issues and spring management and key resources into action in record time. Our processes allowed us to stay organized and communicative with our customers.
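For readers who want to picture the triage step, here is a minimal Python sketch of splitting incidents into the three categories above and tracking each one until every customer is verified operational. It is an illustration only, not Centre's actual tooling: the `Incident` record, the customer labels, and the function names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three categories named in the steps above.
CATEGORIES = ("365 outage", "CrowdStrike", "Datacenter outage")


@dataclass
class Incident:
    customer: str                       # hypothetical customer label
    category: str                       # must be one of CATEGORIES
    verified_operational: bool = False  # flipped once the fix is confirmed


def group_by_category(incidents):
    """Group incidents so each category can be managed as its own track."""
    tracks = defaultdict(list)
    for incident in incidents:
        if incident.category not in CATEGORIES:
            raise ValueError(f"unknown category: {incident.category}")
        tracks[incident.category].append(incident)
    return dict(tracks)


def open_customers(incidents):
    """Return, in sorted order, customers with at least one unverified incident."""
    return sorted({i.customer for i in incidents if not i.verified_operational})


def all_clear(incidents):
    """True only when every incident has been verified fully operational."""
    return not open_customers(incidents)
```

In this sketch, the emergency bridge would stay open until `all_clear` returns `True`, and `open_customers` gives the list of customers who still need an update.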
Learn More
Whether it's a record-breaking outage impacting multiple customers or one customer experiencing something that, to them, is a big problem, we don't discriminate. We're committed to making sure that our customers stay online and protected from future problems.
Want to learn more about how to keep your systems backed up and prepared for when disaster strikes? Check out our events page for webinars and in-person events tailored to your needs.
Have questions? Feel free to contact us to get them answered. Talk soon!