Lessons learned from the massive Cloudflare outage

Cloudflare is a cloud solutions giant that helps individuals and companies manage over 3 million web properties through its comprehensive product line.

On July 2nd, everything seemed peaceful until you saw a “502 Bad Gateway” error while trying to reach your favorite website, or while starting your workday by opening a web app that is part of your daily workflow.

Cloudflare reported that it was experiencing a major outage that left many sites, including large ones, unresponsive to visitors. The company acknowledged the widespread issue and began working on a fix immediately after the outage started.

Mlytics platform detected the Cloudflare outage on the analytics dashboard

It turns out that the issue was caused by a “bad software deployment”. All services went back to normal once the deployment was rolled back.

The AWS déjà vu

Back in 2017, we saw a similar incident, and that time the troublemaker was AWS. A large share of websites and web apps host their services on AWS, so when Amazon S3 went offline for about four hours, it caused widespread panic across the internet.

Image source: http://blog.catchpoint.com/2017/03/01/aws-s3-outage-impact/

We’ve gone through this before, and yet here we are again. When choosing a cloud solution provider, we often overlook the importance of redundancy. That is not because redundancy is unimportant; it is because we assume the big provider we use is too big to fail. As a result, companies without proper redundancy measures pay the price.

It seems we haven’t learned.

Just multi-everything

“Multi-cloud” has become a buzzword in recent years, but it is not yet mainstream in the industry. When it comes to availability management, however, it is considered one of the best practices. We covered this topic a while ago, and we recommend checking out that blog post.

The term refers to a mix of multiple public infrastructure-as-a-service (IaaS) environments, such as Microsoft Azure, Amazon Web Services, or Google Cloud, all operating as part of a single heterogeneous architecture (a.k.a. a Polynimbus cloud strategy). With this model, you no longer depend on a single cloud environment to stay operational.
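To put a rough number on why redundancy helps, here is a minimal Python sketch that estimates the availability of a setup that stays up as long as at least one provider is up. It assumes provider failures are independent, which is an idealization: a shared dependency or the same faulty deployment can take several providers down at once. The 99.9% figures are illustrative assumptions, not measured numbers for any real provider.

```python
from math import prod
from typing import Sequence


def combined_availability(availabilities: Sequence[float]) -> float:
    """Probability that at least one provider is up.

    ASSUMPTION: provider failures are independent. Real incidents
    (shared dependencies, correlated bugs) can violate this.
    """
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be in [0, 1], got {a}")
    # The whole setup is down only when every provider is down at the same time.
    all_down = prod(1.0 - a for a in availabilities)
    return 1.0 - all_down


if __name__ == "__main__":
    hours_per_year = 365 * 24
    single = combined_availability([0.999])          # illustrative figure
    dual = combined_availability([0.999, 0.999])     # illustrative figures
    # Expected downtime per 365-day year, in hours.
    print(f"one provider:  {single:.6f} -> {(1 - single) * hours_per_year:.2f} h/yr down")
    print(f"two providers: {dual:.6f} -> {(1 - dual) * hours_per_year:.4f} h/yr down")
```

Under these assumptions, a single provider at 99.9% is expected to be down about 8.76 hours per year, while two independent providers at 99.9% each would be down at the same time for only about half a minute per year.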

At Mlytics, we have spent years developing Multi CDN technology. The concept is to combine multiple CDNs and let our users work with them as if they were a single one. We built a platform with an AI system that automatically switches our users to the best CDN for their website and steers traffic away from outages.
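The switching idea can be illustrated with a deliberately simple, rule-based sketch in Python. It is not Mlytics’ AI system; it only assumes that each CDN is probed periodically and that each probe reports an HTTP status code and a latency. `CdnProbe`, `is_healthy`, `pick_cdn`, and the CDN names are hypothetical. The rule: drop any CDN whose probe failed (for example, with a 502 Bad Gateway), then route traffic to the fastest remaining one.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CdnProbe:
    """One health-check result for a CDN (hypothetical structure)."""
    name: str
    status_code: int   # HTTP status returned by the probe request
    latency_ms: float  # round-trip time measured for that probe


def is_healthy(probe: CdnProbe) -> bool:
    # Only 2xx/3xx responses count as healthy; a 502 Bad Gateway does not.
    return 200 <= probe.status_code < 400


def pick_cdn(probes: Sequence[CdnProbe]) -> Optional[str]:
    """Return the healthy CDN with the lowest latency, or None if all are down."""
    healthy = [p for p in probes if is_healthy(p)]
    if not healthy:
        return None  # caller could fall back to serving from the origin
    return min(healthy, key=lambda p: p.latency_ms).name


if __name__ == "__main__":
    probes = [
        CdnProbe("cdn-a", 502, 35.0),  # fastest, but failing
        CdnProbe("cdn-b", 200, 48.0),
        CdnProbe("cdn-c", 200, 61.0),
    ]
    print(pick_cdn(probes))  # cdn-b
```

A real system would likely smooth measurements over many probes and regions rather than trusting a single sample, but the core decision, excluding unhealthy CDNs before optimizing for speed, is what lets traffic route around an outage.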

Our platform has helped our users survive many internet outages, including today’s Cloudflare software failure (July 2nd, 2019).