In the last several months, Google has suffered a number of well-publicized incidents in its cloud-based offerings. The troubles began with a significant security breach and have continued with the past week's networking issues. Each time Google has some kind of incident, a chorus of idiotic blogs equates Google's failings with weaknesses in the cloud computing model.
Saying, "Google goes down, questions about cloud computing arise" (Garett Rogers writing for ZDNet) is like saying "The Detroit Lions go 0-16, questions about football arise". First, these rants commit a basic—and (should be) obvious—logical fallacy called a "hasty generalization": a problem with a single cloud vendor on a single occurrence doesn't say a thing about cloud computing in general.
The real issue lies in expectations. Who actually expects the Lions to play well in any given year? They're a terrible organization, from their ownership all the way down to some of the worst on-field talent in the NFL. People don't wonder if the Lions will screw up; they just wonder how the Lions will screw up. Nevertheless, no one thinks the Lions' mismanagement holds any lessons for what we should expect from the New England Patriots*.
I don't intend for this analogy to suggest Google is a terrible operation with a lack of talent. On the contrary, Google has a significant amount of talent and they do most of the things they promise you quite well. They simply don't promise you much, so you should not be expecting much. Most of us don't pay anything for Google's services and Google makes no service-level promises in relation to those services. In other words, you really should not be surprised that Google's cloud offerings experience downtime.
On the security front, the Lions analogy may be even more appropriate. Google's security track record is about as shameful as the Lions' win/loss record. While some level of baseline security should be an expectation of any cloud service, Google has never presented the world with any reason to believe it takes security seriously as an organization. Google provides no transparency into its security practices, has suffered from significant and inexcusable security vulnerabilities, and expresses a lackadaisical attitude towards privacy.
How can you apply the uptime and security failings of an organization that promises nothing to other organizations in cloud computing or cloud computing as a whole? The only lessons we can really learn and apply to cloud computing are:
- You should know what availability expectations your cloud provider is promising and accept the risk that they will barely live up to those expectations.
- Different cloud services deliver different uptime expectations. The cloud does not inherit the uptime expectations of any one vendor.
- You should demand transparency with respect to security practices from your cloud vendors and structure the data that you trust to the care of those providers accordingly.
- Different vendors will earn different levels of trust. The cloud does not inherit the trust characteristics of any given vendor.
- You should have disaster recovery and business continuity plans in place to survive the failure of any vendor—especially cloud vendors.
One false lesson from this event is the idea that the impacts of cloud vendor outages are much more significant than (what exactly?). The reality is that the failure of any heavily leveraged service—cloud or not cloud—has impacts that go well beyond your organization should you use that service. If IBM's hosted services go down, that has a huge impact. If GM suddenly declares bankruptcy, that has a huge impact as well. On the other hand, if some small-time cloud provider goes down, most of the Internet won't know it.
I don't intend to suggest you should not use Google's services—my companies all use a number of Google cloud services. Google is not misleading anyone. They provide great services for little or no cost. If you can put up with occasional downtime and structure what you put in their systems with the basic assumption that those systems are high-risk computing environments, then I highly recommend Google.
On the other hand, you should not be sticking personally identifiable information on Google servers and you should not be using Google for anything that needs 99.9% or higher availability.
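To put an availability figure like 99.9% in perspective, here is a rough sketch that converts an availability target into the downtime it still allows per year. The function name and the 365-day year are my own simplifying assumptions for illustration; they do not come from any vendor's service-level agreement.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year of 8,760 hours


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the hours per year a service can be down and still hit the target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare a few common availability targets.
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

Under these assumptions, a 99.9% target still leaves room for roughly 8.76 hours of downtime a year, so a service that promises nothing at all could easily fall short of it.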
* You can't even tell anything about an individual team from an individual loss. Similarly, no one should assume anything about Google's ability to maintain a high-availability infrastructure from a single networking issue.