But What Exactly "Is" Cloud Computing?

By Kurt Cagle
December 17, 2008 | Comments: 7

If buzzwords didn't exist, the computer industry as we know it would collapse. Really! For instance, here's a quick pop-quiz -

1. Define Cloud Computing in twenty-five words or less. Please show all work.

Er ... um ... it's well, it has to do with building virtual computers to host virtual services and support virtual communities while passing virtual messages to virtual ... um .... give me a second ... there's got to be a virtual something here.

Put another way, cloud computing is all virtual - it doesn't really exist!

Okay, so maybe this isn't the best position to take when covering cloud computing, but it does in fact provide a good starting point for understanding what cloud computing is and isn't. There are in fact two good working definitions - a very narrow one, and a much broader one. The narrow one first:

Cloud computing is grid computing, the use of a distributed network of servers, each working in parallel, to accomplish a specific task. As an acquaintance of mine put it, if it isn't using MapReduce, it probably isn't a cloud.
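To make the MapReduce reference a little more concrete, here is a minimal single-process sketch of the pattern in Python. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are invented for illustration; a real grid would farm the map and reduce steps out to separate servers rather than loop over them on one machine.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different server; here we just loop.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cloud is virtual", "the grid is real"]))
```

The point of the split is that neither map_phase nor reduce_phase needs to know about any other document or key, which is what lets the work spread across many machines.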

Of course, if we were to deal with this strict definition, then all the hype about "cloud computing" and the opportunities for companies to hawk their wares as "cloud friendly" simply wouldn't exist ... and where would the fun be in that? This is especially true given that there simply aren't that many problems, even at the large enterprise level, that require the use of "slow" massively parallel processing (i.e., processing distributed over networks whose latency is high compared to processor speed).

The Era of Distributed Virtualization

Massive economic simulations would probably not find much of a market at this point in time, and weather simulations are realistically only feasible if the grid is relatively self-contained. Hmmm ... you can process deep space satellite programs over the grid, of course, perhaps unfold a few proteins here and there, but chances are pretty good that most businesses just don't have the problems that make grid computing that attractive. So on to the broader definition:

Cloud computing is the distributed virtualization of an organization's computing infrastructure.

Now this is good market-speak - vague enough to have almost any possible meaning, with lots of multisyllabic words that sound really impressive on a PowerPoint slide (and you have to love the way that "distributed", "virtualization" and "infrastructure" got so casually tossed out there).

However, while this is perhaps a bit too broad as a working definition, it does in fact point to what seems to be emerging as the next major "platform". If you talk about cloud computing as distributed virtualization, you're actually getting pretty warm to a workable definition.

Much of the work of the last decade has been involved with moving from centralized architectures to distributed ones. Centralized architectures, such as the famed "client-server" relationship of yore, involved a hub and spoke arrangement, where multiple clients connected to a single server, and each server in turn communicated to more powerful "routers". Most applications weren't truly distributed ... instead, they existed as virtual server sessions within the server itself, with just enough state pushed to the client to handle very minimal customizations.

Put another way - the applications stopped at the server boundary.

Eventually, however, it became obvious that it was not that efficient to store your data on the same machine that handled the application logic. This translated into the first "distributed" applications, in which data was kept within a separate "data tier" in a different box, and the data access then occurred through an abstraction layer between the data tier and the logic tier. Client-server became three-tier, with messaging becoming an increasingly important part of the overall process.
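A small sketch may help show what that abstraction layer buys you. The example below is only an illustration: the CustomerStore class, the customers table, and the can_purchase rule are all invented, and an in-memory SQLite database stands in for the separate "data tier" box.

```python
import sqlite3


class CustomerStore:
    """Data access layer (hypothetical): the only code that knows about SQL."""

    def __init__(self, connection):
        self.conn = connection
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customers (name TEXT, balance REAL)"
        )

    def add(self, name, balance):
        self.conn.execute("INSERT INTO customers VALUES (?, ?)", (name, balance))

    def balance_of(self, name):
        row = self.conn.execute(
            "SELECT balance FROM customers WHERE name = ?", (name,)
        ).fetchone()
        return None if row is None else row[0]


def can_purchase(store, name, price):
    """Logic tier: a business rule written without any knowledge of the database."""
    balance = store.balance_of(name)
    return balance is not None and balance >= price


if __name__ == "__main__":
    store = CustomerStore(sqlite3.connect(":memory:"))
    store.add("alice", 50.0)
    print(can_purchase(store, "alice", 20.0))
```

Because can_purchase only talks to the store object, the data tier could move to another machine or another database product without the logic tier changing.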

Three-tier rapidly became n-tier as different services began entering into the mix. One consequence of this shift to n-tier application development is that the messaging architecture continues to build in importance compared to the actual services being deployed - and the standardization of messaging in turn provides a powerful tool for services to simplify their underlying public interfaces to best work with that messaging format. Put another way, as messaging becomes more uniform, the service interfaces tend to become simpler in order to best work with these messaging formats - interfaces become abstract (or virtual).

From Virtual Machines to Commodity Computing

On the physical front, virtual machines have been developing in parallel to this messaging architecture. The concept of a virtual machine has been around for a while - build a "fake" machine that takes a specific set of commands from the applications that run on top of it, then convert these commands into instructions that the underlying machine can use. These became considerably more sophisticated, with applications like VMware able to run one operating system within another.
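The basic idea of a "fake" machine can be sketched in a few lines. The toy stack machine below is purely illustrative: its three opcodes (PUSH, ADD, MUL) are invented, and it is vastly simpler than a system-level product that virtualizes an entire PC, but the principle of translating one command set into operations the host can perform is the same.

```python
def run(program):
    """Interpret a list of (opcode, operand) commands on a tiny stack machine."""
    stack = []
    for opcode, operand in program:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)  # carried out by the host's own addition
        elif opcode == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)  # carried out by the host's own multiplication
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


if __name__ == "__main__":
    # Computes (2 + 3) * 4 on the virtual machine.
    program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
    print(run(program))
```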

The VMware model is significant in a number of respects - by providing networking access and a virtualized driver model, then allocating hard drive space as a virtual partition, VMware not only let someone create a virtual system, but also made it possible to take a "snapshot" of that system at any given time that could be saved and then run again at some later point. This meant that an application developer could effectively clone a "template" snapshot of a given application and distribute that as a virtual instance - you could literally have a completely functional, fully enabled system up and running in under a minute.

Other companies and projects took a different approach to machine virtualization. In essence, this approach involved building the virtualization capability into the host operating system directly (rather than run it as a secondary application) meaning that you could start with a "bare-bones" operating system then bring multiple machines online on the same piece of hardware.

In this case, the bare-bones system became known as a hypervisor (the next step up from a supervisor, presumably), specifically a Type I hypervisor. The Xen project uses a hypervisor approach, as does Microsoft's Hyper-V system. VMware Workstation-style approaches, on the other hand, are typically considered Type II hypervisors, because they run as applications within an advanced host operating system, rather than as a stand-alone operating system running in tandem with other stand-alone operating systems.

These systems were originally developed as conveniences for developers, letting them work on multiple systems simultaneously, but the whole hypervisor concept has taken off dramatically in the cloud-computing space. Typically how this works is that a company with spare processing capability sets up multiple large machines that might have many terabytes of storage, hundreds or even thousands of CPUs and hundreds of gigabytes of RAM.

These systems then use hypervisors to partition these meta-systems into distinct virtual machines that can be configured to any size or power. Unlike physical machines, such virtual machines can additionally be powered on or off without shutting down the underlying server, and they can moreover have more memory or processing capability added simply by changing a configuration file.

There are downsides to this approach, of course. For starters, any hypervisored OS inherently is running two operating systems, even if one is only minimal, and the abstraction layer takes a certain number of cycles away from the actual processing - which means that a blazingly fast sea of processors still will produce only a moderately fast virtual machine. Additionally, bandwidth becomes a considerably more constrained resource, which makes hypervisored servers reasonable for doing web hosting, but fairly abysmal (and expensive) for hosting video and similar bandwidth-intensive media.

A number of companies have, within the last couple of years, created "cloud computing centers" that take advantage of hypervisors and Storage Area Networks (SANs) in order to create hosted environments where businesses can effectively duplicate (or replace) much of their existing IT infrastructure. It can be argued that, by the narrower definition above, this is not technically cloud computing, as in many cases the virtual systems are in fact working simply as fairly distinct web servers, but this is the point where the marketing hype transcends the literal definition by just a bit.

Amazon was the first to really make the "cloud computing" model work effectively, with the creation of the Amazon Elastic Compute Cloud (EC2) system. EC2 uses virtualization and a publicly available API in order to make it possible to bring one or a hundred virtual computers up simultaneously. This is complemented by their Simple Storage Service (S3), which effectively provides SANs for data storage. Their model is competitive (if a bit on the high side for some applications).

Microsoft also entered into this space recently with Windows Azure, which provides similar virtual Windows systems, along with a full complement of tools for building large scale distributed applications in this space. Sun has effectively "re-entered" this space - their first efforts in cloud computing, the Sun Grid, attracted a fair number of customers but was somewhat ahead of its time, and as a consequence they have recently been re-promoting their own cloud credentials.

It should also be noted that many of the big hosting services have not been napping as cloud computing has caught fire. Voxel CEO Zachary Smith noted, in an interview with O'Reilly Media earlier this year, that companies such as Voxel, GoDaddy, and other large scale hosting services have been providing virtual servers at a much lower price point than their dedicated servers for a couple of years now.

Moreover, he is pushing strongly to get an industry-wide agreement on a common standardized API for creating server instances programmatically, possibly using the Amazon EC2 APIs as a model. In order for true commodity computing to come of age, a common industry standard will definitely need to emerge.
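What might such a common API look like from the programmer's side? The sketch below is strictly hypothetical: ComputeProvider, its three methods, and the InMemoryProvider stand-in are invented for illustration and do not correspond to the EC2 API or to any vendor's actual interface.

```python
from abc import ABC, abstractmethod
import itertools


class ComputeProvider(ABC):
    """Hypothetical provider-neutral interface for managing server instances."""

    @abstractmethod
    def create_instance(self, image, memory_mb):
        """Start a new instance and return its identifier."""

    @abstractmethod
    def destroy_instance(self, instance_id):
        """Shut down and remove one instance."""

    @abstractmethod
    def list_instances(self):
        """Return the identifiers of all running instances."""


class InMemoryProvider(ComputeProvider):
    """Stand-in provider that only records instances in a dictionary."""

    def __init__(self):
        self._instances = {}
        self._ids = itertools.count(1)

    def create_instance(self, image, memory_mb):
        instance_id = f"i-{next(self._ids)}"
        self._instances[instance_id] = {"image": image, "memory_mb": memory_mb}
        return instance_id

    def destroy_instance(self, instance_id):
        del self._instances[instance_id]

    def list_instances(self):
        return sorted(self._instances)


def scale_to(provider, image, memory_mb, count):
    """Bring the provider up or down to exactly `count` instances."""
    current = provider.list_instances()
    for _ in range(count - len(current)):  # no iterations when scaling down
        provider.create_instance(image, memory_mb)
    for instance_id in current[count:]:  # empty slice when scaling up
        provider.destroy_instance(instance_id)
    return provider.list_instances()


if __name__ == "__main__":
    print(scale_to(InMemoryProvider(), "web-server", 512, 3))
```

The attraction of a shared interface is that code like scale_to could be written once and pointed at any host that implements it.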

Cloud Computing Is Services Computing

You may have noticed the preponderance of the word "service" in the last section. This is not a curious coincidence. The upshot of virtualization is that you are effectively creating an abstraction layer for the hardware - in essence turning that hardware into software that is accessible through a programmable interface, almost invariably using the Internet as the messaging infrastructure.

There is a tendency in Cloud Computing to focus on the hardware, but ultimately, cloud computing is in fact the next stage in the evolution of services that has been ongoing for at least the last decade. The concept of Software as a Service (SAAS) is gaining currency at the small and medium sized business (SMB) level especially, where the advantages of maintaining an internal IT department are often outweighed by the costs. As the economic situation continues to deteriorate, SAAS is likely to become increasingly common moving up the business pyramid.

In a SAAS deployment, those applications that had traditionally been desktop office "productivity" applications - word processors, spreadsheets, slide presentation software and the like - are increasingly becoming web-based, though many are also taking advantage of the "offline" capabilities that contemporary browsers are beginning to support, either built in or through the use of components such as Google Gears. Google Apps provides a compelling example of a SAAS suite, combining sophisticated word processing, spreadsheet and presentation software into a single web suite. Zoho offers similar (and arguably superior) capability.

Microsoft has recently debuted Microsoft Office Live Workspace, which effectively provides a workspace for working with common documents online, but raises the question of whether it is in fact a true cloud application as it still effectively requires a standalone version of Microsoft Office to edit these documents.

Salesforce.com has often been described as being a good cloud computing application, though it's worth noting here that this application also shows the effects that cloud development has on applications. The Salesforce application feels very much like a rich CMS application (similar to Microsoft SharePoint or Drupal, which also have cloud-like characteristics) dealing with complex dedicated document types.

Cloud Computing and RESTful Services

This concentration on document types itself seems to be an emergent quality of the cloud. Distributed computing really doesn't tend to handle objects all that well - the object oriented model tends to break down because imperative control (intent) is difficult to transmit across nodes in a network.

This is part of the reason why SOAP-based services, which work reasonably well for closed, localized networks (such as within the financial sector) as a remote-procedure call mechanism, don't seem to have taken off as much as they reasonably should have on the web. In general, distributed systems seem to work best when what is being transmitted is sent in complete, self-contained chunks ... otherwise known as documents, and when the primary operations used are database-like CRUD operations (Create, Read, Update and Delete).
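The CRUD style is easy to sketch. The toy DocumentStore below maps the four operations onto the HTTP methods conventionally used for them (POST, GET, PUT, DELETE); the class, the /docs path, and the sample invoice documents are all invented for illustration, and no real web server is involved.

```python
class DocumentStore:
    """In-memory stand-in for a service that stores whole documents."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def handle(self, method, path, body=None):
        """Dispatch one request and return a (status, body) pair."""
        if method == "POST" and path == "/docs":  # Create
            doc_id = str(self._next_id)
            self._next_id += 1
            self._docs[doc_id] = body
            return 201, f"/docs/{doc_id}"
        doc_id = path.rsplit("/", 1)[-1]
        if doc_id not in self._docs:
            return 404, None
        if method == "GET":  # Read
            return 200, self._docs[doc_id]
        if method == "PUT":  # Update: replace the complete document
            self._docs[doc_id] = body
            return 200, body
        if method == "DELETE":  # Delete
            del self._docs[doc_id]
            return 204, None
        return 405, None


if __name__ == "__main__":
    store = DocumentStore()
    status, location = store.handle("POST", "/docs", "<invoice/>")
    print(status, location)
    print(store.handle("GET", location))
```

Note that every request carries or returns a complete document; nothing depends on state left over from an earlier call.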

This type of architecture (called a REST architecture, for Representational State Transfer) is very much typified by the way that resources are sent and retrieved over the web, and effectively treats the web as an addressable database where collections of resources are key to working with the web.

A new, emerging model of cloud computing as a consequence is the RESTful Services Model, in which complete state is transferred from point to point within the network via documents while ancillary operations are accomplished through the use of messaging queues that take these documents and process them asynchronously to the transmission mechanism.
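The queue half of that model can be sketched with nothing more than Python's standard queue and threading modules. Everything specific here is an assumption made for illustration: the document shape (a dictionary with id and body keys), the None sentinel used to stop the worker, and the word count standing in for the "ancillary operation".

```python
import queue
import threading


def start_worker(inbox, results):
    """Process queued documents in the background until a None sentinel arrives."""

    def work():
        while True:
            document = inbox.get()
            if document is None:
                break
            # The ancillary operation here is just a word count (an assumption).
            results.append((document["id"], len(document["body"].split())))

    thread = threading.Thread(target=work)
    thread.start()
    return thread


def submit(inbox, document):
    """Accept a complete document and return at once; processing happens later."""
    inbox.put(document)
    return "accepted"


if __name__ == "__main__":
    inbox, results = queue.Queue(), []
    worker = start_worker(inbox, results)
    submit(inbox, {"id": "a", "body": "one two three"})
    inbox.put(None)  # tell the worker to stop
    worker.join()
    print(results)
```

The sender gets its answer from submit immediately, while the work on the document happens on its own schedule.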

The SOAP/WSDL model is one that has taken off especially for financial and intra-enterprise clouds, though here the SOAP wrapper is used not as a flag to trigger specific tasks by the receiving system but as an envelope for queue processing (indeed, the RPC model that many early SOAP/WSDL proponents pushed has been largely abandoned as being too fragile for use over the Internet). Service Oriented Architectures (SOAs) describe the combination of SOAP messages and node-oriented services, typically with a bias towards intentional systems (systems where the sender determines the intent of the message, rather than the receiver).

A second model comes in the use of JSON - a representation of a JavaScript object as a mechanism for transferring state. This model works very effectively in conjunction with web mashups, though its over-simplicity of structure (among other factors) makes it less than perfect for the transmission of semi-structured documents.

The third RESTful model is the use of syndication formats, such as Atom or RSS, as mechanisms for transmitting content, links and summaries of external web resources. Because syndication formats are in fact very closely tied to publishing operations, syndication formats tend to be fairly ideal for RESTful Services in particular.
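As a rough illustration of how little is needed to describe a resource this way, the sketch below builds a single Atom entry with Python's standard xml.etree module. The build_entry helper and the sample values are invented; the namespace URI and the id, title and updated elements come from the Atom format itself.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialize Atom elements with a default namespace rather than an ns0: prefix.
ET.register_namespace("", ATOM_NS)


def build_entry(entry_id, title, updated, link, summary):
    """Return one Atom entry as an XML string (hypothetical helper)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    fields = (("id", entry_id), ("title", title),
              ("updated", updated), ("summary", summary))
    for tag, text in fields:
        ET.SubElement(entry, f"{{{ATOM_NS}}}{tag}").text = text
    # The link points at the external resource instead of embedding it.
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=link, rel="alternate")
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(build_entry(
        "urn:example:report-1",
        "Quarterly report",
        "2008-12-17T00:00:00Z",
        "http://example.com/reports/1",
        "Summary of the quarter.",
    ))
```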

One of the most powerful expressions of such RESTful Services is the combination of XQuery/REST/XForms (or XRX), in which you have a data abstraction model (XQuery) pulling and manipulating data from other data sources such as XML or SQL databases, a publishing (RESTful) layer and syndication format for encoding data (or data links) such as Atom and its publishing protocol AtomPub, and a declarative mechanism for displaying or editing whole documents on the client (XForms being the most well known, though not the only solution).

While this particular technology is still emerging, already vendors and project developers are working on building integrated solutions. Tools such as MarkLogic Server, the eXist XML Database, EMC/Documentum's X-Hive XML Server, Orbeon Forms, Just Systems' xfy system, as well as similar offerings from Microsoft, IBM, Oracle and others in the syndication space, attest to the increasing awareness of and potential for XRX-oriented applications.

The Edge of the Cloud

One of the more interesting facets about clouds is the fact that the closer you get to them, the harder it is to determine their edges. This is one thing that physical clouds share with their digital analogs - the edge of a virtual cloud is remarkably ambiguous. It's typical in a network diagram to use the convention that the edges of such clouds are web clients - browsers, web-enabled applications, in essence, anything that sports a browsable screen, is used by humans, and most importantly doesn't actually contribute material to the state of a given application.

However, this definition is remarkably facile. Consider an application such as Curl, which has no real GUI, but is quite frequently found referenced by other applications. Or perhaps you could think of most contemporary browsers that support (or will soon support) offline storage. Both client and server have web addresses (though admittedly DHCP can complicate this somewhat), and certain web clients (typically physical devices) actually have built-in absolute IP addresses - they can act both as clients and servers.

Put another way, the notion of web client and web server is slowly giving way to web nodes. Such a node may act as a client, a server, a transit point or all three. This is now increasingly true as AJAX-based web applications become the norm. What this means in practice is that in cloud computing, there really are no edges, but rather a fractal envelope that describes the stage where you have no further connection points - in this case, think of the overall outline (or envelope) of a tree - while individual branches may end within the envelope or be touching the envelope, none extend beyond it.

Is web programming part of cloud computing? Only in very abstract terms - generally, either when you're refreshing the overall state of a given document or piece of content, or when you're updating that state through XMLHttpRequest or similar communication mechanisms. It's fairer to say that most computer languages will eventually incorporate (through libraries or by themselves) cloud computing components ... indeed, most already do.

Languages such as Erlang have specifically evolved for use in asynchronous, multiprocessor, distributed environments that look suspiciously like clouds, while the MapReduce framework written by Google is intended to handle the processing of large amounts of data over clusters of computers simultaneously (which also highlights that while Google does not (yet) have a formal public or proprietary cloud, they have been laying the foundation for much of what is emerging as cloud computing within their own search intensive operations).

In a sense, cloud computing is an architectural concept, rather than a programming one per se. For instance, it's probably fair to say that BitTorrent, which uses a peer-to-peer architecture for transmitting pieces of a given resource from multiple sources, represents a fairly typical cloud computing application - asynchronous, massively parallel, distributed, RESTful (torrents are not concerned with the content of the pieces, only their underlying existence) and virtual (the resource does not actually exist as a single entity, but has reality only in potential as many packets, some of which may be duplicates, and some of which may no longer actually exist on the web).
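The "resource exists only as scattered pieces" idea can be sketched as follows. This is a simplified illustration rather than the actual wire protocol: peers are modeled as plain dictionaries mapping a piece index to bytes, the function names are invented, and the only detail borrowed from real torrents is that each piece is checked against a known SHA-1 hash.

```python
import hashlib


def split_into_pieces(data, piece_size):
    """Cut a resource into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces):
    """Compute the SHA-1 digest that identifies each piece."""
    return [hashlib.sha1(piece).hexdigest() for piece in pieces]


def assemble(expected_hashes, peers):
    """Rebuild the resource from whichever peers hold a valid copy of each piece.

    Returns the reassembled bytes, or None if any piece cannot be found.
    """
    result = []
    for index, expected in enumerate(expected_hashes):
        for peer in peers:
            piece = peer.get(index)
            if piece is not None and hashlib.sha1(piece).hexdigest() == expected:
                result.append(piece)
                break
        else:
            return None  # no peer had a valid copy of this piece
    return b"".join(result)


if __name__ == "__main__":
    data = b"cloud computing is virtual"
    pieces = split_into_pieces(data, 8)
    hashes = piece_hashes(pieces)
    peers = [
        {0: pieces[0], 2: b"corrupt!"},
        {1: pieces[1], 2: pieces[2]},
        {3: pieces[3]},
    ]
    print(assemble(hashes, peers) == data)
```

No single peer holds the whole resource, and one of them even holds a bad copy of a piece, yet the resource can still be recovered as long as every piece exists somewhere.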

Clouds on the Horizon

It's interesting to note that this also leads full circle back to grid computing. Grid computing had its origins in applications such as SETI@home, which used the free cycles of participating PCs in order to analyze signals from radio telescopes to attempt to find apparently artificial, non-random signal patterns that might indicate intelligent life.

Ironically, such use of free cycles has never really taken off beyond very specialized applications, largely because of the very real concerns for security. Cloud computing is far more likely to continue evolving, for at least some time, within massive proprietary or dedicated public clouds, rather than ad hoc networks, at least until a way can be found to monetize such ad hoc networks.

Overall, however, the future of cloud computing actually looks quite bright, perhaps because of the very storm clouds that have gathered on the economic horizon. Cloud computing provides ways to reduce the overhead of a formal IT department within a small to medium sized organization ... especially one for which IT is a significant expense.

For instance, a school district may choose to use virtual machines for setting up web sites, centralizing grades and reporting, hosting distance-learning systems, and so forth, saving on the need to physically maintain machines and bandwidth while also gaining the ability to add or remove servers as needed to reflect demand.

Beyond the immediate advantage of reducing physical hardware, cloud computing also has the added advantage of reducing the environmental costs associated with maintaining that infrastructure, along with the power costs.

For instance, the IT manager for a Postal District in Tacoma, Washington laid out to me one of the central problems with their growing IT usage - the building which housed the servers was not designed to handle the heat and electrical load of more than eighty servers, and they had reached a stage where they were seriously looking for better facilities. Instead, they began, server by server, to move non-critical servers to virtual counterparts using a hosted service provider. They kept the most critical servers local, but they were able to reduce their physical server needs by nearly 60%, and were able to put off looking for new facilities for the foreseeable future.

This does point out that, as with any IT strategy, migrating to virtual servers on the cloud makes more sense for non-mission critical functions, and any such strategy should also look at recovery and response time when outages do occur. The danger, of course, is that failure of a cloud center could have disproportionately bad economic effects. On the other hand, this is true of any large-scale IT deployment, and typically, because of these considerations, cloud centers are far more likely to have multiple redundancies in terms of power and backup in place such that if a failure does happen, the losses will be minimal.

This also applies to the ability of such centers to handle the environmental impacts of running such virtual IT centers. Virtual computers use considerably less energy per CPU cycle than physical ones do (most virtual computers actually are very efficient in terms of memory and processing allocation, because much of that is handled in RAM rather than in far more expensive disk access operations). Moreover, facilities that host such systems are specifically designed for handling large numbers of servers running simultaneously, using much more efficient cooling and power systems than tend to be found in most IT departments.

This means that by virtualizing the least mission critical parts of your IT infrastructure on the web, you also can provide significant savings in terms of cooling systems, electrical infrastructures and facilities management that all translate to the bottom line.

This virtual world of cloud computing does, in fact, have some significant impacts on the real world ... and will have more as businesses become more comfortable with moving their services and their infrastructure into the cloud, as technologies for dealing with cloud computing improve and as standards and methodologies for developing for this new computing environment solidify.

What this means, of course, is that this particular cloud has a practically gold lining - and will chase the storms away.

Kurt Cagle is an author, developer, and online editor for O'Reilly Media, living in Victoria, BC, Canada. You can subscribe to his published articles here or follow him on Twitter.

What can I say, Monsieur Kurt except that you're such a geek!

I'm not sure that you have helped turn hype and buzzword into something fairly understandable but may just have succeeded adding complexity instead of defining cloud computing for mere mortals. No new non-core IT business will dare try -aaS upon reading this. They will just buy proprietary licensed software and continue to be duped instead of understanding that there exists an alternative that would cost less. Naughty you!

I expect another article from you that will explain cloud computing as if we're all kids who have just been given OLPC gadgets. Only till then can we claim that clouds aren't just hype :)

Morph Labs

PS: Thoughts about interoperability? (Hope I get the Captcha right...) Happy Holidays!

Umm. The R in CRUD means Read. Basic error.


One of the perils of being an editor is that periodically your own stuff goes out without being properly proofread. Thanks for the catch.

- Kurt

what would the consequences and opportunities be of cloud computing for large and well known software developers such as Microsoft and Adobe if people no longer had to purchase software but just use it as a service?

It’s easy to see why cloud computing - in all its forms - continues to gain momentum. Users can work anywhere and, if your computer breaks or your laptop is stolen, you won’t lose any data. We have just published an article on cloud computing here: http://www.zeta.net/blog/2009/01/what-cloud-computing-means-for-you/

I disagree with one of your points that VMware virtualization requires an OS to run on. In fact, VMware ESX is a bare metal hypervisor - probably more bare metal than Citrix Xen and Hyper-V at this point of time :)

Great article, many great points. Many companies are switching over to the virtualization realm for security reasons, but in the same breath many are concerned with virtual security.
