Wednesday, September 5, 2012

The Economics of the Cloud

Two sides of a story

"We're gonna save lots of money in the cloud!".

Well, there you have it. If you want to save money (and who doesn't) and you have one or more IT applications that can live in a cloudy world, then this is the way to go, isn't it? After all, if you look at the prices of, for instance, Amazon Web Services, you can have a server instance for as little as $0.02 per hour. Two pennies! Who could ever compete with that?

Of course this sounds very attractive, but reality is usually a bit less rosy.

"Wow, this Amazon service is really expensive!" 

This is not an unusual reaction once an application has been running in the cloud for a few months and the actual costs become visible. Running a full-fledged system 24/7 in the cloud is certainly not free, and the costs associated with it are significant. That might come as a nasty surprise when the first bill arrives.

In my view, both extremes stem from a lack of insight into:
  • what exactly is needed to support a cloud-based application, and/or
  • the total cost of ownership of on-premise solutions.
Below, I will provide a few (non-exhaustive) pointers that might be helpful in comparing the costs of both options.

Context

Before diving in, first some context. Cloud computing is a very broad term that is applied to a wide variety of services. To keep things simple, we will focus on one specific type of cloud service: the type where you rent computing capacity and storage, usually known as Infrastructure as a Service (IaaS). This is the type of cloud computing with the lowest abstraction level, meaning you have to manage most of the stack yourself.



The good thing about IaaS, and the reason it is the most popular option on the market today, is that it is very flexible and requires little migration effort. Basically, what you run on-premise you can typically run in an IaaS cloud as well (at least from a technical perspective).

So what about this rosy picture?

When moving to the cloud it is easy to get blinded by the stunning advertised prices, for instance $0.02 per hour for a Micro server instance. This is, however, just part of the story.

Nothing is for free


Taking Amazon as an example, it soon becomes evident that (almost) nothing is for free. You need storage? Pay for it. Use your storage? Here's the bill. Backup? This is what it costs. You want to restore your backup? Pay for it! Monitoring alerts? Well, you get the point.

All these individual cost components are priced very reasonably, but they still add up and make things significantly more expensive than you initially thought when you read about those 2 cents per hour.

So: understanding all cost components is needed to make a valid comparison.
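
To make this concrete, here is a minimal sketch in Python that adds up the main monthly cost components for a single, always-on instance. Every rate in it is an illustrative placeholder rather than an actual AWS price, and the component list is far from complete; the point is merely that the hourly instance price is just one line item among many.

    # Rough monthly cost sketch for one always-on server instance.
    # All rates below are illustrative assumptions, not actual AWS prices.
    HOURS_PER_MONTH = 730  # average number of hours in a month

    costs = {
        "instance (hourly rate x 730 h)":   0.02 * HOURS_PER_MONTH,  # the advertised 2 cents/hour
        "block storage (100 GB allocated)": 0.10 * 100,              # assumed price per GB-month
        "storage I/O requests":             5.00,                    # assumed flat amount
        "backup snapshots (50 GB)":         0.10 * 50,               # assumed price per GB-month
        "data transfer out (50 GB)":        0.12 * 50,               # assumed price per GB
        "monitoring and alerting":          3.50,                    # assumed flat amount
    }

    for item, amount in costs.items():
        print(f"{item:<35} ${amount:7.2f}")
    print(f"{'total per month':<35} ${sum(costs.values()):7.2f}")
    print(f"{'compute alone would have been':<35} ${0.02 * HOURS_PER_MONTH:7.2f}")

Even with these made-up numbers, the total lands at roughly three times the naive "compute only" figure.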

A server instance is not a server

Probably one of the least understood things within AWS is how these server instances compare to their real-life, physical counterparts. For example, AWS advertises CPU capacity in ECUs (EC2 Compute Units), an artificial unit roughly comparable to the capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. Note: a 2007 unit, so hardly state of the art. The I/O capacity of an instance is even more obscure, as it is vaguely indicated as 'Moderate' or 'High' I/O performance, without mentioning the actual bandwidth that comes with it.

In addition, you need to keep in mind that the fact that a cloud platform is inherently multi-tenant might have some negative impact on the performance of your own server instances. This can be countered by allocating large chunks of capacity (rather than multiple smaller ones) or, in the case of AWS, even requesting dedicated hardware, but obviously this has a cost impact.

The bottom line is that when you launch a server instance, you might not get the capacity you were anticipating, requiring you to upgrade or launch additional instances.

So: make sure you understand the actual capacity provided in comparison with physical servers.
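
As a back-of-the-envelope aid, the sketch below estimates how many cloud instances of a given ECU rating you would need to roughly replace one physical box. The conversion factor of 2.5 ECUs per modern core is purely my own assumption for the sake of illustration; benchmark your actual workload before drawing any conclusions.

    import math

    # Back-of-the-envelope comparison between advertised ECUs and a physical server.
    # AWS defines 1 ECU as roughly a 1.0-1.2 GHz 2007 Opteron/Xeon core; the factor
    # below (how many ECUs one modern core is worth) is an assumption for
    # illustration only -- benchmark your own workload rather than trusting it.
    ECUS_PER_MODERN_CORE = 2.5  # assumed rough equivalence

    def instances_needed(instance_ecus: float, physical_cores: int) -> int:
        """Roughly how many instances of `instance_ecus` ECUs match one physical box."""
        return math.ceil(physical_cores * ECUS_PER_MODERN_CORE / instance_ecus)

    # Example: replacing an 8-core physical server with hypothetical 4-ECU instances
    print(instances_needed(instance_ecus=4.0, physical_cores=8))  # -> 5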

Elasticity might be difficult to achieve

The single most cited benefit of cloud computing is the ability to size the capacity of your systems (whether automatically or not) according to the demand. In contrast, traditional systems are often sized for the maximum anticipated peak in demand, resulting in dramatically under-utilized systems. Server virtualisation already started addressing this, and cloud computing is supposed to be top-of-class in this regard. With dramatic cost savings as a result. Right?

Well, yes, it could be. Automatic scaling is a very useful feature but it might be difficult to fully exploit it. It comes with its own set of challenges, such as:
  • Is your application capable of dynamically spreading the load over multiple servers? For a straightforward web server this is typically not much of an issue, but for more complex applications or servers holding state (e.g. databases) things usually get more complicated.
  • How do you provision these spontaneously launching server instances?
  • How much time is needed to spin up a new server instance and configure it so that it can actually take part in processing the workload? Does the time needed to scale up match the actual peaks in demand? (see the sketch at the end of this section)
  • How do you keep a grip on the instances that are actually running? An error in the provisioning might result in server instances being launched that never become an active part of the system.
  • Auto-scaling is best served by relatively small server instances; however, as discussed before, these smaller instances come with their own drawbacks. A trade-off is needed.
So: make sure you assess the applicability of auto-scaling before counting on the cost savings.
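
To illustrate the spin-up question from the list above, here is a toy simulation that checks how much of a sudden demand spike goes unserved when every new instance needs a fixed lead time before it can take traffic. The lead time, per-instance capacity and demand curve are all invented for the example.

    # Toy simulation: can auto-scaling keep up with a sudden demand spike?
    # Lead time, per-instance capacity and the demand curve are all invented.
    SPIN_UP_MINUTES = 10          # assumed time to boot and provision one instance
    CAPACITY_PER_INSTANCE = 100   # requests per minute one instance can handle

    # demand in requests per minute: a sharp 15-minute spike
    demand = [100] * 10 + [400] * 15 + [100] * 10

    running = 1    # instances currently serving traffic
    pending = []   # minutes left until each launching instance is ready
    unserved = 0

    for load in demand:
        pending = [m - 1 for m in pending]
        running += sum(1 for m in pending if m <= 0)   # finished provisioning
        pending = [m for m in pending if m > 0]

        capacity = running * CAPACITY_PER_INSTANCE
        unserved += max(0, load - capacity)

        if load > capacity:          # naive rule: launch one more instance
            pending.append(SPIN_UP_MINUTES)

    print(f"requests not served during the spike: {unserved}")

With these made-up numbers, most of the spike has passed before the extra capacity is online, and several instances only become ready after the spike is over.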

Allocation vs. Usage

Cloud computing is typically associated with the Pay as you Go paradigm. You pay for something when you need and use it.

Not so fast. This applies to quite a few things, but unfortunately not to all of them. For example, online (block) storage is typically allocated beforehand, and you pay for what you allocate, not for what you actually use. Another example is the use of reserved instances, which allow you to buy reserved and discounted capacity for a one- or three-year period. The more you pay upfront, the greater the discount you get. However, for the Heavy Utilization reserved instances you are charged for every hour of the month, regardless of the state of the server instance.

So: it is really necessary to understand the extent to which the Pay as you Go paradigm applies to the different cost components.
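
Two small illustrations of this point, again with made-up prices: block storage billed on allocation rather than usage, and a hypothetical heavy-utilisation style reservation compared with paying on demand for a server that only runs 60% of the time.

    # Two illustrations that Pay as you Go does not apply everywhere.
    # All prices and the utilisation figure are made up for the example.
    HOURS_PER_YEAR = 8760

    # 1) Block storage: you pay for what you ALLOCATE, not for what you use.
    allocated_gb, used_gb, price_per_gb_month = 500, 120, 0.10   # assumed figures
    print(f"using {used_gb} GB but paying for {allocated_gb} GB:",
          allocated_gb * price_per_gb_month)   # 50.0 per month, not 12.0

    # 2) A hypothetical 'heavy utilisation' style reservation (upfront fee plus a
    #    charge for every hour of the year, running or not) versus on demand for
    #    a server that only runs 60% of the time.
    utilisation      = 0.60
    on_demand_hourly = 0.10     # assumed rate
    reserved_upfront = 300.00   # assumed one-year upfront fee
    reserved_hourly  = 0.04     # assumed discounted rate

    on_demand_year = on_demand_hourly * HOURS_PER_YEAR * utilisation
    reserved_year  = reserved_upfront + reserved_hourly * HOURS_PER_YEAR

    print("on-demand per year:", round(on_demand_year, 2))  # 525.6
    print("reserved per year: ", round(reserved_year, 2))   # 650.4 -> the reservation loses here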

So that settles it, the cloud is way too expensive

Hold on, that is not the message I'm trying to convey. I am actually a strong believer in cloud computing and I sincerely believe we are witnessing a massive paradigm shift. Does this mean it is applicable to all use cases? Of course not. And does it mean that computing costs are all of a sudden a fraction of what they used to be? No way.

However, when an unexpectedly high bill comes in at the end of the month, it is easy to forget about the service that has been delivered and to end up comparing apples and pears.

What is the total cost of ownership of the alternative then?

As mentioned before, almost everything comes at a cost at Amazon, which might be considered a blessing in disguise. In the end, Amazon provides the service to make money, and being one of the largest operators of IT infrastructure in the world, you can expect them to have a good understanding of the cost components of such an infrastructure.

By charging you these individual line items, Amazon provides you with an insight into the total cost of ownership (TCO) which you might not have had before. And as long as it is not clear what the TCO of the alternatives is, you cannot state that one of them is too expensive.

In my view, there are plenty of use cases where an on-premise or co-located solution is more economical, but the differences won't be spectacular and you need a very well-organised IT organisation to achieve them.

So: ensure you understand the alternative's TCO when comparing it with a cloud-based solution.

Apples and Apples?

It is a popular pastime among IT pros to compare the cost of an off-the-shelf server with what Amazon charges for the equivalent. And boy, does Amazon suffer in that comparison.

Except that this is not a valid comparison. To start with, it usually doesn't end with the server alone. You need additional infrastructure such as storage, backup equipment and so on. The server must be housed, cooled, mounted in a rack and supplied with power, and someone has to physically install it.

And what happens when this server dies (which will happen)? At best a spare server is available or a sufficient service contract is in place, but even then it more often than not takes a significant amount of time before the replacement is ready to go. How much is it worth, then, that in a cloud computing scenario the replacement can be fired up (even fully automatically if needed) within minutes? And that the solution can be migrated to a disaster recovery site without huge upfront costs and without intervention from the cloud provider?

Basically you are comparing a service with a piece of equipment, which is at best only a part of the solution. Apples and pears.

So: calculate the cost of the on-premise IT service rather than a piece of equipment when comparing costs.
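
A minimal sketch of what such a comparison could look like when you price the on-premise service instead of just the server. Every figure below is an assumption for illustration purposes; plug in your own.

    # Sketch: monthly cost of the on-premise service, not just the box.
    # Every figure below is an assumption for illustration purposes.
    MONTHS = 36  # write the hardware off over three years

    capex = {
        "server hardware":      4000,
        "storage and backup":   1500,
        "spare parts":           800,
    }
    opex_per_month = {
        "rack space, power, cooling":  120,
        "network connectivity":         80,
        "service contract":             60,
        "sysadmin time (fraction)":    250,
    }

    server_only  = capex["server hardware"] / MONTHS
    full_service = sum(capex.values()) / MONTHS + sum(opex_per_month.values())
    print(f"per month, server only:  ${server_only:7.2f}")    # ~111
    print(f"per month, full service: ${full_service:7.2f}")   # ~685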

Benefits of capacity on demand

As discussed before, automatically scaling the solution's capacity to meet the demand might be more complicated than it sounds. That's very true, but it still doesn't mean it has no value. It certainly does, big time! A system designed for elasticity and used for fluctuating load profiles is able to save significant costs, no doubt about it. In short: the more spiky the demand, the better the fit for a cloud use case.

But there are many more use cases where capacity on demand proves to be a real winner. What about setting up a temporary test system? Running a load test on a representative system setup? Testing a disaster recovery procedure? Temporarily scaling up capacity for a large data migration that would normally take days or even weeks? And very often at very little cost, as such capacity typically runs for hours or weeks rather than months or years.

It is evident that this flexibility is extremely useful; more often than not, quite a few of these things simply wouldn't be possible with on-premise systems.

So: take the value of this flexibility into account before making up your mind.
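
A quick illustration with assumed numbers: a ten-instance load-test environment that only exists for three days.

    # What a temporary environment costs when it only lives for a few days.
    # Instance count, blended hourly rate and duration are assumed figures.
    instances    = 10        # e.g. a representative load-test setup
    hourly_rate  = 0.25      # assumed blended rate per instance (compute + storage)
    hours_needed = 3 * 24    # keep it running for three days

    print("cost of the whole exercise:", instances * hourly_rate * hours_needed)  # 180.0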

OPEX vs. CAPEX

While talking about costs it is tempting to simply compare the sum of the monthly costs over a three-year period with a scenario involving a large initial investment combined with smaller monthly costs. Except that this is not the same thing. Every financial specialist (which I am not) will be able to explain the benefits of operational expenses (OPEX) over capital expenses (CAPEX). It is very attractive to spread the payment of a large sum of money over three years rather than paying the majority of it upfront, a principle firmly exploited by credit card firms.

So: make sure to take the cost of capital into account.
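
A minimal sketch of how a financial specialist would level that playing field: discount the future monthly payments to their present value before comparing them with the upfront investment. The discount rate and the amounts are assumptions for illustration only.

    # Comparing a large upfront investment with a stream of monthly payments is
    # only fair if future payments are discounted. Rate and amounts are assumed.
    annual_discount_rate = 0.08
    monthly_rate = (1 + annual_discount_rate) ** (1 / 12) - 1

    def present_value(monthly_payment: float, months: int) -> float:
        """Present value of a stream of equal monthly payments."""
        return sum(monthly_payment / (1 + monthly_rate) ** m for m in range(1, months + 1))

    capex_scenario = 30000 + present_value(200, 36)   # big investment, small monthly costs
    opex_scenario  = present_value(1100, 36)          # no investment, larger monthly bill

    print("on-premise (CAPEX-heavy):", round(capex_scenario))  # ~36,400
    print("cloud (OPEX only):       ", round(opex_scenario))   # ~35,200

Nominally the OPEX scenario is the more expensive one here (36 x 1,100 = 39,600 versus 30,000 + 36 x 200 = 37,200), yet after discounting at 8% it already comes out cheaper.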

The bottom line

I have deliberately focused on the economic aspects of cloud computing versus more traditional alternatives, while playing a bit of a devil's advocate role. But economics is just one part of the equation. Rather than managing your own data centres with dedicated equipment, cloud computing allows you to focus more on your core business. From that perspective, whether or not to utilise cloud computing is much more a strategic choice than something based purely on numbers.

Sometimes it works, sometimes it certainly does not, but in all cases a thorough understanding of both sides of the story is needed to make a qualified decision.

A few references

Googling for the economic benefits of the cloud will result in a huge number of hits, but a few of them I found very interesting. One article that raised quite a stir was the AWS vs. Self-Hosted article, including the response from Amazon's Jeff Barr. Another interesting, although less quantified, article comes with the intriguing name Is cloud computing really cheaper? Finally, a nice interactive spreadsheet that aims to compare Cloud vs. Colo costs is worth a look.

