Monday, December 24, 2012

Thoughts on Cloud Management solutions - Cloudformation

I have been lucky to do some serious work with several cloud management solutions in the last couple of years. I thought it would be useful to put my thoughts on paper and this is the second of a few posts.

In the previous post, I shared my two cents on the added value of Rightscale, while in a future post I will also discuss Opscode's Chef. But now, Amazon's own Cloudformation.

Wait, we were supposed to talk about Cloud management solutions, right? And now we are going to discuss one of Amazon's features, itself part of a much broader management offering. Yep, that's right. And I agree, Cloudformation is not a full blown cloud management solution in itself, but it is a pretty darn useful component and allows you to do some powerful stuff.

So, what is it then? Well, obviously Cloudformation is AWS specific and has nothing to offer in the multi-cloud area, the way Rightscale does.

Amazon Web Services has grown over time into an amazing set of cloud services, some of them overlapping, others complementary, and launching a solution on AWS typically requires a few of these services to work in concert. Even launching a single cloud based server typically involves a significant set of resources: the server instance itself, storage volumes, security groups, DNS records, alarms, and possibly load balancers and auto scaling configurations.

Manually configuring this kind of thing quickly becomes very tedious, and that is where Cloudformation comes in. It allows you to declare the resources you want (to be clear: in a declarative way, which makes a big difference) and create them as a fully managed stack, either through a web interface or the command line.

An update simply requires modifying the template and applying it to the existing stack. A snippet of a security group declared in such a Cloudformation template is shown below.

"WebServerSecurityGroup": {
  "Type": "AWS::EC2::SecurityGroup",
  "Properties": {
  "GroupDescription": "Security Group for the web server instances",
"SecurityGroupIngress": [
{
"IpProtocol": "tcp",
"FromPort": "80",
"ToPort": "80",
"CidrIp": "0.0.0.0/0"
},
{
"IpProtocol": "tcp",
"FromPort": "80",
"ToPort": "80",
"SourceSecurityGroupName": "amazon-elb-sg",
"SourceSecurityGroupOwnerId": "amazon-elb"
},
{
"IpProtocol": "tcp",
"FromPort": "22",
"ToPort": "22",
"CidrIp": "0.0.0.0/0"
}
  ]
  }
}


The very, very useful thing is that it allows you to treat your IaaS configuration as code, and that you don't have to deal with state. There is no need to check whether resources already exist and decide on the next steps based on that: this is taken care of for you under the hood.

Do realise, though, that these changes still have to take place somehow. Changing a resource, for instance the instance type of a server, will result in the instance being stopped (or terminated, in the case of an instance store backed instance) and started again with the new instance type.
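For illustration, creating such a stack and later applying a modified template could look roughly like this from the command line (shown here with the current AWS CLI, which postdates this post; the stack name and template file are made up):

# Create a new stack from a local template file
aws cloudformation create-stack \
  --stack-name web-stack \
  --template-body file://web-stack.template

# Later: apply the modified template to the existing stack; Cloudformation
# figures out which resources need to be created, updated or replaced.
aws cloudformation update-stack \
  --stack-name web-stack \
  --template-body file://web-stack.template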

At best, Cloudformation is one part of a full blown cloud management solution. Its focus is on describing the cloud resources you need, not on the configuration of the server instances themselves. That said, Cloudformation does have some tooling to configure these servers as well, a bit of a (very) lightweight Chef or Puppet kind of thing, or perhaps more comparable to Ubuntu's cloud-init. This is useful for smaller systems, but it is typically used to bootstrap the servers with agents that take ownership of their further provisioning.
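As a sketch of what that bootstrapping looks like, an instance resource can carry an AWS::CloudFormation::Init metadata block that the cfn-init helper script reads at boot time. The fragment below is illustrative only: the AMI ID and package are made up, and it assumes the cfn-init helper scripts are present on the image.

"WebServerInstance": {
  "Type": "AWS::EC2::Instance",
  "Metadata": {
    "AWS::CloudFormation::Init": {
      "config": {
        "packages": { "apt": { "nginx": [] } },
        "services": { "sysvinit": { "nginx": { "enabled": "true", "ensureRunning": "true" } } }
      }
    }
  },
  "Properties": {
    "ImageId": "ami-12345678",
    "InstanceType": "m1.small",
    "UserData": { "Fn::Base64": { "Fn::Join": [ "", [
      "#!/bin/bash\n",
      "cfn-init -s ", { "Ref": "AWS::StackName" },
      " -r WebServerInstance --region ", { "Ref": "AWS::Region" }, "\n"
    ] ] } }
  }
}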

If you want to use the AWS platform, I really recommend taking a closer look at Cloudformation. It has a bit of a learning curve, but it's definitely worth the investment.

There are a few things that need attention though:

  • The service is not bullet proof yet. Sometimes it throws exceptions that disappear again after a few hours, and on (very) rare occasions a Cloudformation stack ends up in an error state that leaves you no other option than deleting the entire stack and starting all over again. Ouch! 
  • Cloudformation is declarative by design and has only very limited conditional logic support. This quickly leads to very long templates with quite a bit of code duplication. There is support for including external scripts in your stack, but in practice this doesn't work too well. In my view the best way to use Cloudformation is to wrap the templates in a simple generator that allows you to minimise code duplication (a tiny sketch of that idea is shown below).
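As an illustration of that generator idea, even a trivial shell script can stamp out near-identical resources for a list of server roles instead of copy-pasting the JSON by hand (a made-up sketch; the output still has to be embedded in the Resources section of a template):

#!/bin/bash
# Emit one security group resource per server role (illustrative only).
for role in Web App Db; do
  cat <<EOF
  "${role}ServerSecurityGroup": {
    "Type": "AWS::EC2::SecurityGroup",
    "Properties": {
      "GroupDescription": "Security Group for the ${role} server instances",
      "SecurityGroupIngress": [
        { "IpProtocol": "tcp", "FromPort": "22", "ToPort": "22", "CidrIp": "10.0.0.0/16" }
      ]
    }
  },
EOF
done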

Friday, December 21, 2012

Thoughts on Cloud management solutions - Rightscale

I have been lucky to do some serious work with several cloud management solutions in the last couple of years. I thought it would be useful to put my thoughts on paper and this will be the first of a few posts. In the next posts I will also talk about Cloudformation and Chef.

A few years ago I started working with Rightscale, on top of Amazon Web Services. The main selling point (at least, from my perspective) of Rightscale is the ability to be multi-cloud. AWS is obviously supported but also Rackspace and more recently players such as Azure and Google's Compute Engine are part of the deal.

The nice thing about Rightscale is that it provides ready to use, fully configured server templates, which can be configured by attaching scripts or cookbooks to them and passing along the right parameters. Typically they operate on top of bare server images, which are available in the different clouds, and configuration is done at boot time.

I really like that model: it provides tremendous flexibility, and with its rich set of pre-defined server templates Rightscale allows you to get started very quickly.

The downside is that it mainly uses scripting (bash, ruby, powershell and sometimes chef), which is not so easy to maintain. Also, the development environment to create, deploy and test these scripts is far from user friendly, which results in a relatively cumbersome experience.

The multi-cloud thing, then, is excellent. That is: if you need it. If you really need to support multiple clouds, this is the way to go, but in my view most customers are perfectly served by sticking to one public cloud vendor. Regardless of which one you choose, I don't believe you'll gain a lot by hopping from one vendor to another. And, glad you mentioned it, disaster recovery can be achieved using multiple (for instance AWS) regions, as they are fully decoupled by nature and don't require throwing another vendor into the mix. But if you are a product vendor serving customers with different preferences, you probably have no choice.

In case you don't need this multi-cloud capability, it might really get in your way. In the end it restricts you to the lowest common denominator, and since this market is so much in flux (almost 100 product announcements in 2012 for AWS alone) that is not the most appealing model.

Note that Rightscale does allow you to use cloud specific features (which you really need for deploying a realistic application), but then the question pops up: why not use the AWS management console straight away? These native consoles and APIs are almost always more feature rich as well.

For example, mid 2011 Amazon launched the full-blown VPC functionality as we currently know it, and we were very tempted to use it. However, in order to use it we had to upgrade our Rightscale license, and even then we would have access to only a very limited form of the VPC functionality. We couldn't use the AWS functionality itself, as RS (at that time) didn't allow us to launch a server in a particular subnet. So basically we were significantly lagging behind, which wouldn't have been the case if we had used AWS directly.

So you really have to decide whether the multi-cloud feature makes it worth sacrificing some of the flexibility you have when using the native functionality itself. In the end, it is all about portability and how much it is worth to you. It simply depends on your needs.

Thursday, November 29, 2012

Never underestimate the appeal of Microsoft to Enterprises

I have been lucky to be able to work in very diverse environments, from small, highly innovative startups to large enterprises.

And while it is absolutely true that, from a technical perspective, open source solutions rule the internet, Microsoft has a very strong foothold in the corporate world.

Open source (OS) technologies such as Linux, Ruby, Python, Scala and NoSQL are the foundation of almost all of these internet services, and it is surprising to see how the Windows platform is almost considered an afterthought (if considered at all) when going through technical documentation and so on. It simply is not taken into consideration, and in my view for clear, justifiable reasons.

However, the corporate world is a different beast, and it is ruled by Microsoft. Of course, OS solutions do exist and technologies such as Java are widespread here as well, but you'll be very hard pressed to find an organisation without any Microsoft presence whatsoever.

And that gives Microsoft a (surprising, at least to me) advantage in other areas as well. When discussing potential cloud solutions I was expecting Amazon Web Services to be considered the benchmark in the IaaS area. I am not trying to say that AWS is by default the desired option, but I was expecting their service to be recognised as the pace setter and some kind of benchmark. After all, Microsoft technology is certainly a first class citizen at AWS.

I was wrong. The strong relationship MS has built with these enterprises leaves the impression that the cloud services provided by Microsoft are some kind of safe harbour for enterprises exploring cloud based solutions. This is not necessarily based on an objective evaluation of services, costs and service levels; it is perception.

And as I have learnt a long time ago: it is the perception that matters!

Monday, November 19, 2012

Automatic, unattended install of phpmyadmin

In this brave new world of infrastructure automation, being able to install a package without manual intervention is a bare necessity.

And really, how difficult can it be to install phpmyadmin automatically, without manual intervention?

Of course this turned out to be slightly more difficult than I thought, and as I couldn't find a really good resource on the web I decided to put my findings in a blog post. Possibly more as a future reminder for myself :-)

Installing packages such as phpmyadmin on Debian/Ubuntu is usually a breeze, thanks to the apt package manager. And by passing the -y option you can answer yes to all questions that may arise during the setup.

Also, Debian (and hence Ubuntu as well) has long had the DEBIAN_FRONTEND variable. By setting this to noninteractive, no questions will be asked at all.

# export DEBIAN_FRONTEND=noninteractive
# apt-get -q -y install phpmyadmin

However, what to do with questions that really need some input, such as the database password? Welcome to debconf-set-selections.

With debconf-set-selections you basically answer the questions that will be asked in the setup before the actual install.

So by running the following commands, the main questions will be answered and the actual install will then proceed without that pesky blue screen:


echo 'phpmyadmin phpmyadmin/dbconfig-install boolean true' | debconf-set-selections
echo 'phpmyadmin phpmyadmin/app-password-confirm password your-app-pwd' | debconf-set-selections
echo 'phpmyadmin phpmyadmin/mysql/admin-pass password your-admin-db-pwd' | debconf-set-selections
echo 'phpmyadmin phpmyadmin/mysql/app-pass password your-app-db-pwd' | debconf-set-selections
echo 'phpmyadmin phpmyadmin/reconfigure-webserver multiselect apache2' | debconf-set-selections

Nice, but where do these variables come from? Well, there are actually a lot more variables to play with, and if you really want to know, run the following command after the package has been installed:

debconf-get-selections | grep phpmyadmin

This will return all parameters for that particular package; most of them are self-explanatory.

For completeness: the debconf-utils package is needed as well, but luckily that one installs with just the -y parameter.
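Putting it all together, a minimal fully unattended install could look like the sketch below (run as root; the passwords are placeholders and should obviously be replaced with generated ones):

#!/bin/bash
set -e
export DEBIAN_FRONTEND=noninteractive

# debconf-utils is needed as noted above (it also provides debconf-get-selections)
apt-get -q -y install debconf-utils

# Pre-answer the questions the phpmyadmin package would otherwise ask
echo 'phpmyadmin phpmyadmin/dbconfig-install boolean true' | debconf-set-selections
echo 'phpmyadmin phpmyadmin/app-password-confirm password your-app-pwd' | debconf-set-selections
echo 'phpmyadmin phpmyadmin/mysql/admin-pass password your-admin-db-pwd' | debconf-set-selections
echo 'phpmyadmin phpmyadmin/mysql/app-pass password your-app-db-pwd' | debconf-set-selections
echo 'phpmyadmin phpmyadmin/reconfigure-webserver multiselect apache2' | debconf-set-selections

# Now the actual install proceeds without any prompts
apt-get -q -y install phpmyadmin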

Wednesday, October 17, 2012

Here are a few things Windows Azure must do to catch up with Amazon Web Services

I have spent many days of my - admittedly not so short anymore - life working with Amazon's cloud and although it is not perfect I definitely like it.

Amazon sort of invented the Infrastructure as a Service cloud and it is a huge success. Apparently a staggering 1% of the internet runs on AWS and one out of three users visits an AWS hosted service on a daily basis. Wow!

One of the reasons for this success is that a few other powerhouses (i.e. Google and Microsoft) decided to bet on a slightly different horse: the Platform as a Service. PaaS sits at a higher abstraction level and in theory promises more benefits, but the fact that the application must (most of the time) be modified in order to be hosted on a PaaS turned out to be a big threshold. Hence IaaS has proven to be the more popular option so far, and the others were looking with envy at the success of (most notably) Amazon.

As could be expected, this situation would not last forever, and Microsoft, with its Windows Azure offering, has moved into IaaS territory as well. Which is good; after all, competition is good!

Note: do not consider Amazon's offering as an IaaS offering only, as it has many features nowadays which compete head-to-head with Microsoft PaaS features.

I started working with Azure a few months ago, and the first thing that drew my attention is the minimalist approach of Azure in comparison with the verbose (and sometimes crowded) screens of Amazon Web Services.

Compare (one step of) Amazon's Launch Instance wizard with the entire Windows Azure dialog:

Quite refreshing, but in fairness it also indicates that Azure is still lacking features in comparison with Amazon. Do we need all those features? Most definitely not, but sometimes a few of them come in very handy.

But in my view the most important thing Windows Azure should work on is not the feature area (although more features are still appreciated), but much more the management sphere.

Below are a few things I really missed when working with Azure.

  1. Detailed Identity & Access Management. With AWS you can create users for the UI and/or the API and grant them very detailed permissions (a small example policy is shown after this list). If needed this user directory can be synchronised with on-premise directories, which proves to be very flexible and important. Azure is very limited in this area, and although they have a very promising asset in the form of Windows Azure Active Directory, this is very much in its infancy and lacks integration throughout the various Azure services.
  2. Monitoring & Alerting: Amazon's Cloudwatch cannot be considered a full blown monitoring service, but for quite a few use cases it proves to be more than sufficient. By providing a wide variety of standard metrics and allowing custom metrics, it provides a lot of input on which alarms can be defined. In contrast, Azure's proposition is much more limited, which becomes problematic in real-life deployments.
  3. Infrastructure as Code: Since IaaS turns what used to be hardware into software, the Infrastructure as Code phenomenon has taken off. Rather than fiddling with physical stuff, you can write code to do that for you. Both AWS and Azure provide an API that allows you to do this (in a procedural manner), but on top of that AWS provides a much more powerful service in the form of Cloudformation. Cloudformation allows you to specify and manage your cloud resources in a declarative way, which provides many benefits.
  4. Managed DNS: When launching new services, these must be available by name rather than by IP address. Amazon's Route 53 provides a managed and programmable (also through Cloudformation) DNS service that ensures you can launch a fully operational, accessible service without having to go back to your DNS provider of choice to get things modified accordingly.
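As a small, made-up example of how fine-grained these permissions can be, the following IAM policy would allow a user to view EC2 resources and start or stop instances, and nothing else:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": "*"
    }
  ]
}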

This list is certainly not exhaustive, but I am positive that Microsoft (and others like Google) can mount a serious challenge to Amazon. However, we need more than just features; especially the management area is important when it comes to running and exploiting cloud based applications.

Saturday, September 29, 2012

Why I ditched my Apple MacBook and went back to Windows

I am the owner of a heterogeneous IT landscape. Windows, OSX, iOS, Android, Linux and even the good old Symbian are all still in the house and I'm perfectly fine with that.

About three years ago I bought my first MacBook Pro and I was assuming I would be very happy with it. Sure, it would take a bit of time to get used to the specifics of OSX but hey, that would make it just a bit more interesting.

And coming from a corporate laptop running Windows XP, completely bogged down by all the crapware running on it and suffering from the dreaded 'things are getting slower and slooower and slooooower' Windows problem, I was delighted by the speed and stability of OSX.

Rebooting? Hardly any need for it. Waking up? Almost instantly. Searching? Spectacularly useful and responsive. Hardware? Rock solid.

And still, after a while I noticed I didn't feel entirely comfortable with it, which was strange given the wide range of OSes I am usually exposed to. The most used part of the OS is the interface to the file system, and the Finder turned out to be a big disappointment. It is clumsy to work with and really no match for its Windows counterpart. Cutting and pasting a file? Forget about it (I know, you can buy an extension, but still). Creating a new file when you have navigated to a particular directory? Nope. Minor things, but they simply don't help the overall experience.

The other thing I use a lot (and who doesn't?) is Office. I'd really have liked to use something like Open Office, but given the fact that 99.99% of the world uses Microsoft Office, that's what I settled on, trying to avoid conversion problems. Well, Office on the Mac is really rubbish when you compare it to its Windows counterpart. Obviously this is not Apple's fault, but still: it doesn't help in getting comfortable on the platform.

Maximizing a window: let OSX decide how big is big enough? Except that it doesn't work properly. (I know, I didn't upgrade from 10.6, and things were supposed to be better after that.)

Making a screenshot of a window: Command+Shift+4, then spacebar, then click the window. Are you kidding me? This is supposed to be simple and intuitive? I could go on, but I think you get the point.

But OSX is great as a development box, as it has SVN, SSH, Apache and all those kinds of things out of the box. Yeah, I like the bash shell and having native access to SSH, but this is pretty much offset by the lack of tools that are available on Windows only, like Tortoise SVN and VMware Player (much better than VirtualBox). And the fact that the Windows-only tools you sometimes need (e.g. SQL Server, Enterprise Architect and so on) obviously require a virtual machine on OSX, while they wouldn't on a Windows box, doesn't help either.

After having worked only on my MacBook for almost three years, I started on an assignment and got a Windows desktop which was (still) fast, running MS Office and the likes. And after a few weeks of using my laptop and this Windows desktop more or less side by side, I couldn't do anything but admit: I simply like Windows better. OSX is rock solid but still not there yet from a user experience perspective. And that's not just me; since I spoke the unspeakable, I have run into quite a few people who had to admit that they were struggling with their shiny MacBooks and didn't like it either.

So that's why I ended up back in the Windows world. I bought myself a new, flashy ultrabook and I am delighted to be rid of OSX. I do miss the hardware though; the three year old MacBook Pro is still an excellent piece of equipment, outshining this new (Asus) ultrabook in many ways (as a matter of fact I am typing this on my MacBook running Windows 7 with Boot Camp).

If a MacBook shipped natively with Windows, that would be my option, but as that will never happen I don't see myself buying a new MacBook anytime soon. Windows it is then; I just need to find a way to keep it fast.

Friday, September 21, 2012

Building managed Service Stacks using Cloudformation

Infrastructure as Code

With the advance of cloud computing, we have reached a point where the hard infrastructure really can be treated as software, as code.

Networks, servers, firewalls, DNS registrations and so on no longer require the physical handling they required not so long ago; they can be managed by running a script. Hardware goes soft!

This is a very interesting development, but it brings challenges which will sound very familiar to the software development community. How to keep these software blocks maintainable and re-usable? How to deal with relatively quickly changing versions of this infrastructure configuration? How to minimize dependencies? And so on.

For the developers among us this sounds pretty straightforward, but that is not necessarily the case for the guys or gals with no software or scripting background.

Amazon's Cloudformation

One of the questions I have run into quite a few times lately is how to structure these building blocks when using Amazon's (great) Cloudformation service. According to AWS, "AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion."

Using Cloudformation, AWS Resources can be managed by specifying the needed resources (in a declarative way) in a JSON formatted template, and through the management console or command line interface, a so-called Stack can be instantiated by providing this declaration and a set of input parameters. This stack can be updated by providing an updated template or changing the parameters, and Cloudformation takes care of propagating these changes into the actual infrastructure components. Cloudformation even provides tools to configure (bootstrap) the actual server instances, but its main focus is on the infrastructure elements itself.

All very nice and dandy, but the question I ran into is how to structure these Cloudformation stacks. Make one big stack containing everything? Put each individual resource in its own stack? Or something in between?

This sounds remarkably similar to lots of discussion in the software engineering area. Remember Object Orientation? Component Based development? Service oriented architectures?

Well, in fact it bears a lot of resemblance, and I truly believe we should embrace such best practices rather than reinventing the wheel.

Unfortunately we don't have the same level of sophistication as the real software development languages and tools provide, but given the options we have we can achieve a reasonable level of isolation and re-usability.

Meet the Managed Service Stacks.

The core of such an infrastructure is typically a server or a set of servers. Such a server implements one or more roles (e.g. web server, app server, database server) and can be used in different setups (e.g. development, test, production).

However, in order to let this server do what it is supposed to do, a lot more resources are needed. A few examples include:

  • We need firewall configurations (security groups) to open up only the necessary ports to a particular set of clients.
  • We want this server to be accessible by name rather than IP, hence the DNS must be configured, including a fixed (elastic) IP address.
  • Maybe we do not want one server, but a flexible, automatically scalable set of servers. We're in the cloud after all, aren't we? And of course we need to load balance these servers as well.
  • We want to monitor these servers for health and availability and want to be informed if things are getting out of hand.
Suppose we have a rather traditional system setup consisting of:
  • one or more web servers;
  • one or more app servers;
  • one database cluster.
I prefer to model the Cloudformation stacks along these three different server types, rather than putting everything together in one stack. So each of these server types gets its own stack (the Service Stack) containing all the elements those servers need to do their job.

The service stack for the web server group could look like this:
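As a minimal, illustrative sketch (every name, AMI ID and domain below is made up, and a real template would add parameters, scaling policies and outputs):

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Web server service stack (illustrative skeleton)",
  "Resources": {
    "WebServerSecurityGroup": {
      "Type": "AWS::EC2::SecurityGroup",
      "Properties": {
        "GroupDescription": "Security Group for the web server instances",
        "SecurityGroupIngress": [
          { "IpProtocol": "tcp", "FromPort": "80", "ToPort": "80", "CidrIp": "0.0.0.0/0" }
        ]
      }
    },
    "WebServerLaunchConfig": {
      "Type": "AWS::AutoScaling::LaunchConfiguration",
      "Properties": {
        "ImageId": "ami-12345678",
        "InstanceType": "m1.small",
        "SecurityGroups": [ { "Ref": "WebServerSecurityGroup" } ]
      }
    },
    "WebServerLoadBalancer": {
      "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
      "Properties": {
        "AvailabilityZones": { "Fn::GetAZs": "" },
        "Listeners": [ { "LoadBalancerPort": "80", "InstancePort": "80", "Protocol": "HTTP" } ]
      }
    },
    "WebServerGroup": {
      "Type": "AWS::AutoScaling::AutoScalingGroup",
      "Properties": {
        "AvailabilityZones": { "Fn::GetAZs": "" },
        "LaunchConfigurationName": { "Ref": "WebServerLaunchConfig" },
        "MinSize": "2",
        "MaxSize": "4",
        "LoadBalancerNames": [ { "Ref": "WebServerLoadBalancer" } ]
      }
    },
    "WebServerDnsRecord": {
      "Type": "AWS::Route53::RecordSet",
      "Properties": {
        "HostedZoneName": "example.com.",
        "Name": "www.example.com.",
        "Type": "CNAME",
        "TTL": "300",
        "ResourceRecords": [ { "Fn::GetAtt": [ "WebServerLoadBalancer", "DNSName" ] } ]
      }
    },
    "WebServerHighCpuAlarm": {
      "Type": "AWS::CloudWatch::Alarm",
      "Properties": {
        "AlarmDescription": "CPU of the web server group too high",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [ { "Name": "AutoScalingGroupName", "Value": { "Ref": "WebServerGroup" } } ],
        "Statistic": "Average",
        "Period": "300",
        "EvaluationPeriods": "2",
        "Threshold": "80",
        "ComparisonOperator": "GreaterThanThreshold"
      }
    }
  }
}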
The stack contains all the resources needed for the web server to provide its core services. All changes (e.g. different scaling policy, additional DNS name) are managed by this service stack.

The overall environment (let's assume the production environment) of this straightforward setup then consists of a number of Service Stacks, one for each of the server types, and possibly an environment stack that contains the setup of the network (VPC, subnets, NAT instance and so on).

Conclusion

Infrastructure as Code brings challenges which are very familiar to the software engineers among us. Following the concepts pioneered in the good old days of component based development helps in splitting the configuration of Cloudformation stacks into manageable chunks. Unfortunately the tools are not very sophisticated yet and lack core features for maximising code and configuration re-use, but they can still be immensely useful in dealing with large scale deployments.

Wednesday, September 12, 2012

Setting up a central log analysis architecture (with syslog and splunk)

Introduction

The larger the system, the bigger the headache when troubleshooting problems. One of the things that really helps, and which is relatively easy to achieve, is to make sure that all logs are accessible from a central place and can be analysed and queried in real time.

Setting this up is easier than you might think and in this post I will talk you through this process using rsyslog (the native syslog daemon on many linux distros these days) and Splunk.

Overview

In the diagram below, I have shown a high level overview of the architecture.


Typically there will be many (logging) clients in the solution; examples are the web and application servers, database servers, management servers and so on. These clients typically run one or more applications which can either log their messages directly to the (local) syslog daemon and/or write them to one or more log files. The syslog daemon is by default configured to write incoming log messages to a number of local log files, but it can easily be configured to submit these messages to a remote syslog server.

This log server takes these inbound messages and stores them in a convenient folder structure. These local log files then can be indexed by the Splunk server, which allows for very powerful analysis of this data through a web interface.

If an application on a client cannot write to the syslog daemon but writes to local log files instead, the rsyslog daemon can be configured to monitor these log files and submit their contents as syslog messages to the log server. Not all syslog daemons can do this though, and even rsyslog has limited capabilities in this regard, e.g. the name of the input log file must be static. Another (optional) way of forwarding messages to the Splunk server is by using the Splunk Forwarder. Personally I prefer using syslog, as I feel it is a leaner and more proven method and all messages on the log server are handled in the same way, but it is always good to have an alternative, right?

Configuration

Let's start with setting up the central log server. We are assuming an Ubuntu 12.04 instance, which comes with rsyslog by default, but setup on other flavours should be identical or similar.

Accept inbound messages from remote servers

To accept inbound messages from remote servers, ensure that in /etc/rsyslog.conf the following configs are present:
### Load TCP and UDP modules
$ModLoad imtcp
$ModLoad imudp


Rsyslog knows the concepts of templates and rulesets, which allow you to specify how particular messages must be dealt with. In this case we make a distinction between the incoming messages from remote clients and the local messages, by defining a separate ruleset for each case.

### Templates
# log every host in its own directory
$template RemoteHost,"/mnt/syslog/hosts/%HOSTNAME%/%$YEAR%/%$MONTH%/%$DAY%/%syslogfacility-text%.log"

### Rulesets
# Local Logging
$RuleSet local
# Follow own preferences here....

# use the local RuleSet as default if not specified otherwise
$DefaultRuleset local

# Remote Logging
$RuleSet remote
*.* ?RemoteHost  


Then bind these rule sets to a particular listener:

### Listeners
# bind ruleset to tcp listener and activate it
$InputTCPServerBindRuleset remote
$InputTCPServerRun 5140
$InputUDPServerBindRuleset remote
$UDPServerRun 514


That is it: all messages coming in on the TCP or UDP listeners will now be stored in their own directory structure, conveniently grouped by host and date.
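For example, with the template above a host called appsrv1 logging to the local0 and auth facilities would end up with files like (illustrative paths):

/mnt/syslog/hosts/appsrv1/2012/09/12/local0.log
/mnt/syslog/hosts/appsrv1/2012/09/12/auth.log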

The syslog daemons on the clients in turn need to be configured to send their messages to this syslog server.

In this case we follow the convention of adding configuration snippets to the /etc/rsyslog.d directory rather than modifying the /etc/rsyslog.conf file. By default, all *.conf files in this directory are included.

In order to read a number of log files and process them as syslog messages, we add the following config file:

File: /etc/rsyslog.d/51-read-files.conf
#  Read a few files and send these to the central server.
#

# Load module
$ModLoad imfile #needs to be done just once
# Nginx Access log
$InputFileName /var/log/nginx/access.log
$InputFileTag nginx-access:
$InputFileStateFile stat-nginx-access
$InputFileSeverity info
$InputFileFacility local0
$InputRunFileMonitor
# Nginx Error log
$InputFileName /var/log/nginx/error.log
$InputFileTag nginx-error:
$InputFileStateFile stat-nginx-error
$InputFileSeverity error
$InputFileFacility local0
$InputRunFileMonitor


This will pick up the nginx access and error log files and process them as syslog messages.

To forward these messages to the syslog server, we add the following file:

File: /etc/rsyslog.d/99-forward.conf
#  Forward all messages to central syslog server.
#
$ActionQueueType LinkedList   # use asynchronous processing
$ActionQueueFileName srvrfwd   # set file name, also enables disk mode
$ActionResumeRetryCount -1     # infinite retries on insert failure
$ActionQueueSaveOnShutdown on # save in-memory data if rsyslog shuts down
*.* @@log.mydomain.com:5140 # Do the actual forward using TCP (@@) 


This will forward all messages using TCP to the log server. UDP (using one @) or a high reliability protocol (RELP, using :relp:) can be used as well.
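After adding or changing these snippets, the daemon needs to be restarted so the new configuration is picked up; on Ubuntu 12.04 that is simply:

# service rsyslog restart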

After restarting the syslog daemon, this basically takes care of collecting all log files in one central location. This in itself is already very useful.

But having Splunk running on top of this is even more powerful. Splunk can be downloaded for evaluation purposes and can be used as a free version (with a few feature limitations) up to an indexing capacity of 500 MB per day. Note it does not limit the total index size, but only the daily volume.

Once installed, it allows you to log in, change your password and add data to the Splunk indexer. After all, this is what you want to do.

After clicking Add Data, you'll be greeted with the following screen:

On this page you can select 'From files and directories'. This takes you to the Preview data dialogue, which enables you to see a preview of the data before you add it to a Splunk index. Select Skip preview and click Continue.


This takes you to the Home > Add data > Files & directories > Add new view. Select the default source (Continuously index data from a file or directory this Splunk instance can access) and fill in the path to your data.

Normally you would index everything in a particular subdirectory (e.g. /mnt/syslog/hosts/appsrv1/2012/09/12/*) or set of directories (e.g. /mnt/syslog/hosts/.../*). It might also be useful to address individual files one by one in order to define how they are dealt with by Splunk.

Now select More Settings.

This enables you to override Splunk's default settings for Host, Source type, and Index. To automatically determine the host based on the path in which the log file is stored, select 'segment in path' with value 4.

Note this has to match the value as specified in the rsyslog template definition.
$template RemoteHost,"/mnt/syslog/hosts/%HOSTNAME%/%$YEAR%/%$MONTH%/%$DAY%/%syslogfacility-text%.log"

What about the Source type and Index settings? The source type of an event tells you what kind of data it is, usually based on how it's formatted. Examples of source types are access_combined or cisco_syslog. This classification lets you search for the same type of data across multiple sources and hosts. The index setting tells Splunk where to put the data. By default, it's stored in main, but you might want to consider partitioning your data into different indexes if you have many types.

Click Save and you are ready to go: open the search app. If you want to get your feet wet with the search app, have a look at this tutorial. Happy splunking!

Wednesday, September 5, 2012

The Economics of the Cloud

Two sides of a story

"We're gonna save lots of money in the cloud!".

Well, there you have it. If you want to save money (and who doesn't) and you have one or more IT applications that can live in a cloudy world, then this is the way to go, isn't it? After all, if you look at the prices of, for instance, Amazon Web Services, you can have a server instance for as little as $0.02 per hour. Two pennies! Who could ever compete with that?

Of course this sounds very attractive but reality is usually a bit less rosy.

"Wow, this Amazon service is really expensive!" 

This is a not unusual reaction once an application has been running in the cloud for a few months and the actual costs become more visible. Running a full-fledged system 24/7 in the cloud is certainly not free, and the costs associated with it are significant. That might come as a nasty surprise when the first bill comes in.

In my view, both extremes stem from a lack of insight in:
  • what exactly is needed to support a cloud-based application and/or
  • the total cost of ownership of on-premise solutions.
Below, I will provide a few pointers (non-exhaustive) that might be helpful in comparing the costs of both directions.

Context

Before trying to answer these questions, first some context. Cloud computing is a very broad term and is applied to a wide variety of services. To keep things simple we focus on a specific type of cloud service, the type where you rent computing capacity and storage, usually known as Infrastructure as a Service. This is the type of cloud computing with the lowest abstraction level, meaning you have to manage most of the stack yourself.



The good thing about IaaS though, and this is what makes it the most popular option on the market today, is that it is very flexible and little migration effort is needed. Basically, what you run on-premise you can typically run in an IaaS cloud as well (at least, from a technical perspective).

So what about this rosy picture?

When moving to the cloud it is easy to get blinded by these stunning advertised costs, for instance $0.02 per hour for a Micro server instance. This is however just part of the story.

Nothing is for free


Taking Amazon as an example, it soon becomes evident that (almost) nothing is for free. You need storage? Pay for it. Use your storage? Here's the bill. Backup? This is what it costs. You want to restore your backup? Pay for it! Monitoring alerts? Well, you get the point.

All these individual cost components are priced very reasonably, but they add up and make things significantly more expensive than you initially thought when you read about those 2 cents per hour. Even that two-cent Micro instance alone already comes to roughly $15 per month when it runs 24/7, before any storage, traffic or backups are added.

So: understanding all cost components is needed to make a valid comparison.

A server instance is not a server

Probably one of the least understood things within AWS is how these server instances compare to their real-life, physical counterparts. For example, AWS advertises CPU capacity using the ECU, a fictive unit that is comparable to the CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. Note: a 2007 unit, so hardly state of the art. The IO capacity of an instance is even more obscure, as it is vaguely indicated as 'Moderate' or 'High' IO performance, without mentioning the actual bandwidth that comes with it.

In addition, you need to keep in mind that the fact that a cloud platform is inherently multi-tenant might have some negative impact on the performance of your own server instances. This can be countered by allocating larger chunks of capacity (rather than multiple smaller ones) or, in the case of AWS, even requesting dedicated hardware, but obviously this has a cost impact.

The bottom line is that when you launch a server instance, you might not get the capacity you were anticipating, requiring you to upgrade or launch additional instances.

So: make sure you understand the actual capacity provided in comparison with physical servers.

Elasticity might be difficult to achieve

The single most cited benefit of cloud computing is the ability to size the capacity of your systems (regardless of whether this is automatic or not) according to the demand. In contrast, traditional systems are often sized for the maximum peak in anticipated demand, resulting in dramatically under-utilized systems. Server virtualisation already started addressing this, and cloud computing is supposed to be top-of-class in this regard. With dramatic cost savings as a result. Right?

Well, yes, it could be. Automatic scaling is a very useful feature but it might be difficult to fully exploit it. It comes with its own set of challenges, such as:
  • Is your application capable of dynamically spreading the load over multiple servers? For a straightforward web server this is typically not much of an issue, but for more complex applications or servers holding state (e.g. databases) things usually get more complicated.
  • How do you provision these spontaneously launching server instances? 
  • How much time is needed to spin up a new server instance and configure it so that it can actually take part in processing the workload? Does the time needed to scale up match the actual peaks in demand?
  • How do you keep a grip on the instances that are actually running? An error in the provisioning might result in server instances being launched that never become an active part of the system.
  • Auto-scaling is best served by relatively small server instances; however, as discussed before, these smaller instances come with their own drawbacks. A trade-off is needed.
So: make sure you assess the applicability of auto scaling before counting on the cost savings.

Allocation vs. Usage

Cloud computing is typically associated with the Pay as you Go paradigm. You pay for something when you need and use it.

Not so fast. This applies to quite a few things, but unfortunately not to all of them. For example, online (block) storage is typically allocated beforehand and you pay for what you allocate, not for what you actually use. Another example is the use of reserved instances, which allow you to buy reserved and discounted capacity for a one or three year period. The more you pay upfront, the greater the discount you get. However, for the Heavy Utilisation reserved instances you are charged for every hour of the month, regardless of the state of the server instance.

So: it is really necessary to understand the extent to which the Pay as you Go paradigm applies to the different cost components.

So that settles it, the cloud is way too expensive

Hold on, this is not the message I'm trying to convey. I am actually a strong believer in cloud computing and I sincerely believe this is a massive paradigm shift we are witnessing. Does this mean it is applicable to all use cases? Of course not. And does it also mean that computing costs are all of a sudden a fraction of what they used to be? No way.

However, when that unexpectedly high bill comes in at the end of the month, it is easy to forget about the service that has been delivered, which makes it difficult to avoid comparing apples and pears.

What is the total cost of ownership of the alternative then?

As mentioned before, almost everything comes at a cost at Amazon, which might be considered a blessing in disguise. In the end, Amazon is providing the service to make money, and being one of the largest IT infrastructure operators in the world, you can expect them to have a good understanding of the cost components of such an infrastructure.

By charging you these individual line items, Amazon does provide you with an insight into the total cost of ownership (TCO) which you might not have had before. And as long as it is not clear what the TCO of the alternatives is, you cannot state that one of them is too expensive.

In my view, there are plenty of use cases where an on-premise or co-located solution is more economical but the differences won't be spectacular and you surely have to have a very well-organised IT organisation to achieve these benefits.

So: ensure you understand the alternative's TCO when comparing it with a cloud based solution.

Apples and Apples?

It is a popular pastime among IT pros to compare the cost of an off-the-shelf server with what Amazon is charging you for the equivalent. And boy, does Amazon suffer in that comparison.

Except that this is not a valid comparison. To start with it usually doesn't end with this server alone. You need additional infrastructure such as storage, backup equipment and so on. The server must be housed, cooled, mounted in a rack, power must be supplied, physical installation is needed.

And what happens when this server dies (which will happen)? Well, at best a spare server is available or a decent service contract is in place, but even then it more often than not takes a significant amount of time before the replacement is ready to go. How much is it worth, then, that in a cloud computing scenario the replacement can be fired up (even fully automatically if needed) within minutes? And that the solution can be migrated to a disaster recovery site without huge upfront costs and without intervention of the cloud provider?

Basically you are comparing a service with a piece of equipment, which is at best only a part of the solution. Apples and pears.

So: calculate the cost of the on-premise IT service rather than a piece of equipment when comparing costs.

Benefits of capacity on demand.

As discussed before, automatic scaling of the solution's capacity to meet the demand might be more complicated than it sounds. That's very true, but it still doesn't mean that it has no value. It certainly does, big time! A system designed for elasticity and used for fluctuating load profiles can save significant costs, no doubt about it. In short: the more spiky the demand, the better it fits a cloud use case.

But there are many more use cases where capacity on demand proves to be a real winner. What about setting up a temporary test system? Running a load test on a representative system setup? Testing a disaster recovery procedure? Scaling up capacity for a large data migration which would normally last for days or even weeks? And very often at very little cost, as this capacity typically runs for hours or weeks rather than months or years.

It is evident that this flexibility is extremely useful and more often than not quite a few of these things simply wouldn't be possible with on-premise systems.

So: try to take the value of this flexibility into account before making up your mind.

OPEX vs. CAPEX

When talking about costs it is tempting to simply compare the sum of the monthly costs over a three year period with a scenario that has a large initial investment combined with smaller monthly costs. Except that this is not the same thing. Every financial specialist (which I am not) will be able to explain the benefits of Operational Expenses (OPEX) vs. Capital Expenses (CAPEX). It is very attractive to spread the payment of a large sum of money over three years rather than paying the majority of that money upfront, a principle firmly exploited by credit card firms.

So: make sure to take capital costs into account.

The bottom line

I deliberately tried to focus on the economic aspects of cloud computing vs. more traditional alternatives, while playing a bit of a Devil's advocate role. But economics is just one part of the equation. Rather than managing your own data centres with dedicated equipment, cloud computing allows you to focus more on your core business. From that perspective, whether or not to use cloud computing is much more a strategic choice than something based on numbers alone.

Sometimes it works, sometimes it certainly does not, but in all cases a thorough understanding of both sides of the story is needed to make a qualified decision.

A few references

Googling for the economic benefits of the cloud will result in a huge number of hits, but a few of them I found very interesting. One article that raised quite a stir was the AWS vs. Self-Hosted article, including the response from Amazon's Jeff Barr. Another interesting, although less quantified, piece is the article with the intriguing name Is cloud computing really cheaper? Finally, a nice interactive spreadsheet that aims at comparing Cloud vs. Colo costs is worth a look.


Tuesday, July 31, 2012

How to allow AWS EC2 servers in a VPC public subnet to access external resources without having an Elastic IP address assigned

Amazon's EC2 Virtual Private Cloud feature is very useful: it allows much more control over the network architecture and provides additional features such as the possibility to assign multiple IP addresses per instance.

Typically, a VPC consists of at least a public subnet, hosting the publicly accessible servers, and one or more private subnets hosting more sensitive servers such as database servers. The servers in the private subnets are obviously not accessible from the outside, but they can still communicate with external resources through a NAT instance.

Servers in the public subnet are only accessible when they are assigned an Elastic IP Address or through a load balancer (e.g. AWS ELB service).

So far so good. Something less obvious is that if a server in the public subnet does NOT have an Elastic IP address assigned (e.g. because it sits behind a load balancer), it also loses the capability of accessing external resources. This can be a show stopper, as it makes it much more difficult, for instance, to let these servers auto-configure themselves upon launch.

Well, wait, we have this NAT instance, haven't we? Yes, we have, but by default it is not possible for servers in the public subnet to use it. Changing the VPC route tables is not going to work, as it would break the connectivity of the EIP-attached servers in the public subnet.

In order to still allow EIP-less public servers to use the NAT instance, the network configuration of these instances must be modified to point the default gateway at the NAT instance. This requires a static IP configuration of course, and to avoid running into all kinds of IP conflicts it is possible to let AWS hand out the IP address using DHCP and then convert it at launch time (e.g. using a cloud-init script) to a static configuration with the desired default gateway.
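A rough sketch of what such a boot-time conversion could start with is shown below; the NAT instance address and interface name are made up, and persisting the change in /etc/network/interfaces is left out.

#!/bin/bash
# Illustrative only: route outbound traffic of an EIP-less instance in the
# public subnet via the NAT instance instead of the VPC router.
NAT_IP="10.0.0.5"   # private IP of the NAT instance (assumption)

# Replace the default route handed out by DHCP with one via the NAT instance.
# Note that a DHCP lease renewal can restore the original route, which is why
# converting to a static configuration (as described above) is the real fix.
ip route replace default via "$NAT_IP" dev eth0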