Wednesday, July 28, 2010

Howto: Build a ten node memcached cluster in five minutes

Introduction

Memcached is used to cache everything from dynamically generated web pages to database query results. Multiple memcached instances are used in situations where a single memcached server does not have enough CPU or RAM to fulfill application requirements.

In a distributed setup, memcached servers run independently of each other, and have no knowledge of other servers. Memcached's design pushes the logic of dealing with multiple servers to the client, which makes things very simple on the server end.
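
To make "the client handles multiple servers" concrete, here is a minimal sketch of how a client might map each key to one server. It assumes simple hash-modulo placement; real client libraries use their own schemes (often consistent hashing), and the function name and pool addresses below are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_server(key: str, servers: Sequence[str]) -> str:
    """Map a cache key to one server using a stable hash modulo the pool size."""
    if not servers:
        raise ValueError("server pool is empty")
    # Use a stable hash rather than Python's built-in hash(), which is
    # randomized per process, so every client computes the same mapping.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


# Hypothetical pool; the addresses are placeholders for illustration.
POOL = ["192.168.1.140:11211", "192.168.1.141:11211", "192.168.1.142:11211"]

if __name__ == "__main__":
    for key in ("user:42", "page:/index", "query:recent-posts"):
        print(key, "->", pick_server(key, POOL))
```

As long as every client holds the same server list in the same order, a given key always lands on the same server, so the servers never need to know about each other.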

GridCentric Copper is a bare-metal-up virtualization and management stack built to manage "virtual clusters". Copper's fast virtual machine cloning (which works similarly to fork() in UNIX, except that it creates clones of entire running virtual machines in just a few seconds) can be used to quickly expand and contract a collection of memcached servers. We're going to use this to make a single memcached virtual machine, then grow that machine to a pool of 10 virtual machines in a few seconds.

Requirements
  1. GridCentric Copper (a free, fully featured 60-day trial is available for download).
  2. The GridCentric DNS mapper (part of the Copper distribution).
  3. Enough free hardware resources to host 10 virtual machines.
Process

Create a New Virtual Cluster

First, create a new virtual cluster using either the included "cluster-in-a-box" script or via the Admin Web Console. In this walkthrough we will name our virtual cluster "memcached". Make sure to give the virtual cluster a managed public network interface, and make sure the "master only" checkbox is unchecked.

Let's boot the memcached virtual cluster.
$ gc-vc boot memcached
This will bring up a single virtual machine - a cluster of size 1. This virtual machine will be the basis of our memcached cluster, so we need to set it up. We get a console on the new master with:
$ gc-vm console memcached-0
This will give us a login prompt on the master.

Install memcached

We install memcached according to the README and configure it to listen on all addresses (edit the /etc/memcached.conf file and change '-l 127.0.0.1' to '-l 0.0.0.0'), and then start it up.
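
The configuration change can also be scripted. The sketch below is only an illustration: it assumes the file contains the listen option on a line of its own, exactly as '-l 127.0.0.1', and the function name is hypothetical.

```python
import re


def listen_on_all_addresses(conf_text: str) -> str:
    """Rewrite an uncommented '-l 127.0.0.1' line to '-l 0.0.0.0'."""
    # Match only a whole line holding the option; commented-out lines and
    # every other setting are left untouched.
    pattern = r"^-l[ \t]+127\.0\.0\.1[ \t]*$"
    return re.sub(pattern, "-l 0.0.0.0", conf_text, flags=re.MULTILINE)


# Sample configuration text, made up for illustration.
SAMPLE = "-m 64\n-p 11211\n# -l 127.0.0.1\n-l 127.0.0.1\n"

if __name__ == "__main__":
    print(listen_on_all_addresses(SAMPLE))
```

Because the clones are copies of this running machine, the change only has to be made once, on the master, before cloning.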

We now have a Copper virtual cluster, which for now contains a single virtual machine running memcached.

Create some clone virtual machines

To start up more memcached instances, we invoke the gc command line tool from within the virtual machine to create some clones of itself:
[ubuntu]$ gc clone 9
After a few seconds the clone command returns, and we now have 10 virtual machines running memcached.

Note: we just scaled 1 memcached instance to 10 with zero extra configuration, in just a few seconds!

Configure the client address pool
Now we have to let the memcached clients know about the servers so that the clients can add the servers to their pool.

One easy method is just to use the GridCentric DNS service. This service runs on any computer with access to the Copper head node, and does dynamic mapping of DNS lookups to virtual clusters within Copper.

One of its particularly nice features is that it maps the name of a running virtual cluster to the list of all public managed IPs on that virtual cluster.

On some machine which has access to the GridCentric DNS service (let's say it's running on host gridcentric-headnode), we execute the following:
$ dig +short memcached @gridcentric-headnode
192.168.1.144
192.168.1.142   
192.168.1.143   
192.168.1.140   
192.168.1.146   
192.168.1.141   
192.168.1.145   
192.168.1.149   
192.168.1.147   
192.168.1.148

This returns a list of public IPs for all 10 of the memcached servers running within the virtual cluster. You could even design your web application to automatically query this DNS server and periodically update its view of the set of servers running.
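
A periodic refresh along these lines could be sketched as follows. This is a rough illustration, not GridCentric's own client code: it assumes the machine's system resolver is set up to reach the GridCentric DNS service (the dig example instead queries the service directly with @gridcentric-headnode), that memcached listens on its default port 11211, and the function names are hypothetical.

```python
import socket


def discover_servers(cluster_name, port=11211, resolve=socket.gethostbyname_ex):
    """Resolve the cluster name and return a sorted list of 'ip:port' strings."""
    _hostname, _aliases, addresses = resolve(cluster_name)
    # Sort and de-duplicate so every client sees the pool in the same order,
    # whatever order the DNS answer arrives in.
    return sorted(f"{ip}:{port}" for ip in set(addresses))


def refresh_pool(current, cluster_name, resolve=socket.gethostbyname_ex):
    """Return the freshly resolved pool, keeping the current one if lookup fails."""
    try:
        found = discover_servers(cluster_name, resolve=resolve)
    except OSError:  # covers socket.gaierror and socket.herror
        return current
    return found or current


if __name__ == "__main__":
    pool = refresh_pool([], "memcached")
    print(pool)
```

A web application could call refresh_pool on a timer and hand the result to its memcached client. Keep in mind that when the list changes, keys may map to different servers, so some cache misses are to be expected right after the cluster grows or shrinks.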

That's it! You're now ready to start using memcached on your client nodes.

Homework:
Try to accomplish the above on a traditional cluster setup ;)

1 comment:

  1. This is not a good way to create a memcached cluster, since the data held in memory on each server will be different, and from PHP you will never know which server you'll land on, so you'll end up querying your database more often than you expect.

    You should use a distributed array, not a round-robin "cluster".