
Friday, May 27, 2011

Creating VMs with JavaScript and Node

Seriously, JavaScript?

Node is a framework built around the speedy V8 JavaScript engine. It enables you to use JavaScript to create back-end services for your web application (or any other application). I/O is based around the same event-driven model that is familiar to front-end JavaScript programmers. Putting aside any claims about the inherent performance benefits of that model, it's powerful to use the same language in all parts of a web application, and I have no doubt that Node will grow to stand next to the big players (Rails, Django, Struts) in the world of web frameworks.

Copper is focused on enabling dynamic and flexible services on the back-end. We empower programmers and administrators with a programmatic model for scaling applications, embedded within the control flow of the applications. Given that Node is an up-and-coming web framework, a few weekends ago I embarked on the fun exercise of creating some GridCentric V8 bindings.

When I co-founded a systems company a couple years ago I would never have guessed that I'd write a line of JavaScript, but technology brings us to unexpected places.

This post has two simple goals.

  1. To enable GridCentric API calls in Node applications.
  2. To use those bindings in a nifty demo, creating VMs on-demand to scale an application.

For a nifty demo, I've chosen to take the standard Node chat demo and make it scale automatically as users join chat rooms, using only about a hundred lines of JavaScript.

Building the extension

To skip the details of building the extension, click here to jump straight to the demo.

Starting points for V8

If you want to write a Node extension, without a doubt the best place to start is the useful blog post from cloudkick. Figuring out where to go next once you've exhausted their example, however, is a bit tricky. I found that the V8 embedder's guide was mostly useless, but maybe you'll have a different experience. I think the best way to learn more is by looking at the Node source and the source for other native extensions (a relatively complete list of extensions can be found here).

Synchronous bindings

Let's start with the easy stuff.

Simple, synchronous function bindings for Node are quite straightforward. Many of our API functions can be considered non-blocking (reading a value from the kernel, equivalent to a system call), so they are quite simple to wrap. For example, the C function below will return the current vmid.

#include <gridcentric/gc-guest.h>

int func() {
  return gc_vmid();
}

In the JavaScript world, we really want this to look like:

var gridcentric = require("gridcentric");
var vmid = gridcentric.vmid();
To get those semantics, we can take the C example above and wrap it in some V8 voodoo.
#include <node/v8.h>
#include <node/node.h>

#include <gridcentric/gc-guest.h>

using namespace node;
using namespace v8;

static Handle<Value> VmId(const Arguments& args)
{
    HandleScope scope;
    Local<Integer> result = Integer::New(gc_vmid());
    return scope.Close(result);
}
The V8 function is complete.

Before we have a usable extension, however, we must bind this function appropriately within the native module. When Node loads a native extension, it executes the init symbol, passing in a variable representing the scope created for the module. Using a FunctionTemplate wrapper, we define an init symbol that does the appropriate binding within the module.

extern "C" {
  void init (Handle<Object> target)
  {
    Local<FunctionTemplate> vmid = FunctionTemplate::New(VmId);
    target->Set(String::NewSymbol("vmid"), vmid->GetFunction());
  }
}
The init function must be wrapped in an extern "C" declaration to prevent g++ from name mangling.

Almost done -- we only need the build script. To build the module with our vmid function, we first create a wscript file (assuming that our source file is src/gridcentric.cc) used by the Node build tool node-waf.

def set_options(opt):
  opt.tool_options("compiler_cxx")

def configure(conf):
  conf.check_tool("compiler_cxx")
  conf.check_tool("node_addon")

def build(bld):
  obj = bld.new_task_gen("cxx", "shlib", "node_addon")
  obj.cxxflags = ["-Wall"]
  obj.ldflags = ["-lgridcentric"]
  obj.target = "gridcentric"
  obj.source = "src/gridcentric.cc"
Finally, we run node-waf configure && node-waf build to build our extension.

We now have a basic gridcentric module, and the following code works.

var gridcentric = require("./build/default/gridcentric");
var vmid = gridcentric.vmid();
console.log("My vmid is " + vmid);
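
While developing away from a VM that has libgridcentric, it can be handy to fall back to a stand-in when the compiled addon cannot be loaded. The following is only a sketch: the loadGridcentric helper, the isStub flag and the stub's return value of 0 are all hypothetical and not part of the real module.

```javascript
// Try the compiled addon first; fall back to a stub when it is unavailable.
// The stub and its return value are hypothetical, for off-platform testing only.
function loadGridcentric(path) {
  try {
    return require(path);
  } catch (err) {
    return {
      isStub: true,
      vmid: function () { return 0; }
    };
  }
}

var gridcentric = loadGridcentric("./build/default/gridcentric");
console.log("My vmid is " + gridcentric.vmid());
```

Code written against the stub only exercises control flow; it says nothing about how the real bindings behave.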

Now we can move on to adding some meat to the module.

Understanding non-blocking operations

Node's event-driven semantics require that any function doing significant work (i.e. I/O) be structured using callbacks. Much of the heavy lifting in a binding comes from the need to restructure calls to your library around the callback mechanism provided by Node.

This can be a bit of a pain, but fortunately many of the functions in our guest bindings were well-suited to the asynchronous callback style required by Node (and I think that it's generally not too difficult to find a nice mapping). For example, the request ticket operation may take a few hundred milliseconds to make the round-trip to the scheduler, allocate the requested resources and return the result. Similarly, the clone operation may take seconds, but in a complex control flow you'll likely need to be doing other things during that time.

With our C bindings, the request ticket function call looks like:

#include <gridcentric/gc-guest.h>
...
gc_uuid_t ticket;
if( gc_request_ticket(1, 1, 1, 1000, &ticket) < 0 ) {
   perror("Couldn't request ticket");
}
If we were to translate directly into JavaScript using a synchronous style this might look like:
var gridcentric = require('gridcentric');
...
var ticket = gridcentric.request_ticket(1, 1, 1, 1000);
if( ticket ) {
  console.log("Successfully allocated ticket " + ticket + ".");
} else {
  console.log("Unable to allocate ticket in 1000 milliseconds.");
}

But because the request_ticket operation can block for up to 1000 milliseconds in this case, this function doesn't conform to the non-blocking semantics required by Node.
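
To see why a blocking call is a problem, here is a small sketch that uses a busy-wait as a stand-in for a synchronous binding. The blockingCall function is hypothetical, and 50 milliseconds is used instead of 1000 to keep the example quick. A timer scheduled for "right now" cannot fire until the blocking call returns.

```javascript
// Stand-in for a synchronous binding: spins for `ms` milliseconds.
function blockingCall(ms) {
  var end = Date.now() + ms;
  while (Date.now() < end) { /* busy-wait; nothing else can run */ }
}

var scheduledAt = Date.now();
var timerDelay = null;

// Ask for a callback as soon as possible.
setTimeout(function () {
  timerDelay = Date.now() - scheduledAt;
  console.log("Timer fired after " + timerDelay + " ms.");
}, 0);

blockingCall(50);

// The timer has not had a chance to run yet: the event loop was stuck.
var timerRanDuringBlock = (timerDelay !== null);
console.log("Timer ran while blocked: " + timerRanDuringBlock);
```

Every other connection handled by the same process would be stalled in the same way as that timer.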

Instead, we must structure this function to use an asynchronous callback when the ticket request is completed, as follows:

var gridcentric = require('gridcentric');
...
gridcentric.request_ticket(1, 1, 1, 1000, function(ticket) {
  if( ticket ) {
    console.log("Successfully allocated ticket " + ticket + ".");
  } else {
    console.log("Unable to allocate ticket in 1000 milliseconds.");
  }
});
Notice that we pass in a function as the last parameter. This function will be called asynchronously with the return value of the request_ticket function after it has completed. We don't have any guarantees about when that function will be executed.
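
The ordering can be seen with a stand-in that has the same argument list but no real back-end. In this sketch the request_ticket function, the 10 millisecond delay and the ticket value are all made up for illustration.

```javascript
// Hypothetical stand-in for gridcentric.request_ticket: same argument order,
// but the "ticket" is made up and delivered from a timer.
function request_ticket(maxcpus, minvms, mincpuspervm, timeout, cb) {
  setTimeout(function () { cb("ticket-0001"); }, 10);
}

var order = [];

request_ticket(1, 1, 1, 1000, function (ticket) {
  order.push("callback:" + ticket);
  console.log(order.join(" -> "));
});

// This line runs before the callback, even though it appears after it.
order.push("after call");
```

The code following the call always runs first; the callback only runs once control has returned to the event loop.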

Once this style is adopted for all functions, we can easily see how to chain operations using closures. For example, in order to fork() the VM we can extend the above:

var gridcentric = require('gridcentric');
...
gridcentric.request_ticket(1, 1, 1, 1000, function(ticket) {
  if( ticket ) {
    console.log("Successfully allocated ticket " + ticket + ".");
    gridcentric.clone(ticket, function(vmid) {
       if( vmid > 0 ) {
           console.log("On a clone VM.");
       } else if( vmid == 0 ) {
           console.log("Still on the master VM.");
       } else {
           console.log("Error during clone operation.");
       }
    });
  } else {
    console.log("Unable to allocate ticket in 1000 milliseconds.");
  }
});
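
Once the nesting gets deep, the same chain can be written with named functions instead of inline closures. The sketch below runs against a hypothetical stub (the real module is not needed); the stub's ticket value and its choice to report vmid 0 are assumptions made only so the example is self-contained.

```javascript
// Hypothetical stub with the same call shapes as the bindings above.
var gridcentric = {
  request_ticket: function (maxcpus, minvms, mincpuspervm, timeout, cb) {
    process.nextTick(function () { cb("ticket-0001"); });
  },
  clone: function (ticket, cb) {
    process.nextTick(function () { cb(0); });   // pretend we are the master
  }
};

var log = [];

// Step 1: runs when the ticket request completes.
function onTicket(ticket) {
  if (!ticket) {
    log.push("no ticket");
    return;
  }
  log.push("ticket " + ticket);
  gridcentric.clone(ticket, onClone);
}

// Step 2: runs when the clone operation completes.
function onClone(vmid) {
  if (vmid > 0) {
    log.push("clone");
  } else if (vmid === 0) {
    log.push("master");
  } else {
    log.push("error");
  }
  console.log(log.join(", "));
}

gridcentric.request_ticket(1, 1, 1, 1000, onTicket);
```

Whether to use closures or named functions is a matter of taste; closures are convenient when the inner step needs variables from the outer one.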

Implementing callbacks

You'll notice that my simple vmid example above did not take any arguments. Before implementing these functions, the first thing I will do is define a few preprocessor macros to sanity-check the arguments passed in.

#define REQUIRE(I, ISTYPE, CASTTYPE, NAME)                     \
  if( args.Length() <= (I) || !args[I]->Is##ISTYPE() )         \
    return ThrowException(Exception::TypeError(                \
      String::New("Argument " #I " must be a " #ISTYPE "."))); \
  Local<CASTTYPE> NAME = Local<CASTTYPE>::Cast(args[I]);

#define REQUIRE_STRING(I, NAME) \
        REQUIRE(I, String, String, NAME)
#define REQUIRE_INTEGER(I, NAME) \
        REQUIRE(I, Number, Integer, NAME)
#define REQUIRE_FUNCTION(I, NAME) \
        REQUIRE(I, Function, Function, NAME)

We could now implement a no-op request ticket function that takes the appropriate arguments.

static Handle<Value> RequestTicket(const Arguments& args)
{
    REQUIRE_INTEGER(0, maxcpus);
    REQUIRE_INTEGER(1, minvms);
    REQUIRE_INTEGER(2, mincpuspervm);
    REQUIRE_INTEGER(3, timeout);
    REQUIRE_FUNCTION(4, cb);
    return Undefined();
}

It sanity-checks its input. Now it needs to do something.
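
From the caller's side, the effect of these checks can be modelled in plain JavaScript. This is only a model of the behaviour, not the binding itself: the requireArg and requestTicket names are hypothetical, and the exact error text produced by the macro differs slightly.

```javascript
// JavaScript model of the REQUIRE macro: check position and type, or throw.
function requireArg(args, i, type) {
  if (args.length <= i || typeof args[i] !== type) {
    throw new TypeError("Argument " + i + " must be a " + type + ".");
  }
  return args[i];
}

// Model of the no-op RequestTicket: validate everything, then do nothing.
function requestTicket() {
  requireArg(arguments, 0, "number");    // maxcpus
  requireArg(arguments, 1, "number");    // minvms
  requireArg(arguments, 2, "number");    // mincpuspervm
  requireArg(arguments, 3, "number");    // timeout
  requireArg(arguments, 4, "function");  // cb
  return undefined;
}

try {
  requestTicket(1, 1, 1, 1000);          // callback missing
} catch (err) {
  console.log(err.name + ": " + err.message);
}
```

A missing or mistyped argument surfaces as a TypeError at the call site rather than as a crash inside native code.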

Node uses libeio as the basis for its thread pool (which, assuming you are not working with raw file descriptors and sockets, you will likely be using). To use libeio, you schedule two functions for future execution: one that does the work and one which will be called when the work is completed. The function called when the work is completed will be executed in the main thread, so it needs to be quick. You are also permitted to pass an opaque pointer, which will be (indirectly) passed to each of the two functions.
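
The shape of this pattern (a work function, a completion function, and one opaque value visible to both) can be sketched in JavaScript. This models only the ordering and the data passing: JavaScript here is single-threaded, so timers stand in for the thread pool, and the runCustom helper, the request fields and the ticket value are all hypothetical.

```javascript
// Model of "schedule work, then a completion callback" with shared data.
// In Node the work function runs on a pool thread and the completion
// function runs back on the main thread; timers stand in for both here.
function runCustom(work, after, data) {
  var req = { data: data };
  setTimeout(function () {
    work(req);
    setTimeout(function () { after(req); }, 0);
  }, 0);
}

var request = { timeout: 1000, rval: null, done: false };

runCustom(
  function (req) {            // the slow part: may take as long as it needs
    req.data.rval = "ticket-0001";
  },
  function (req) {            // the quick part: hand the result back
    req.data.done = true;
    console.log("Result: " + req.data.rval);
  },
  request
);

// Scheduling returns immediately; nothing has run yet.
console.log("Done yet? " + request.done);
```

The opaque value is the only channel between the two functions, which is why the C++ version below bundles everything into a single class.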

For our example below, we will first define a new class that we can use as an opaque pointer. This class will hold all data related to the ticket request, the callback function passed in, and the return value to be given. Since we're going to have three functions involved (the one called by V8, the one scheduled by libeio and the one executed after the work is complete), this class will be used to pass shared information between them.

We will also declare ahead of time the two functions that we will use for our libeio work, EIO_RequestTicket and EIO_Post.

class CallbackData
{
public:
    Handle<Value> This;      // The this scope we were called in.
    Persistent<Function> cb; // The callback function passed.
    Handle<Value> rval;      // The return value to be given.

    // The parameters required for request_ticket.
    int maxcpus;
    int minvms;
    int mincpuspervm;
    int timeout;
};

static int EIO_RequestTicket(eio_req* req);
static int EIO_Post(eio_req *req);
We use the This variable to track the scope, cb to record the callback the user passes in and rval to store the return value once the work is done. The rest of the parameters are required for the actual ticket request.

Given these declarations, the actual RequestTicket function is straightforward.

static Handle<Value> RequestTicket(const Arguments& args)
{
    REQUIRE_INTEGER(0, maxcpus);
    REQUIRE_INTEGER(1, minvms);
    REQUIRE_INTEGER(2, mincpuspervm);
    REQUIRE_INTEGER(3, timeout);
    REQUIRE_FUNCTION(4, cb);

    // Create the opaque pointer.
    CallbackData *data = new CallbackData();

    // Set the scope variable (in case it's needed).
    data->This = args.This();

    // Set the parameters associated with the ticket request.
    data->maxcpus = maxcpus->Value();
    data->minvms = minvms->Value();
    data->mincpuspervm = mincpuspervm->Value();
    data->timeout = timeout->Value();

    // Save the passed callback.
    data->cb = Persistent<Function>::New(cb);

    // Schedule the EIO functions to be run.
    eio_custom(EIO_RequestTicket, EIO_PRI_DEFAULT, EIO_Post, data);
    ev_ref(EV_DEFAULT_UC);

    return Undefined();
}

As required, it doesn't do any real work. It allocates the opaque data pointer (the CallbackData class we defined), schedules the work in the thread pool (the eio_custom and ev_ref calls), and returns Undefined() immediately.

All that remains is for us to actually implement the missing functions.

The first, EIO_RequestTicket, does the work required (calling gc_request_ticket) and sets the return value (rval) in the opaque pointer.

static int EIO_RequestTicket(eio_req* req)
{
    CallbackData *data = static_cast<CallbackData*>(req->data);
    gc_uuid_t uuid;

    if( gc_request_ticket(
            data->maxcpus, data->minvms, data->mincpuspervm,
            data->timeout, &uuid) < 0 ) {
        // Set the result to undefined.
        data->rval = Undefined();
    } else {
        // Save the resulting ticket as a string.
        data->rval = String::New(uuid.value);
    }

    return 0;
}

The second function takes the given opaque pointer, builds a one-element argument array from the return value and calls the callback function that was passed in as an argument. This will also not block.

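static int EIO_Post(eio_req *req)
{
    // We are back on the main thread; open a scope for the local handles.
    HandleScope scope;
    CallbackData *data = static_cast<CallbackData*>(req->data);

    // Drop the event-loop reference taken in RequestTicket.
    ev_unref(EV_DEFAULT_UC);

    // Invoke the JavaScript callback with the result as its only argument.
    Local<Value> argv[1] = { Local<Value>::New(data->rval) };
    TryCatch try_catch;
    data->cb->Call(Context::GetCurrent()->Global(), 1, argv);
    if (try_catch.HasCaught()) {
        FatalException(try_catch);
    }

    // Release the persistent callback handle and free the request data.
    data->cb.Dispose();
    delete data;
    return 0;
}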
That's it! All that's left to do is to bind the RequestTicket function appropriately within the extension (see vmid example above), then our asynchronous request ticket function will be working like a charm.

Wrapping objects

Some of the GridCentric API functions return more complex structures. Although I would recommend mapping values to V8 primitives wherever possible, the need may arise to return more complex JavaScript objects.

After negative experiences with wrapped objects in V8, I think that unless you require complex interactions with the JavaScript world, you can return complex structures as simple JavaScript Objects (i.e., no prototype). Below is my example for creating a TicketInfo object.

#define SET_VALUE(VAR, NAME, TYPE, VAL) \
    VAR->Set(String::New(NAME), TYPE::New(VAL))
#define SET_INTEGER(VAR, NAME, VAL) \
        SET_VALUE(VAR, NAME, Integer, VAL)
#define SET_STRING(VAR, NAME, VAL) \
        SET_VALUE(VAR, NAME, String, VAL)

class TicketInfo {
public:
    static Handle<Object> Create(gc_ticket_info_t info)
    {
        HandleScope scope;
        Local<Object> obj = Object::New();
        SET_STRING(obj, "id", info.id.value);
        SET_STRING(obj, "status",
           gc_ticket_status_string(info.status));
        SET_INTEGER(obj, "cpus", info.cpus);
        SET_INTEGER(obj, "vms", info.vms);
        SET_INTEGER(obj, "mincpuspervm", info.mincpuspervm);
        return scope.Close(obj);
    }
};
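
On the JavaScript side, the result is just a plain object with five properties. The sketch below shows what consuming one might look like; the property names come from the C++ above, but the values (including the "granted" status string) and the describeTicket helper are made up for illustration.

```javascript
// Hypothetical values; the field names match TicketInfo::Create above.
var info = {
  id: "ticket-0001",
  status: "granted",
  cpus: 2,
  vms: 2,
  mincpuspervm: 1
};

// Plain objects need no special handling: read the fields directly.
function describeTicket(info) {
  return "Ticket " + info.id + " (" + info.status + "): " +
         info.vms + " VMs, " + info.cpus + " CPUs";
}

console.log(describeTicket(info));

// They also survive a JSON round trip, which wrapped objects may not.
var copy = JSON.parse(JSON.stringify(info));
console.log(Object.keys(copy).join(", "));
```

Because nothing here depends on a native wrapper, the object can be logged, serialized or passed around like any other value.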

Pre-processor tricks and gotchas

If you look at the source for my extension on bitbucket, you'll see that I didn't explicitly define separate classes and functions for each of the callbacks. Due to the repetitive nature of the wrapping, I wrapped most of the callback code into hacky pre-processor macros.

I also encountered one annoying gotcha while building the Node extension. During the clone operation, libgridcentric executes the scripts at /etc/gridcentric/pre-clone and /etc/gridcentric/post-clone. This execution is simple. Here's some pseudo-C.

pid_t child = fork();
if( !child ) {
  exec(script);
} else {
  int rc = waitpid(child,...);
}

When executed from within Node, the waitpid fails with return value -1 and causes the clone operation to be aborted if there is an /etc/gridcentric/pre-clone script. Why? Ostensibly, the waitpid fails because a different part of Node gobbles up all child processes and their associated return values. Presumably this is to prevent zombie processes, but it's not a great solution. The workaround for the gridcentric extension is to remove these scripts, but then you lose this functionality.

The application

Enabling the GridCentric API in an application running on our platform allows it to dynamically scale horizontally by requesting resources and cloning itself, much in the same way fork() works in UNIX. The cloning operation is handled transparently from under the VM in seconds, with state magically propagated. With a Node application, our stack will look something like this.

With the bindings I've just built, this operation in JavaScript looks like this.

var gridcentric = require("gridcentric");
gridcentric.request_ticket(1, 1, 1, 1000, function(ticket) {
  if( ticket ) {
    gridcentric.clone(ticket, function(vmid) {
      if( vmid < 0 ) {
        console.log("There was an error.");
      } else if( vmid == 0 ) {
        console.log("I'm on the original VM.");
      } else {
        console.log("I'm on a clone with id " + vmid + ".");
      }
    });
  } else {
    console.log("Unable to allocate resources.");
  }
});

Service structure

More logic is required to scale a service than simply cloning a VM. To scale any service horizontally, you'll need to implement some kind of proxy or load-balancing mechanism.

Using the completed bindings, I created an autoscale.js module that turns the master VM into a proxy (based on this) and routes requests to clones, which are created automatically. In other words, it turns a regular Node application into an auto-scaling service.

In this case, we create a new VM for every two active users we have and don't synchronize state across different VMs (think of it as a Node chat roulette -- only with cloning VMs).

More specifically, autoscale.js implements the following simple algorithm.

  • Every second, we fetch the list of clone domains and store their IPs in a global array.
      • This information is used by the proxy to route incoming connections.
  • If there is fewer than one clone for every two active connections, we create the appropriate number of clones.
      • Obviously, this is kind of a bold (ridiculous) metric for measuring load and scaling the system.
  • When a new connection arrives, it is mapped to the latest clone VM.
      • We could use a number of more reasonable strategies here, such as round robin, least-loaded or random. The last-clone heuristic is actually quite silly, but allows for a deterministic demo.
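
The two decisions in that algorithm can be sketched as small pure functions. This is not the code from autoscale.js: the function names, the rounding-up of partial pairs of users, and the assumption that the IP array is ordered from oldest to newest clone are all mine.

```javascript
var USERS_PER_CLONE = 2;   // from the post: one VM for every two active users

// How many additional clones are needed for the current load.
// Rounding up is an assumption: an odd user out still gets a VM.
function clonesNeeded(activeConnections, cloneCount) {
  var wanted = Math.ceil(activeConnections / USERS_PER_CLONE);
  return Math.max(0, wanted - cloneCount);
}

// "Latest clone" routing: send a new connection to the newest clone.
// Assumes cloneIps is ordered oldest to newest; returns null if empty.
function pickBackend(cloneIps) {
  return cloneIps.length > 0 ? cloneIps[cloneIps.length - 1] : null;
}

console.log("Clones to create: " + clonesNeeded(5, 0));
console.log("Route to: " + pickBackend(["10.0.0.2", "10.0.0.3"]));
```

Swapping in round robin or least-loaded routing would only mean replacing pickBackend; the scaling decision is independent of it.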

Integration

To leverage this service, I modified the chat demo to use the auto-scaling module. This required adding the following lines at the bottom of the server.js file:

setTimeout(function() {
  var as = require("./autoscale");
  as.setup(80, PORT);
}, 3000);
I add the 3 second delay so that the service can get started before the first clone operation.

The following video is a quick demo of the service. I log on to the auto-scaling chat service with five different users. Because the service has been configured to create new VMs for every two users, the five users in the demo are routed to three different VMs that are created on-demand, in the span of seconds. Enable annotations for notes during the video.

Caveats

Much like Node chat, the demo described here is not intended to be a serious service. Were you to add auto-scaling to a real application, you'd definitely need to do a better job of tracking active hosts, synchronizing necessary state across slaves and handling errors in general.

It's worth pointing out, however, that this is miles easier with the gridcentric extension than in the case where you have to provision new VMs from scratch. Provisioning from scratch, you'll likely need to involve lots of languages and tools (init scripts, chef or puppet, configuration files, proxies, synchronization servers) before you even touch the application. The semantics of clone() give the programmer a very powerful primitive on top of which they can build reliable distributed services. Plus, it's pretty awesome.

Do-it-yourself

If you have a Copper installation, feel free to install the bindings from NPM and try them out for yourself. There are likely a few bugs, but I'd love to hear feedback or complaints. You will need to have the gc-guest-base package installed in the VM to provide libgridcentric; then the module can be installed simply:

$ npm install gridcentric
gridcentric@0.0.1 ./node_modules/gridcentric

If you want to dig more into the bindings (or steal macros -- please go ahead), the full source is available here. This source also includes the simple autoscale.js module used above.

Enjoy!