Copper represents a new paradigm for grid computing.
Why? Copper flips some of the standard practices in grid computing on their head. I'll illustrate this by starting with a typical cluster environment.

In this environment, users log in to a cluster headnode (or login node) and submit jobs (as scripts) to a scheduler that copies those scripts to several pre-configured slave nodes (a.k.a. compute nodes).
The software environment that runs on the headnode must be carefully crafted by an administrator to meet all of the users' needs and accommodate each application that they may want to run. Note that users do not have any administrative control over the cluster in this context, so they may not freely install new software. This leads to users building software in their personal home directories, using large, statically-linked binaries and often banging their heads against the wall.
Additionally, administrators must ensure that the slave nodes are kept in sync with the headnode. If a new software package is installed on the headnode, it must be installed on each of the slaves. If a configuration or library changes, then it must be changed on each of the slaves.
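To make that chore concrete, here is a minimal sketch of the kind of loop an administrator ends up running after every change on the headnode. The hostnames, the `sync_package` helper and the use of root SSH access with `apt-get` are all assumptions made for illustration; a real cluster would have its own inventory and tooling. By default the loop only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical slave hostnames; a real cluster would read these from its own inventory.
SLAVES="node01 node02 node03"

# sync_package is an illustrative helper, not part of any real tool.
# It prints (or, with DRY_RUN=0, executes) the install command for every slave.
sync_package() {
    pkg="$1"
    for node in $SLAVES; do
        cmd="ssh root@$node apt-get install -y $pkg"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"      # dry run: show what would be done
        else
            $cmd             # real run: install on this slave
        fi
    done
}

# Every package installed on the headnode needs a matching call like this one.
sync_package python
```

Multiply this by every package, library and configuration change, and the cost of keeping slaves in sync becomes clear.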
A Copper computing environment is fundamentally different.
In this case, users may each be given a virtual headnode on which they have complete administrative control. This means that they can freely change the configuration and install new software. They are also free to use the distribution and tools that they are familiar with (and possibly have on their workstations).
When jobs are submitted to a queue in a Copper cluster, slave nodes are created on the fly as exact clones of the headnode at that moment. This means that the user does not need to worry about updating software on any slave nodes, as everything will be exactly the same as it was on their virtual headnode. If their software ran on the headnode, then it is guaranteed to run on their slave nodes. When the jobs complete, the slave nodes are automatically destroyed and their associated resources are freed.
How is this possible? Our technology allows virtual machines to clone nearly instantaneously and with almost zero overhead. Using this cloning for grid computing means that you don't have to worry about keeping environments in sync. Ever.
In fact, the above diagram is not really accurate. Scripts are not copied to the slave VMs, because they are already there. Because the slaves are guaranteed to be up to date as of the moment you submitted the job, they can just run a command that you specify instead of having to copy a script. That's fundamentally cool; it's like we copy the whole headnode whenever you run a job.
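The clone, run, destroy lifecycle described above can be imitated on a single machine as a toy model. In this sketch a directory stands in for the virtual headnode and a recursive copy stands in for VM cloning; this is an analogy only and says nothing about how Copper actually clones virtual machines. The `run_in_clone` helper and the file names are hypothetical.

```shell
#!/bin/sh
# Toy model only: a directory plays the headnode, a recursive copy plays the
# VM clone. Copper's real cloning mechanism is not shown here.

# run_in_clone HEADNODE_DIR COMMAND [ARGS...]   (hypothetical helper)
run_in_clone() {
    head="$1"; shift
    clone="$(mktemp -d)" || return 1   # create the "slave" on the fly
    cp -R "$head"/. "$clone"/          # snapshot the headnode as it is right now
    ( cd "$clone" && "$@" )            # run the given command; no script is copied over
    status=$?
    rm -rf "$clone"                    # destroy the slave and free its resources
    return $status
}

# Usage: change something on the "headnode", then submit a command.
headnode="$(mktemp -d)"
echo 'echo hello from the clone' > "$headnode/myprogram.sh"
run_in_clone "$headnode" sh myprogram.sh
rm -rf "$headnode"
```

Note that the job receives only a command line. Everything the command needs is already present in the clone because the clone was taken after the headnode was changed, and anything the job writes disappears with the clone.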
As an example, this will run flawlessly on Copper, but not on traditional clusters:
apt-get install python # Install python.
vim myprogram.py # Edit my python program.
gcqsub python myprogram.py # Submit my python program to the queue.
In the above, I install python, edit a program and run it through the grid queue in steps that are just as natural as using my workstation.
