Compute Clustering
Current System Status
http://condor-view.csail.mit.edu/ has graphs of current and historic system utilization.
Topics
Cluster Overview
If your group is planning on purchasing dedicated compute resources,
please contact
help@csail.mit.edu to see if your needs would be better
served by buying into our cluster system.
TIG has built out a flexible, reconfigurable
compute clustering environment that can serve both as a
lab-wide batch-queued cluster based on
Condor[1] and as a host for private or
priority-access nodes for groups who purchase dedicated hardware. The short
initial proposal (PDF format) was made in August 2006 after a period of testing various cluster options.
The design, illustrated above, allows the integration or separation of three main components: the queuing system, the virtualization system, and existing physical servers. The intersections show the areas discussed in this document. The non-intersecting areas suggest possibilities outside the scope of this document, most notably the existing private group compute resources in their current stand-alone configuration, the possibility of virtualizing infrastructural services, and the possibility of incorporating workstation idle-cycle harvesting into the queuing system.
--
JonProulx - 08 Aug 2007
[1]
http://www.cs.wisc.edu/condor/
Queuing
The queuing system is the highest level feature of our cluster plans
and the level at which users will interact with the cluster.
Condor is a very mature queuing
system with great flexibility;
http://www.cs.wisc.edu/condor/ has more
detail than you're likely to want to know about it.
The queuing system is, by design, abstracted from the underlying
hardware, so it is possible to add the queuing feature to
a preexisting cluster regardless of the underlying architecture.
Condor provides robust per-machine
user priorities and resource specification, which will allow us to
provide three distinct levels of availability:
- General, in which everyone has equal priority
- Priority, in which everyone has access but priority users will preempt non-priority users
- Private, in which only specified users have access to the resource
The current plan is to support "General" access on systems purchased
and run by TIG, "Priority" on systems purchased by research groups but
maintained by TIG, and "Private" on systems which are neither owned nor
operated by TIG. Groups may of course opt to have more open access if
they feel particularly generous.
Users' jobs will be matched to the most specific available resource, so
if you have private and priority systems you will be matched in the
following order:
- your private resources
- your priority resources
- general resources
- resources on which others have priority
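The matching order above can be sketched in a few lines of Python. This is only a toy model of the policy as described here, not Condor configuration: the Resource class, the tier names, and the machine and user names are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical tier numbers, lowest first, in the order jobs are matched.
PRIVATE, PRIORITY, GENERAL, OTHERS_PRIORITY = range(4)

@dataclass
class Resource:
    name: str
    access: str                      # "general", "priority", or "private"
    owners: frozenset = frozenset()  # users holding private/priority rights

def tier(user, res):
    """Return the match tier for user on res, or None if access is denied."""
    if res.access == "private":
        return PRIVATE if user in res.owners else None
    if res.access == "priority":
        return PRIORITY if user in res.owners else OTHERS_PRIORITY
    return GENERAL

def match_order(user, resources):
    """Names of the resources the user may run on, most specific first."""
    usable = [(tier(user, r), r) for r in resources]
    usable = [(t, r) for t, r in usable if t is not None]
    # sorted() is stable, so resources in the same tier keep their input order.
    return [r.name for t, r in sorted(usable, key=lambda pair: pair[0])]

pool = [
    Resource("tig-node", "general"),
    Resource("other-group-node", "priority", frozenset({"bob"})),
    Resource("my-group-node", "priority", frozenset({"alice"})),
    Resource("my-private-node", "private", frozenset({"alice"})),
    Resource("bob-private-node", "private", frozenset({"bob"})),
]

print(match_order("alice", pool))
# ['my-private-node', 'my-group-node', 'tig-node', 'other-group-node']
```

Note that bob's private node never shows up for alice at all, while his priority node does, just at the bottom of her list.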
You'll note that when running as a non-priority user on someone else's
resource, or if you've accumulated a lot of hours on the cluster, your job may be
preempted in favor of a higher-priority user.
Condor has methods for
preempting and restarting jobs and, under certain conditions, automated
checkpointing, so that jobs that are preempted pick up where they left off.
More details on the specifics of using the queuing system will be
available in the
CondorIntro page.
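To make the "pick up where they left off" idea concrete, here is a minimal sketch of a job that checkpoints itself. It is an assumption-laden illustration: Condor's automated checkpointing is a separate mechanism and is not shown, and run_job, its stop_after parameter (which fakes a preemption), and the checkpoint file layout are all invented for this example.

```python
import json
import os
import tempfile

def run_job(total_steps, ckpt_path, stop_after=None):
    """Sum 0..total_steps-1, saving progress so a restart can resume.

    stop_after simulates preemption: this run exits after that many steps.
    Returns the final sum, or None if the run was "preempted".
    """
    state = {"next_step": 0, "acc": 0}
    if os.path.exists(ckpt_path):
        # A checkpoint exists, so resume from it instead of starting over.
        with open(ckpt_path) as f:
            state = json.load(f)

    done_this_run = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # preempted; the checkpoint holds our progress
        state["acc"] += state["next_step"]
        state["next_step"] += 1
        done_this_run += 1
        # Write to a temp file and rename it into place, so being killed
        # mid-write never leaves a half-written checkpoint behind.
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, ckpt_path)
    return state["acc"]

path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
first = run_job(10, path, stop_after=4)  # "preempted" after 4 steps
second = run_job(10, path)               # resumes at step 4 and finishes
```

A job written this way loses at most one step of work when it is preempted, whether or not the queuing system can checkpoint it automatically.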
One of the central features of
Condor is its ability to do cycle
harvesting on idle workstations, suspending or migrating jobs when the
workstations become active. The
Infolab group is currently doing preliminary testing of this configuration.
--
JonProulx - 14 Feb 2007
Virtualization
currently in production, Jan '07
The TIG-maintained cluster resources will be hosted as
Xen virtual hosts. Virtualizing
compute resources is a somewhat unusual move, but CSAIL has some
unique features that make it desirable to have this level of
flexibility.
Certain groups (you know who you are) need custom-configured
sets of systems for relatively brief periods. Clearly, needing a burst
of compute power for 4-6 weeks out of the year doesn't justify buying
a private cluster. Using
Xen, a
node's state can be saved and the node brought back up later, and processor and memory
resources can be dynamically allocated and deallocated.
The ability to migrate running systems between physical pieces of
hardware, and to immediately reboot systems that were running on failed
hardware on other hardware, are added bonuses for load
balancing and availability, and I'm sure someone will come up with a
novel use for ephemeral systems.
The basic node will consist of 1 virtual CPU and 2-4 GB of RAM, though
multi-way nodes can be created for special purposes, up to the number of
CPU cores in the physical host system (more could be emulated, but why?).
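As a rough sizing aid, the rule above (at least 1 virtual CPU per node, never more virtual CPUs than physical cores) can be written down as a small Python check. The function name and the host sizes in the example are made up, and the sketch assumes no oversubscription of cores or RAM and ignores whatever memory the host itself needs.

```python
def nodes_per_host(host_cores, host_ram_gb, vcpus=1, ram_gb=2):
    """How many identical nodes fit on one host without oversubscribing.

    Defaults describe the basic node: 1 virtual CPU and 2 GB of RAM.
    """
    if vcpus < 1 or ram_gb <= 0:
        raise ValueError("a node needs at least 1 vCPU and some RAM")
    if vcpus > host_cores:
        raise ValueError("a node cannot have more vCPUs than the host has cores")
    # Whichever runs out first, cores or RAM, limits the node count.
    return min(host_cores // vcpus, host_ram_gb // ram_gb)

# Hypothetical host with 8 cores and 16 GB of RAM:
basic_small = nodes_per_host(8, 16)                     # 2 GB basic nodes
basic_large = nodes_per_host(8, 16, ram_gb=4)           # 4 GB basic nodes
eight_way = nodes_per_host(8, 16, vcpus=8, ram_gb=16)   # one multi-way node
```

On that hypothetical host the limit is 8 basic nodes at 2 GB each, 4 at 4 GB each, or a single 8-way node.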
--
JonProulx - 14 Feb 2007
--
JonProulx - 30 Nov 2006