Queuing

The queuing system is the highest-level feature of our cluster plans and the level at which users will interact with the cluster. Condor is a very mature queuing system with great flexibility; http://www.cs.wisc.edu/condor/ has more detail than you're likely to want to know about it.

The queuing system is, by design, abstracted from the underlying hardware, so it is possible to add the queuing feature to a preexisting cluster regardless of the underlying architecture.

Condor provides robust per-machine user priorities and resource specification, which will allow us to provide three distinct levels of availability:

  • General, in which everyone has equal priority
  • Priority, in which everyone has access but priority users will preempt non-priority users
  • Private, in which only specified users have access to the resource

The current plan is to support "General" access on systems purchased and run by TIG, "Priority" access on systems purchased by research groups but maintained by TIG, and "Private" access on systems which are neither owned nor operated by TIG. Groups may of course opt for more open access if they feel particularly generous.
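The plan above amounts to a small lookup from who bought and who runs a system to its default access level. A minimal sketch, assuming two boolean flags; the function name and arguments are hypothetical and are not part of Condor or of any TIG tooling:

```python
def default_access_level(purchased_by_tig, maintained_by_tig):
    """Return the default access level under the current plan.

    Hypothetical helper for illustration only. Returns None for the one
    combination the plan does not describe (bought by TIG, run by others).
    """
    if purchased_by_tig and maintained_by_tig:
        return "General"    # purchased and run by TIG
    if maintained_by_tig:
        return "Priority"   # purchased by a research group, maintained by TIG
    if not purchased_by_tig:
        return "Private"    # neither owned nor operated by TIG
    return None             # not covered by the plan as written
```

A group that opts for more open access would simply override the default this returns.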

Users' jobs will be matched to the most specific available resource, so if you have private and priority systems you will be matched in the following order:

  1. your private resources
  2. your priority resources
  3. general resources
  4. resources on which others have priority
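The matching order above can be sketched as a sort key over resources. This is a toy model for illustration, not Condor's matchmaking mechanism; the `Resource` class, its fields, and the function names are all assumptions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    level: str                       # "general", "priority", or "private"
    owners: frozenset = frozenset()  # users with private access or priority


def match_rank(user, resource):
    """Return a sort key (lower is preferred), or None if the user has no access."""
    is_owner = user in resource.owners
    if resource.level == "private":
        return 0 if is_owner else None  # private: specified users only
    if resource.level == "priority":
        return 1 if is_owner else 3     # others may run here but can be preempted
    return 2                            # general: everyone has equal priority


def match_order(user, resources):
    """List the names of usable resources in the order the user would be matched."""
    ranked = [(match_rank(user, r), r) for r in resources]
    usable = [(key, r) for key, r in ranked if key is not None]
    return [r.name for key, r in sorted(usable, key=lambda pair: pair[0])]
```

For example, a user who owns one private and one priority system would see those two listed first, then general systems, then other groups' priority systems; other groups' private systems never appear.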

Note that when you run as a non-priority user on someone else's resource, or if you have accumulated a lot of hours on the cluster, your job may be preempted in favor of a higher-priority user. Condor has methods for preempting and restarting jobs and, under certain conditions, automated checkpointing, so that preempted jobs pick up where they left off.
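To see why checkpointing matters for a job that gets preempted, the following toy model counts how much work a job performs in total before it finishes. It is only a sketch of the idea and does not use or describe Condor's actual checkpoint mechanism; the function name and its parameters are hypothetical:

```python
def steps_executed(total_steps, slices, checkpointing):
    """Return the total steps of work performed before the job completes.

    slices: steps the job may run in each successive attempt before it is
            preempted. Returns None if the job never finishes.
    """
    done = 0      # progress that survives a preemption
    executed = 0  # all work performed, including work that is later lost
    for slice_len in slices:
        remaining = total_steps - done
        ran = min(slice_len, remaining)
        executed += ran
        if ran == remaining:
            return executed  # job finished during this attempt
        if checkpointing:
            done += ran      # resume from the checkpoint next time
        # without checkpointing the progress is lost and the job restarts
    return None
```

With a 10-step job preempted after 4 steps twice, the checkpointed job performs 10 steps in total, while the job that restarts from scratch performs 18.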

More details on the specifics of using the queuing system will be available on the CondorIntro page.

One of the central features of Condor is its ability to do cycle harvesting on idle workstations, suspending or migrating jobs when the workstations become active. The Infolab group is currently doing preliminary testing of this configuration.

-- JonProulx - 14 Feb 2007
Topic revision: 16 Jul 2007, JonProulx
 
