Workstations in the Condor Cluster

Overview

Workstations are a greatly underutilized computing resource. Though powered on 24/7, they are rarely used more than a third of that time by their owners, and some "public" systems around the lab are used much less than that. Condor, the queuing system used by the lab's ClusterComputing, was designed to use these idle cycles opportunistically. Over its development Condor has become a full-featured queuing system that goes far beyond harnessing workstations, but using idle cycles without affecting the system's owner has remained a key feature.

Current State

BUG FIXED: as of Feb 2009, workstation installs enforce memory constraints more tightly. Joining your workstation to Condor is now a recommended practice.

Since May 2007, workstations and servers in the Infolab Group have been joined to the CSAIL Condor system. Over time a few other systems in TIG were joined as well. The trials have been entirely successful so far: interactive use of the systems has been completely unaffected, and they have contributed significant CPU hours to cluster jobs.

We are now considering making cluster membership the default for all CSAIL/Debian workstations (the current keytab requirement, see below, unfortunately still requires manual intervention). There will be an opt-out mechanism, but we're fairly confident that virtually no one will need it, as Condor jobs should have little to no impact on interactive system use.

How to Join

We encourage people to join the pool so we can get an even larger test pool before committing everyone to it.

Here's what you need to do:

  • Install the keytab for your workstation:
        sudo install -o root -g root -m 600 /afs/csail/group/tig/keytabs/$USER/$HOSTNAME.keytab \
        /etc/krb5.keytab
  • Run "sudo /usr/local/csail/sbin/join_condor"
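
The steps above can be wrapped in a small shell sketch that checks the keytab is readable before touching /etc/krb5.keytab. The function names keytab_path and join_pool are made up for illustration, and falling back to the output of hostname when $HOSTNAME is unset is an assumption; the paths and commands are the ones given above.

```shell
#!/bin/sh
# Sketch only: keytab_path and join_pool are hypothetical helper names.

# Print the keytab location for a given user ($1) and host ($2),
# following the path layout used in the install command above.
keytab_path() {
    printf '/afs/csail/group/tig/keytabs/%s/%s.keytab\n' "$1" "$2"
}

# Install the keytab and join the pool, stopping early if the keytab
# cannot be read (for example, if it has not been created yet).
join_pool() {
    # Assumption: use `hostname` if the shell does not set $HOSTNAME.
    host=${HOSTNAME:-$(hostname)}
    src=$(keytab_path "$USER" "$host")
    if [ ! -r "$src" ]; then
        echo "keytab not readable: $src" >&2
        return 1
    fi
    sudo install -o root -g root -m 600 "$src" /etc/krb5.keytab &&
        sudo /usr/local/csail/sbin/join_condor
}
```

After sourcing the file, running "join_pool" performs both steps, and leaves /etc/krb5.keytab untouched if the source keytab is missing.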

Your workstation will then be part of the cluster. You will probably also want to look at our CondorIntro to get up to speed on using the cluster, if you haven't already.
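
One way to confirm that the machine has shown up in the pool is to look for it in the output of condor_status. The sketch below assumes the Condor command-line tools are on your PATH and that the machine advertises itself under a name containing the host name you pass in; in_pool is a hypothetical helper name, and it may take a little while after joining before the machine is listed.

```shell
#!/bin/sh
# Sketch only: in_pool is a hypothetical helper name.

# Succeed if condor_status lists a machine whose name contains $1.
# The match is a case-insensitive substring match, so a short host
# name also matches its fully qualified form.
in_pool() {
    condor_status -format '%s\n' Machine 2>/dev/null | grep -qiF -- "$1"
}
```

For example, "in_pool $HOSTNAME && echo joined" prints "joined" once the workstation is visible to the pool.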

If you are a Group Manager and your group has special needs, we can work with you to create a custom Condor class for your group's systems. For example, you may have a special software environment and want the jobs you submit to run only on your group's machines, or you may want to define user priorities for access to the systems (i.e., members of the group get priority over other lab members). As usual, send mail to help@csail.mit.edu if you'd like to make special arrangements.

Why to Join

  • You're not using those cycles anyway
  • You'll have access to submit jobs from your workstation (particularly useful for DistributedComputingWithMATLAB)

How to Leave

  • Are you sure you want to deprive your labmates of your workstation's precious cycles?
  • If so, run "sudo /usr/local/csail/sbin/leave_condor"

-- JonProulx - 11 Feb 2009
Topic revision: 02 Jun 2009, EricSchwartz
 
