Distributed Computing with MATLAB
The MathWorks added two new products to the MATLAB line as of Release 14, Service Pack 3: the Distributed Computing Toolbox and the Distributed Computing Engine. Together, these products allow users to run MATLAB code across multiple hosts in parallel.
Matlab DCT and the CSAIL Cluster Quickstart
I really should be more verbose later, but something is better than nothing, right?
First, you will need to be on a cluster system as described in CondorIntro. The Distributed Computing Toolbox (DCT) handles the job submissions, so you don't need to worry about those details. You do, however, need to have your working directory set to an NFS path, for example /data/scratch/$USERNAME.
For Condor submission we use the "generic" scheduler, so here's a quick and dirty M-file that calls system('hostname') on a number of systems:
jm = findResource('scheduler','configuration','generic');
set(jm,'configuration','generic');
job = createJob(jm);
for i=1:5
    createTask(job, @system, 2, {'hostname'});   % 2 outputs: [status, hostname]
end
submit(job);
waitForState(job);                     % blocks until the job is finished
results = getAllOutputArguments(job);  % 5-by-2 cell array, one row per task
results{:}
My MATLAB-fu is weak; hopefully you can deduce how to wrap more interesting calculations in there. Better yet, someone doing real work with it might write a better overview...
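As one way of wrapping something more interesting than hostname, here is a minimal sketch, assuming the same "generic" Condor configuration as the example above (untested on the CSAIL cluster). It splits a large sum into chunks, sends one chunk per task, checks each task's ErrorMessage, and adds up the partial sums on the client. The data, the task count, and the choice of the built-in sum are illustrative only; a user-written function would additionally need to be listed in the job's FileDependencies (for example set(job, 'FileDependencies', {'myfun.m'}), where myfun.m is a hypothetical name).

```matlab
% Sketch: split sum(1:1e6) into nTasks chunks, run one chunk per task
% through the 'generic' (Condor) scheduler, then combine on the client.
data   = 1:1e6;    % illustrative data; replace with your own
nTasks = 5;        % illustrative task count
edges  = round(linspace(0, numel(data), nTasks + 1));  % chunk boundaries

sched = findResource('scheduler', 'configuration', 'generic');
set(sched, 'configuration', 'generic');
job = createJob(sched);
for k = 1:nTasks
    chunk = data(edges(k)+1 : edges(k+1));
    createTask(job, @sum, 1, {chunk});   % one output: this chunk's sum
end
submit(job);
waitForState(job, 'finished');

% A failed task leaves an empty output, so check for errors first.
tasks = get(job, 'Tasks');
for k = 1:numel(tasks)
    msg = get(tasks(k), 'ErrorMessage');
    if ~isempty(msg)
        error('Task %d failed: %s', k, msg);
    end
end

partial = getAllOutputArguments(job);   % nTasks-by-1 cell array
total   = sum([partial{:}]);            % combine the partial sums
destroy(job);                           % remove the job's stored data
```

Splitting on the client keeps each task independent, so the scheduler is free to place tasks on any available worker; the only communication is the chunk going out and one number coming back.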
--
JonProulx - 05 Nov 2007
Overview of the Distributed Computing environment
There are three main components of the Distributed Computing environment:
- Clients
- Worker Nodes
- Job Manager
Client machines are simply workstations, of any platform that can run MATLAB, on which users call the Distributed Computing Toolbox (DCT) functions. More on this later. Each client that runs the DCT has to check out one DCT license seat from the FlexLM server. Regardless of how many Worker Nodes a given job runs on, the job obtains only one seat for each toolbox it requires. This contrasts with home-brewed parallel jobs, which chew up one MATLAB seat and one toolbox seat for each node of the cluster; this sort of use has recently been causing CSAIL to run out of licenses.
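To make the license argument concrete, here is a back-of-the-envelope count using hypothetical numbers (a 16-node job that needs one toolbox). It counts only the seats named in the paragraph above and leaves the client's own MATLAB seat out of both totals.

```matlab
% Hypothetical job size: 16 nodes, one toolbox.
nNodes     = 16;   % hypothetical cluster size
nToolboxes = 1;    % toolboxes the job calls

% Home-brewed parallelism: every node runs its own MATLAB session,
% taking one MATLAB seat plus one seat per toolbox.
homebrewSeats = nNodes * (1 + nToolboxes);

% DCT job: the client takes one DCT seat, and the job takes one seat
% per toolbox no matter how many Worker Nodes it runs on.
dctSeats = 1 + nToolboxes;

fprintf('home-brewed: %d seats, DCT: %d seats\n', homebrewSeats, dctSeats);
```

This prints "home-brewed: 32 seats, DCT: 2 seats". The home-brewed count grows linearly with the number of nodes, while the DCT count stays constant.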
Worker Nodes are generally high-power computers, again of any platform that can run MATLAB, that additionally run the MATLAB Distributed Computing Engine (MDCE). The MDCE process can be started by hand (by running `$MATLAB/toolbox/distcomp/bin/mdce start`) or automatically via an init script. Once MDCE is running, a second shell script is run that instructs the local MDCE process to listen for incoming job requests from a particular Job Manager, which is passed to that script as an argument.
Job Managers can be lower-powered machines than the Worker Nodes, since their primary job is to take incoming requests from Clients and parcel out the job to the available Worker Nodes.
DCT/DCE Flowchart
Notes:
Compute server nodes can be just about any computer that can run MATLAB, and the nodes in a cluster can be of mixed platforms.
A single physical Job Manager host can run multiple instances of the job-manager process, and each instance can be associated with an arbitrary collection of Worker Nodes. For example, the host memex.csail.mit.edu might run a Job Manager with three instances. The instance "CSAIL" might be associated with a general-purpose compute cluster, whereas the other two might be associated with computers belonging to specific research groups.
- User runs DCT-aware code; specifies a job manager host and job manager instance
- Client MATLAB process checks out a DCT seat from license server
- Job Manager receives job; parcels out job to all nodes registered with this job instance
- Job Manager checks out requisite licenses for this job (1 seat per toolbox)
- Compute cluster processes job; returns value(s) to client
- User rocks out
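The flow above can be sketched from the client side as follows. This is a minimal sketch that assumes the hypothetical setup from the notes (a host memex.csail.mit.edu running an instance named "CSAIL"); substitute a real lookup host and instance name. The first loop lists every job-manager instance registered with that host; the rest picks one instance by name, submits a trivial job, and collects the results.

```matlab
% List every job-manager instance registered with one lookup host.
% The host name is the hypothetical example from the notes above.
jms = findResource('jobmanager', 'LookupURL', 'memex.csail.mit.edu');
for k = 1:numel(jms)
    fprintf('%-12s idle workers: %d\n', ...
        get(jms(k), 'Name'), get(jms(k), 'NumberOfIdleWorkers'));
end

% Specify the job manager host and instance (hypothetical 'CSAIL').
jm  = findResource('jobmanager', 'Name', 'CSAIL', ...
                   'LookupURL', 'memex.csail.mit.edu');
job = createJob(jm);                 % the job lives on this instance
for k = 1:3
    createTask(job, @rand, 1, {2});  % each task returns a 2-by-2 matrix
end
submit(job);                         % Job Manager parcels out the tasks
waitForState(job, 'finished');
results = getAllOutputArguments(job);   % 3-by-1 cell, one per task
destroy(job);                           % clean up on the Job Manager
```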
--
MarkPearrow - 30 Jan 2006
Getting Started with the dfeval command
First make sure that matlab and condor are included in your home directory's .software file (on two separate lines).
The MATLAB command dfeval is used to send jobs to the worker node(s). Below are some quick tips on using it:
- Here is a test command you can use, once you've installed the necessary DCT components for MATLAB:
results = dfeval(@sum,{[1 1] [2 2] [3 3]},'jobmanager','csail','LookupURL','magick.csail.mit.edu')
- The example above uses the built-in MATLAB function sum(). For any user-written m-file functions to work, you will need to include a list of 'FileDependencies'. Be sure to include both the "main" function you are running and any sub-functions (other m-files it calls), mat-files, or other data files that must be sent from your local client machine to the worker node(s). For instance, if my function "foobar.m" makes calls to the m-file "subfun.m", I'll need to list both:
results = dfeval(@foobar,{arg1job1,arg1job2},{arg2job1,arg2job2},'jobmanager','csail',...
'LookupURL','magick.csail.mit.edu','FileDependencies',{'foobar.m','subfun.m'});
- Here are some commands to poke around and see what jobs have run and how long they have taken:
jm = findResource('jobmanager','Name','csail','LookupURL','magick.csail.mit.edu')
k = get(jm.Jobs)
k(39)
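Building on the commands above, here is a sketch that prints one summary line per job instead of dumping the full property structure. It assumes the job property names of the DCT release in use (ID, UserName, State, StartTime, FinishTime, with the times returned as strings) and uses findJob to filter by state; if a name differs on your release, check the output of get(jm.Jobs) first.

```matlab
% One line per job held by the 'csail' job manager on magick:
% ID, owner, state, and start/finish timestamps.
jm   = findResource('jobmanager', 'Name', 'csail', ...
                    'LookupURL', 'magick.csail.mit.edu');
jobs = get(jm, 'Jobs');
for k = 1:numel(jobs)
    j  = jobs(k);
    t0 = get(j, 'StartTime');
    t1 = get(j, 'FinishTime');
    % fprintf skips empty arguments, which would shift the columns,
    % so substitute a placeholder for jobs that have not started/finished.
    if isempty(t0), t0 = '-'; end
    if isempty(t1), t1 = '-'; end
    fprintf('%4d  %-10s  %-9s  start: %s  finish: %s\n', ...
        get(j, 'ID'), get(j, 'UserName'), get(j, 'State'), t0, t1);
end

% Narrow the list down, e.g. to jobs that have finished.
done = findJob(jm, 'State', 'finished');
fprintf('%d finished job(s)\n', numel(done));
```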
--
katiebyl@mit.edu - 02 Feb 2006