Preliminary Comments
SGE can be difficult to install and configure, but if you log in to grid.math.duke.edu,
you should automatically have everything you need preconfigured and ready to run.
The following is an excerpt from Jeffrey B. Layton's page at http://docs.warewulf-cluster.org/contrib/sge.html, with minor cosmetic and local changes.
Using SGE
Now that we have SGE installed and configured, let's test it. We're going
to try a very simple example that just runs the date command
on a single node. Create a job script called sge-date.job containing
the following:
#!/bin/bash
#$ -cwd
/bin/date
Lines beginning with #$ are special comments that are passed to SGE as options. Let's
look at what this script will do:
- #!/bin/bash : The first line tells SGE to run the job script
using the bash shell.
- #$ -cwd : The second line is a special comment that tells SGE to
put the output in the directory from which you submitted the job.
- /bin/date : The third line is the actual command to be run.
In this case, it's the date command.
Now let's run this command by submitting the job script to SGE.
grid{sge}2: qsub sge-date.job
your job 2 ("sge-date.job") has been submitted
Notice that you can (and should) run this job script as an ordinary user. After you
submit the job, SGE assigns it a Job ID. This Job ID is unique.
In this case, the Job ID is 2. When the job is done, SGE creates
two files in the directory where you submitted the job. The
first one, sge-date.job.e2, contains any error messages
from SGE and/or the job. This is where to look for problems if your job
fails. The second file, sge-date.job.o2, contains the output
from the job (things written to stdout).
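As a quick sketch of inspecting those files afterwards (the files and the date string below are stand-ins created for illustration, not real job output):

```shell
# The names follow the pattern <script>.o<jobid> and <script>.e<jobid>.
# Fake the files a finished job would leave behind, then inspect them
# the way you would on the cluster.
echo "Wed Feb 18 18:55:11 EST 2004" > sge-date.job.o2
: > sge-date.job.e2              # an empty .e file means a clean run
job_output=$(cat sge-date.job.o2)
echo "stdout was: $job_output"
if [ ! -s sge-date.job.e2 ]; then
    echo "no errors reported"
fi
```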
The qsub command allows you to submit a job to SGE. There
are several options you can use when submitting a job. Look at the
qsub man page to learn about them.
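For example, a job script can combine several of the common options (the directive names -N, -cwd, -j, and -l h_rt are from the qsub man page; the job name and time limit below are made-up values). Since #$ lines are ordinary comments to the shell, the script also runs directly under bash, which makes it easy to test outside SGE:

```shell
#!/bin/bash
# Name the job, run it from the submit directory, merge stderr
# into stdout, and request a 10-minute wall-clock limit.
#$ -N option-demo
#$ -cwd
#$ -j y
#$ -l h_rt=00:10:00
# The actual work (a stand-in for a real program):
start_time=$(date)
echo "job started at $start_time"
/bin/date
```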
SGE tools
There are various commands for submitting your jobs to SGE, tracking their
status, and manipulating them. Let's briefly go over these commands.
qstat
This command allows you to get the status of SGE and the jobs that are
running or are waiting to be run (queued). Let's explore what
qstat can do for us. Create a simple job script, called
sleeper.sh, that does nothing but sleep for 60 seconds and
then stops.
#!/bin/bash
#$ -cwd
sleep 60
While this job does nothing, it allows us to see the output from qstat
for several jobs. So, let's submit 6 copies of this job in quick
succession and run qstat. Here's the output from my
cluster.
grid{sge}4: qsub sleeper.sh
...
grid{sge}9: qsub sleeper.sh
grid{sge}10: qstat
job-ID prior name user state submit/start at queue master ja-task-ID
---------------------------------------------------------------------------------------------
5 0 sleeper.sh laytonj r 02/18/2004 18:55:11 admin1.q MASTER
4 0 sleeper.sh laytonj r 02/18/2004 18:55:11 admin2.q MASTER
6 0 sleeper.sh laytonj r 02/18/2004 18:55:11 admin3.q MASTER
7 0 sleeper.sh laytonj r 02/18/2004 18:55:11 admin4.q MASTER
8 0 sleeper.sh laytonj qw 02/18/2004 18:55:05
9 0 sleeper.sh laytonj qw 02/18/2004 18:55:17
Because each node has its own queue, qstat can be a bit
confusing to interpret for parallel jobs. In the above example, there
are four jobs running (numbers 4 through 7). Jobs 8 and 9 are waiting
to run, as designated by the qw (queued and waiting) state.
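Since the state column is usually what you scan for, here is a small sketch of pulling it out with awk. The sample lines mimic the qstat output above; on the cluster you would pipe qstat itself instead of a saved string:

```shell
# Count running (r) vs. queued (qw) jobs from saved qstat output.
# The state is the 5th whitespace-separated field.
qstat_output='   5 0 sleeper.sh laytonj r 02/18/2004 18:55:11 admin1.q MASTER
   8 0 sleeper.sh laytonj qw 02/18/2004 18:55:05
   9 0 sleeper.sh laytonj qw 02/18/2004 18:55:17'
running=$(echo "$qstat_output" | awk '$5 == "r"  {n++} END {print n+0}')
queued=$(echo "$qstat_output"  | awk '$5 == "qw" {n++} END {print n+0}')
echo "running=$running queued=$queued"   # → running=1 queued=2
```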
You can also get more information than is given above by using the
-f (full) option with qstat. Here's the output for a
similar set of jobs with qstat -f.
$ qstat -f
queuename qtype used/tot. load_avg arch states
----------------------------------------------------------------------------
admin1.q BIP 1/1 0.00 glinux
11 0 sleeper.sh laytonj r 02/18/2004 18:57:41 MASTER
----------------------------------------------------------------------------
admin2.q BIP 1/1 0.00 glinux
10 0 sleeper.sh laytonj r 02/18/2004 18:57:41 MASTER
----------------------------------------------------------------------------
admin3.q BIP 1/1 0.00 glinux
12 0 sleeper.sh laytonj r 02/18/2004 18:57:41 MASTER
----------------------------------------------------------------------------
admin4.q BIP 1/1 0.01 glinux
13 0 sleeper.sh laytonj r 02/18/2004 18:57:41 MASTER
############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
14 0 sleeper.sh laytonj qw 02/18/2004 18:57:28
15 0 sleeper.sh laytonj qw 02/18/2004 18:57:28
The output is fairly straightforward, but it will take a little
time before interpreting it becomes second nature.
qdel
This command allows you to delete a job from SGE, for example a job
you submitted by mistake or one that you want to stop while it's
running. Find the Job ID with the qstat command, then type
qdel JOB_ID
where JOB_ID is the Job ID for that particular job.
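qdel also accepts several job IDs at once, so you can clear out a batch of jobs in one shot. A sketch using saved qstat output and the user name from the examples above; the final qdel line is shown commented out, since it only works on the cluster:

```shell
# Collect the job IDs (1st field) belonging to one user (4th field)
# from saved qstat output, then hand the whole list to qdel.
qstat_output='   8 0 sleeper.sh laytonj qw 02/18/2004 18:55:05
   9 0 sleeper.sh laytonj qw 02/18/2004 18:55:17'
ids=$(echo "$qstat_output" | awk '$4 == "laytonj" {print $1}' | xargs)
echo "would run: qdel $ids"   # → would run: qdel 8 9
# qdel $ids
```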
qmon
SGE includes a nice GUI interface called qmon. It can be
used to administer SGE (by the SGE administrator) and by users to submit
jobs to SGE. Try the command qmon and explore what
it can do (try the various buttons).
qhost
The qhost command gives you the status of the
nodes that SGE is using. Here is an example from my cluster.
$ qhost
HOSTNAME ARCH NPROC LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
admin1 glinux 1 0.00 495.5M 6.2M 515.8M 0.0
admin2 glinux 1 0.01 242.3M 5.9M 515.8M 0.0
admin3 glinux 1 0.00 495.5M 6.3M 515.8M 0.0
admin4 glinux 1 0.01 242.3M 5.4M 517.7M 0.0
Notice that the load on the nodes is zero (nothing is running). The output
lists the number of CPUs per node (NPROC), the total memory available on
the node (MEMTOT), the memory in use (MEMUSE), the swap space available
(SWAPTO), and the used swap space (SWAPUS).
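The columns line up well for scripting. For instance, a sketch of totaling the CPUs across nodes (the sample lines mimic the qhost table above, minus the global pseudo-host; on the cluster you would pipe qhost itself):

```shell
# Sum the NPROC column (3rd field) over the compute nodes from
# saved qhost output.
qhost_output='admin1 glinux 1 0.00 495.5M 6.2M 515.8M 0.0
admin2 glinux 1 0.01 242.3M 5.9M 515.8M 0.0
admin3 glinux 1 0.00 495.5M 6.3M 515.8M 0.0
admin4 glinux 1 0.01 242.3M 5.4M 517.7M 0.0'
total_cpus=$(echo "$qhost_output" | awk '{s += $3} END {print s}')
echo "total CPUs: $total_cpus"   # → total CPUs: 4
```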
Sample Job Scripts
I've seen some very complicated job scripts (mostly for PBS). They
can become overly complicated very quickly and are then difficult to
understand and edit. So I'm going to give you some simple scripts that
you can use for your SGE jobs. I didn't write these, but I trust the
people who did (I've tested them and they work just fine on my cluster).
First I'll present a simple script for serial jobs, that is, jobs that
run on a single node. Then I'll present a sample script for
MPI jobs under MPICH, followed by a sample script for LAM-MPI jobs. Finally,
I'll present a sample script for PVM jobs.
Serial Jobs
#!/bin/bash
#
# Set the name of the job.
#$ -N sge-date-run
#
# Make sure that the .e and .o files arrive in the
# working directory
#$ -cwd
#
# Merge the standard output and standard error into one file
#$ -j y
#
# My code is re-runnable
#$ -r y
#
# The max walltime for this job is 31 minutes
#$ -l h_rt=00:31:00
(Program command here)
Recall that the #$ symbol combination is used in the script to indicate an SGE
option.
MPI: MPICH
#!/bin/sh
#
# EXAMPLE MPICH SCRIPT FOR SGE
# To use, change "MPICH_JOB", "NUMBER_OF_CPUS"
# and "MPICH_PROGRAM_NAME" to real values.
#
# Your job name
#$ -N MPICH_JOB
#
# Use current working directory
#$ -cwd
#
# Join stdout and stderr
#$ -j y
#
# pe request for MPICH. Set your number of processors here.
#$ -pe mpich NUMBER_OF_CPUS
#
# Run job through bash shell
#$ -S /bin/bash
#
# The following is for reporting only. It is not really needed
# to run the job. It will show up in your output file.
#
echo "Got $NSLOTS processors."
echo "Machines:"
cat $TMPDIR/machines
#
# Use full pathname to make sure we are using the right mpirun
#
/usr/bin/mpirun -np $NSLOTS \
-machinefile $TMPDIR/machines MPICH_PROGRAM_NAME
#
# Commands to do something with the data after the
# program has finished.
#
You will have to change the mpirun path to correspond to where you have
installed it on your system.
MPI: LAM-MPI
#!/bin/sh
#
# EXAMPLE LAM SCRIPT FOR SGE
# To use, change "LAM_JOB", "NUMBER_OF_CPUS"
# and "LAM_PROGRAM_NAME" to real values.
#
# Your job name
#$ -N LAM_JOB
#
# Use current working directory
#$ -cwd
#
# Join stdout and stderr
#$ -j y
#
# pe request for LAM. Set your number of processors here.
#$ -pe lam NUMBER_OF_CPUS
#
# Run job through bash shell
#$ -S /bin/bash
#
# The following is for reporting only. It is not really needed
# to run the job. It will show up in your output file.
echo "Got $NSLOTS processors."
echo "Machines:"
cat $TMPDIR/hostfile
#
# This MUST be in your LAM run script, otherwise
# multiple LAM jobs will NOT RUN
export LAM_MPI_SOCKET_SUFFIX=$JOB_ID.$JOB_NAME
#
# Use full pathname to make sure we are using the right mpirun
/usr/mpi/lam/bin/mpirun -np $NSLOTS LAM_PROGRAM_NAME
#
# Commands to do something with the data after the
# program has finished.
#
You will have to change the mpirun path to correspond to where you have
installed it on your system.
PVM
#!/bin/sh
#
# EXAMPLE PVM SCRIPT FOR SGE
# To use, change "PVM_JOB", "NUMBER_OF_CPUS"
# and "PVM_PROGRAM_NAME" to real values.
#
# Your job name
#$ -N PVM_JOB
#
# Use current working directory
#$ -cwd
#
# Join stdout and stderr
#$ -j y
#
# pe request for PVM. Set your number of processors here.
#$ -pe pvm NUMBER_OF_CPUS
#
# Run job through bash shell
#$ -S /bin/bash
#
# The following is for reporting only. It is not really needed
# to run the job. It will show up in your output file.
echo "Got $NSLOTS processors."
echo "Machines:"
cat $TMPDIR/hostfile
#
# This MUST be in your PVM run script, otherwise
# PVM jobs will NOT RUN
export PVM_VMID=$JOB_ID.$JOB_NAME
#
# Run the PVM program:
PVM_PROGRAM_NAME
#
# Commands to do something with the data after the
# program has finished.
#
Parting Comments
SGE is a very powerful scheduling/queuing system. It has many options
to help effectively use all of your resources (your nodes). Take a little
bit of time to look at the man pages. The SGE mailing list is also
very friendly. Don't hesitate to post there and ask questions. Then when
you become an expert you can help others.
Acknowledgements
I want to thank Greg Kurtzer for his hard work with Warewulf and for
packaging SGE so effectively. I also want to thank him for his answers
to my silly questions about SGE as I started to learn it. Finally, I
want to thank Doug Eadline for his help with SGE and some of the
sample scripts.
Copyright, Jeffrey B. Layton, 2004.