University of Alberta WestGrid Site-Specific Information
Portable Batch System (PBS)
Introduction:
This document describes how to run jobs on the WestGrid complex of SGI
Origin computers using PBS, the Portable Batch System. Please read the
information for new users before proceeding. You should also become
acquainted with the Usage Policies and Priority Calculation documents.
Please feel free to contact research.support@ualberta.ca if you have any
questions or comments.
Job Scripts:
To prepare a job for submission, first create a shell script that can
be used to run your program. Include all necessary environment variable
settings.
For example, to run a program that has been parallelized with OpenMP
directives, your script might look like this:
    #! /bin/sh
    export OMP_NUM_THREADS=64
    cd /scratch/esumbar
    ./a.out
Next, introduce PBS directives to specify your job's resource needs and
describe its attributes. PBS directives are shell-script comments that
have the form "#PBS flag value". Be sure to group all of the
PBS directives together at the beginning of the script and do not
intermingle executable statements.
For example:
    #! /bin/sh
    #PBS -S /bin/sh
    #PBS -q arcturus
    #PBS -l host=arcturus
    #PBS -l ncpus=64
    #PBS -l walltime=12:00:00
    #PBS -m bea
    #PBS -M esumbar@ualberta.ca
    #PBS -N myjob
    export OMP_NUM_THREADS=64
    cd /scratch/esumbar
    ./a.out >output 2>&1
If you were to submit the above script for execution, PBS would
interpret the directives as follows:
LINE 2: -S /bin/sh
This is the shell that PBS will use to execute your script file. If
omitted, your login shell on the execution host is used. This affects
which startup files are processed and, consequently, which environment
variables your script inherits. This shell and the one specified on the
#! line don't have to be the same. The only rule is that the syntax of
the script must be consistent with the shell specified on the #! line.
LINE 3: -q arcturus
The requested queue; for this example we have chosen "arcturus". Each
queue has resource limits that reflect the capabilities of the machine
that hosts the queue or that are derived from administrative policies.
There are currently seven supported queues, each named after its host
machine. Every active queue has a 24-hour maximum duration (corona is
listed as n/a).
    queue      | duration (hours) | min ncpus | max available ncpus | max available memory (GBytes)
    -----------+------------------+-----------+---------------------+------------------------------
    nexus      | 24               | 1         | 6                   | 8
    arcturus   | 24               | 64        | 256                 | 256
    borealis   | 24               | 8         | 64                  | 16
    australis  | 24               | 16        | 64                  | 32
    corona     | n/a              | n/a       | n/a                 | n/a
    helios     | 24               | 1         | 32                  | 16
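Before submitting, you may want to check a request against these limits. The sketch below simply encodes the table above in a shell function; the function name check_queue_request is hypothetical, corona is treated as unavailable, and since queue limits change over time, the numbers should be taken as illustrative.

```sh
#!/bin/sh
# Hypothetical pre-submission check built from the queue table above.
# Usage: check_queue_request QUEUE NCPUS MEMORY_GB
# Returns 0 if the request fits the queue's limits, 1 otherwise.
check_queue_request() {
    queue=$1 ncpus=$2 mem=$3
    case $queue in
        nexus)     min=1;  max=6;   maxmem=8 ;;
        arcturus)  min=64; max=256; maxmem=256 ;;
        borealis)  min=8;  max=64;  maxmem=16 ;;
        australis) min=16; max=64;  maxmem=32 ;;
        helios)    min=1;  max=32;  maxmem=16 ;;
        *) echo "unknown or unavailable queue: $queue" >&2; return 1 ;;
    esac
    if [ "$ncpus" -lt "$min" ] || [ "$ncpus" -gt "$max" ]; then
        echo "$queue: ncpus must be between $min and $max" >&2
        return 1
    fi
    if [ "$mem" -gt "$maxmem" ]; then
        echo "$queue: at most $maxmem GB of memory is available" >&2
        return 1
    fi
    return 0
}
```

For example, check_queue_request arcturus 64 100 && qsub myjob.sh would only submit the job if 64 cpus and 100 GB fit within the arcturus limits.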
LINE 5: -l ncpus=64
Number of cpus required by your job. In this example, 64 processors of
arcturus are requested. Remember to request one cpu for each parallel
thread of execution.
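Because the cpu request and the OpenMP thread count live on different lines of the script, they can drift apart when a script is edited. A minimal sketch of a consistency check follows; it assumes "#PBS -l ncpus=N" and "OMP_NUM_THREADS=M" each appear on their own line, as in the example script, and the function name check_threads_match_ncpus is hypothetical.

```sh
#!/bin/sh
# Hypothetical check: does the "#PBS -l ncpus=N" request match the
# OMP_NUM_THREADS value set in the same job script?
# Returns 0 if they match, 1 if they differ, 2 if either is missing.
check_threads_match_ncpus() {
    script=$1
    # First "#PBS -l ncpus=<digits>" line; print only the number.
    ncpus=$(sed -n 's/^#PBS[[:space:]]*-l[[:space:]]*ncpus=\([0-9][0-9]*\).*/\1/p' "$script" | head -n 1)
    # First "OMP_NUM_THREADS=<digits>" assignment; print only the number.
    threads=$(sed -n 's/.*OMP_NUM_THREADS=\([0-9][0-9]*\).*/\1/p' "$script" | head -n 1)
    if [ -z "$ncpus" ] || [ -z "$threads" ]; then
        echo "$script: could not find both ncpus and OMP_NUM_THREADS" >&2
        return 2
    fi
    if [ "$ncpus" != "$threads" ]; then
        echo "$script: ncpus=$ncpus but OMP_NUM_THREADS=$threads" >&2
        return 1
    fi
    return 0
}
```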
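LINE 6: -l walltime=12:00:00
This sets the elapsed (wall-clock) time limit for your job, in the form
hours:minutes:seconds. Request at least as much time as your job needs,
because PBS terminates a job that exceeds its limit. Typically, this is
set to the time needed to reach the first checkpoint.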
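The walltime value is easy to mistype, and every active queue above caps it at 24 hours. A minimal sketch of a check, handling only the HH:MM:SS form used in this document; the names walltime_to_seconds, check_walltime, and MAX_WALLTIME are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for a walltime written as HH:MM:SS.
MAX_WALLTIME=$((24 * 3600))   # 24-hour limit of the active queues, in seconds

# Print the walltime in seconds; fail if it is not HH:MM:SS digits.
walltime_to_seconds() {
    echo "$1" | awk -F: '
        NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            print $1 * 3600 + $2 * 60 + $3   # awk reads "08" as decimal 8
            ok = 1
        }
        END { exit ok ? 0 : 1 }'
}

# Succeed only if the walltime is well formed and within the queue limit.
check_walltime() {
    secs=$(walltime_to_seconds "$1") || { echo "bad walltime: $1" >&2; return 1; }
    if [ "$secs" -gt "$MAX_WALLTIME" ]; then
        echo "walltime $1 exceeds the 24-hour queue limit" >&2
        return 1
    fi
    return 0
}
```

With the example script's value, walltime_to_seconds 12:00:00 prints 43200.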
LINE 7: -m bea
Instructs PBS to email you when your job begins (b), ends (e), or is
aborted (a). The email address is specified on the next line, in the -M
directive.
LINE 8: -M esumbar@ualberta.ca
Specifies the email address that will receive PBS notifications. In
this example, PBS would try to send mail to one of our analysts, so
please be sure to substitute your own email address.
LINE 9: -N myjob
Gives the job a name; in this example we have chosen the unimaginative
name "myjob". Choose a short name (fewer than 16 characters, no spaces)
that you can use to identify your job. If omitted, the name of the
script (truncated to the first 15 characters) is used.
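The naming rules just described can be expressed as two small shell functions: one checks a name you intend to pass with -N, the other shows what the default name would be if -N is omitted. This is a sketch of the rules as stated here, not of PBS internals; the function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers mirroring the -N naming rules described above.

# Succeed if NAME is non-empty, has no whitespace, and is at most 15 chars.
valid_job_name() {
    case $1 in
        ""|*[[:space:]]*) return 1 ;;
    esac
    [ "${#1}" -le 15 ]
}

# Print the default name: the script's file name cut to 15 characters.
default_job_name() {
    basename "$1" | cut -c1-15
}
```

For example, default_job_name /scratch/esumbar/simulation_run_long.sh prints simulation_run_, which is why a short explicit -N name is easier to recognize in qstat output.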
For the sake of simplicity, we recommend that you save the script in
the same directory as your program files, preferably in a personal
directory under /scratch. Give the script an appropriate name; in our
example we called it "myjob.sh".
When the job runs, the script is executed on your behalf using the
specified shell on the specified host. At this point, it's just an
ordinary shell script and all comments, including PBS directives, are
ignored. The program (a.out in the example) runs as a child process of
the script. When the program terminates, execution returns to the
script, which itself terminates, finally terminating the job.
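Because PBS directives are only ordinary comments once the script runs, a directive placed after the first executable statement is typically not honoured at submission time either, which is why they must be grouped at the top. A minimal sketch of a check you could run before qsub, assuming directives start in column 1 with "#PBS"; the function name check_directive_order is hypothetical.

```sh
#!/bin/sh
# Hypothetical lint: report "#PBS" lines that follow an executable statement.
# Comments (including the #! line) and blank lines may precede directives.
# Exits 0 if the order is fine, 1 if any misplaced directive was found.
check_directive_order() {
    awk '
        /^#PBS/ {
            if (seen_cmd) {
                printf "%s:%d: directive after executable statement\n", FILENAME, FNR
                bad = 1
            }
            next
        }
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # other comments and blank lines
        { seen_cmd = 1 }                     # anything else is executable
        END { exit bad }
    ' "$1"
}
```

Running check_directive_order myjob.sh on the example script prints nothing and succeeds.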
Submitting and Monitoring Jobs:
To submit a job, simply "cd" to the job directory and submit
the script to PBS using the "qsub" command. You can
subsequently monitor the status of the job with the "qstat"
command.
For example, a very simple script (displayed with the "more" command)
submits a job to the aurora queue, requests 4 processors and a wall time
of 12 hours, and asks PBS to send an email when the job is done (please
substitute your own email address). The script is then submitted to PBS
with the "qsub" command, and the status of the job is monitored with the
"qstat" command.
[Screenshot: the script, its submission with qsub, and the qstat output (lines in output edited to save space).]
A "Q" in the S (status) column means that the job is waiting
in the specified queue, while an "R" means that the job is
currently running. The TSK column displays either the requested number
of cpus or the memory-equivalent number of cpus, whichever is greater.
The job that was submitted in this example was given the identification
number 87642. It shows up waiting in the aurora queue as expected. See
the man pages regarding qsub and qstat for further details.
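If you submit jobs from your own scripts, it can be handy to keep the identifier that qsub prints so you can pass it to qdel later. A minimal sketch, assuming qsub writes an identifier of the usual PBS form (the job number followed by a dot and a server name) to standard output; the function name submit_job is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: submit a job script and print its numeric job ID.
# Assumes qsub prints an identifier such as "87642.<server>" on stdout.
submit_job() {
    jobid=$(qsub "$1") || return 1   # propagate a failed submission
    echo "${jobid%%.*}"              # keep only the part before the first "."
}
```

A typical use would be id=$(submit_job mytestjob.sh) && echo "submitted job $id", followed later, if needed, by qdel "$id".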
Normally, while a job is executing, PBS redirects standard output and
standard error output from the script to two private files. When the
job ends, this output is returned to you as, in this example,
mytestjob.o87642 (standard output) and mytestjob.e87642 (standard
error). The standard output from your program inherits this redirection
through the process hierarchy. Consequently, if your program calls
printf() (C) or executes the print* or write(*,fmt) statements
(Fortran) as a way of monitoring execution progress, you will not be
able to see this output until the termination of the job. To overcome
this problem, you should explicitly redirect standard output from your
program to your own file(s) as illustrated in the example script. Be
sure to follow each relevant output statement in the code with a call
to the fflush() function (C) or the flush subroutine (SGI Fortran) to
force an immediate update of the file(s).
Stopping Jobs:
The job owner can remove a waiting job from the queue or terminate a
running job using the qdel command and supplying the PBS job
identification number. When used to terminate a running job, the qdel
command delivers a TERM signal to the process group. To send a KILL
signal, or any other supported signal, to a running job, use the qsig
command. Signalling a waiting job has no effect. See the man pages for
additional documentation.
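Since qdel delivers TERM, a job script can catch that signal to log what happened or tidy up before the job ends; a KILL sent with qsig cannot be caught. A minimal sketch, assuming a POSIX shell; the function name run_with_term_handler and its log message are hypothetical. The program is started in the background and the script waits for it, so the shell can react to TERM immediately rather than only after the program exits.

```sh
#!/bin/sh
# Hypothetical job-script helper: run a program so that a TERM signal
# (as sent by qdel) is recorded in a log file before the job exits.
# Usage: run_with_term_handler LOGFILE PROGRAM [ARGS...]
run_with_term_handler() {
    log=$1
    shift
    child=
    # On TERM: note it in the log, pass the signal on to the program
    # (harmless if it already received it), and exit with 128 + 15.
    trap 'echo "caught TERM, stopping job" >> "$log"; [ -n "$child" ] && kill -TERM "$child" 2>/dev/null; exit 143' TERM
    "$@" &          # run the program in the background
    child=$!
    wait "$child"   # returns early when TERM arrives; the trap then runs
}
```

In the example job script, the last line could become run_with_term_handler term.log ./a.out >output 2>&1.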
Checkpointing and Resubmitting Jobs:
Checkpointing is the process of saving the values of essential data to
one or more files so that the program can be restarted. When a new job
is submitted and the program starts running again, the saved data is
read in from the checkpoint file(s) and execution resumes at the point
where the program was terminated.
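The resubmission half of this cycle can be automated at the end of a job script. A minimal sketch, assuming qsub can be run from within a job on your system and that the script can read the last checkpointed step before and after the program runs (how it does so depends on your checkpoint format); the function name resubmit_if_unfinished is hypothetical. The progress check guards against a job that resubmits itself forever without advancing.

```sh
#!/bin/sh
# Hypothetical end-of-job helper: resubmit the same script if the run is
# unfinished and this job advanced past the previous checkpoint.
# Usage: resubmit_if_unfinished PREV_STEP LAST_STEP MAX_STEP SCRIPT
resubmit_if_unfinished() {
    prev=$1 last=$2 max=$3 script=$4
    if [ "$last" -ge "$max" ]; then
        echo "run complete at step $last"
    elif [ "$last" -le "$prev" ]; then
        # No new checkpoint since the previous job: stop instead of looping.
        echo "no progress past step $prev; not resubmitting" >&2
        return 1
    else
        qsub "$script"
    fi
}
```

For example, with MAXSTEP=500000 as in the program below, resubmit_if_unfinished 1000 3000 500000 myjob.sh would submit myjob.sh again.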
For example, here is a basic checkpointing algorithm implemented in
Fortran for a typical simulation program:
      program demo
      implicit none
      integer NX, NY, NZ, MAXSTEP, CPT
      parameter (NX=128, NY=128, NZ=128)
      parameter (MAXSTEP=500000, CPT=1000)
      real*8 data(NX,NY,NZ)
      integer step, laststep
      integer get_laststep

      laststep = get_laststep()
      if (laststep .eq. 0) then
         call initialize_data(data,NX,NY,NZ)
      else
         call read_checkpoint(laststep,CPT,data,NX,NY,NZ)
      end if

      do step = laststep + 1, MAXSTEP
         call do_work(data,NX,NY,NZ)
         if (mod(step,CPT) .eq. 0) then
            call write_checkpoint(step,CPT,data,NX,NY,NZ)
         end if
      end do

      call final_output(data,NX,NY,NZ)
      call clean_up()
      stop
      end
In this program, a checkpoint is performed whenever step is evenly
divisible by CPT. The checkpoint-related subroutines get_laststep,
read_checkpoint, and write_checkpoint use alternating checkpoint files
as a measure of fault tolerance: if a job is killed while one file is
being written, the other still holds a complete checkpoint. The full
text of the demonstration program is in demo.f. An equivalent C version
is in demo.c.
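The alternating-file idea can be illustrated in a few lines of shell. This is not the code in demo.f or demo.c; it is a sketch in which a "checkpoint" is just a file whose first line is the step number, with hypothetical file names ckpt.0 and ckpt.1 and shell functions that borrow the Fortran subroutine names. Writing to a temporary file and renaming it is an extra precaution of this sketch, so a half-written file never replaces a good one.

```sh
#!/bin/sh
# Sketch of alternating checkpoint files (hypothetical names ckpt.0/ckpt.1).
CPT=1000   # checkpoint interval, as in the Fortran example

# write_checkpoint STEP: save STEP into the slot chosen by (STEP/CPT) mod 2.
write_checkpoint() {
    slot=$(( ($1 / CPT) % 2 ))
    echo "$1" > "ckpt.$slot.tmp" && mv "ckpt.$slot.tmp" "ckpt.$slot"
}

# get_laststep: print the newest valid step found in either file, or 0.
get_laststep() {
    last=0
    for f in ckpt.0 ckpt.1; do
        [ -f "$f" ] || continue
        step=$(head -n 1 "$f")
        case $step in ''|*[!0-9]*) continue ;; esac   # skip damaged files
        if [ "$step" -gt "$last" ]; then
            last=$step
        fi
    done
    echo "$last"
}
```

After write_checkpoint 1000, 2000, and 3000, the files hold 2000 and 3000; if the file holding 3000 is damaged, get_laststep falls back to 2000.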
Contacting Us:
If you'd like more information about using the machines, the production
schedule, machine availability, and/or help with porting your code,
please contact us at research.support@ualberta.ca.
Revised: August 02, 2006.