The Portable Batch System (PBS) is a workload management system for Linux clusters. It supplies commands to submit, monitor, and delete jobs. The options most commonly used in a PBS job script are the following.
| Option | Description |
| #PBS -N myJob | Assigns a job name. The default is the name of PBS job script. |
| #PBS -l nodes=4:ppn=2 | The number of nodes and processors per node. |
| #PBS -q queuename | Assigns the queue your job will use. |
| #PBS -l walltime=01:00:00 | The maximum wall-clock time during which this job can run. |
| #PBS -o mypath/my.out | The path and file name for standard output. |
| #PBS -e mypath/my.err | The path and file name for standard error. |
| #PBS -j oe | Join option that merges the standard error stream with the standard output stream of the job. |
| #PBS -W stagein=file_list | Copies the file onto the execution host before the job starts. (*) |
| #PBS -W stageout=file_list | Copies the file from the execution host after the job completes. (*) |
| #PBS -m b | Sends mail to the user when the job begins. |
| #PBS -m e | Sends mail to the user when the job ends. |
| #PBS -m a | Sends mail to the user when the job aborts (with an error). |
| #PBS -m ba | Combines several mail options in a single directive (here: mail when the job begins and when it aborts). If the options are given in separate -m directives, only the last one takes effect. |
| #PBS -r n | Indicates that a job should not rerun if it fails. |
| #PBS -V | Exports all environment variables to the job. |
(*) File staging specifies which files should be copied onto the execution host before the job starts and which files should be copied off the execution host when it completes. Regardless of the direction of the copy, the file_list has the form local_file@hostname:remote_file, where local_file is the name of the file on the system where the job executes and remote_file is the name of the file on the host specified by hostname. For example:
stagein=my.input@frontend-0:/home/login_name/my.input
stageout=my.output@frontend-0:/home/login_name/my.output
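As a minimal sketch of how the two staging strings are put together, the fragment below assembles them from shell variables and prints the resulting directives. The helper `stage_spec` and the variable names are illustrative assumptions, not part of PBS; the host and paths are the ones from the example above.

```shell
#!/bin/sh
# Build file-staging directives of the form local_file@hostname:remote_file.
# stage_spec, host and remote_dir are hypothetical names used for illustration.
host="frontend-0"
remote_dir="/home/login_name"

stage_spec() {
    # $1 = file name on the execution host, $2 = host, $3 = path on that host
    printf '%s@%s:%s' "$1" "$2" "$3"
}

stagein_line="#PBS -W stagein=$(stage_spec my.input "$host" "$remote_dir/my.input")"
stageout_line="#PBS -W stageout=$(stage_spec my.output "$host" "$remote_dir/my.output")"

# These two lines can be pasted into the header of a job script.
echo "$stagein_line"
echo "$stageout_line"
```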
| Command | Description |
| showq | Shows a detailed list of submitted jobs. |
| showbf | Shows the free resources (time and processors available) at the moment. |
| checkjob job.ID | Shows a detailed description of the job job.ID. |
| showstart job.ID | Gives an estimate of the expected start time of the job job.ID. |
There are a number of predefined environment variables. These include the following:
The following environment variables relate to the submission machine:
| Variable | Description |
| PBS_O_HOST | The host machine on which the qsub command was run. |
| PBS_O_LOGNAME | The login name on the machine on which qsub was run. |
| PBS_O_HOME | The home directory on the machine on which qsub was run. |
| PBS_O_WORKDIR | The working directory from which qsub was run. |
The following variables relate to the environment where the job is executing:
| Variable | Description |
| PBS_ENVIRONMENT | Set to PBS_BATCH for batch jobs and to PBS_INTERACTIVE for interactive jobs. |
| PBS_O_QUEUE | The original queue to which the job was submitted. |
| PBS_JOBID | The identifier that PBS assigns to the job. |
| PBS_JOBNAME | The name of the job. |
| PBS_NODEFILE | The file containing the list of nodes assigned to a parallel job. |
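The variables above can be read like any other shell variable inside a job script. The following is a small sketch, not a required part of a job: it prints a summary and then changes to the submission directory. The function name `job_summary` and the `n/a` fallback are assumptions made here so that the fragment also runs outside PBS, where the variables are unset.

```shell
#!/bin/sh
# Print a short job summary from the PBS environment variables.
# job_summary is a hypothetical helper; "n/a" is a fallback chosen here
# for the case where the script runs outside of PBS.
job_summary() {
    echo "job id:      ${PBS_JOBID:-n/a}"
    echo "job name:    ${PBS_JOBNAME:-n/a}"
    echo "submit host: ${PBS_O_HOST:-n/a}"
    echo "submit dir:  ${PBS_O_WORKDIR:-n/a}"
    echo "mode:        ${PBS_ENVIRONMENT:-n/a}"
}

job_summary

# Go back to the directory from which qsub was run, if PBS has provided it.
if [ -n "${PBS_O_WORKDIR:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
fi
```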
The following job script template should be modified to suit the needs of the job.
A job script may consist of PBS directives, comments and executable statements.
A PBS directive provides a way of specifying job attributes in addition to the
command line options. For example:
#or, for opteron
#PBS -N Job_name
#PBS -l walltime=10:30,mem=320kb
#PBS -m be
#
step1 arg1 arg2
step2 arg3 arg4
#!/bin/sh -f
#PBS -N Kick_some_ass
#PBS -l nodes=2:ppn=2
#PBS -l walltime=24:00:00
#
LAMSTART="lamboot $PBS_NODEFILE"
LAMSTOP="lamhalt $PBS_NODEFILE "
HOME="/home/rroussea"
LAUNCH="mpirun -np 4 cpmd.x"
WORKDIR=${HOME}/cp_test
export PP_LIBRARY_PATH=${WORKDIR}
cd ${WORKDIR}
${LAMSTART}
${LAUNCH} au_surf_job1.in > au_surf_job1.out
${LAMSTOP}
#
exit
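The script above hard-codes `-np 4`, which has to be kept in step with `nodes=2:ppn=2` by hand. One possible alternative, sketched below, derives the process count from `$PBS_NODEFILE`. This rests on the assumption that the node file holds one line per assigned processor; check this on your cluster before relying on it. The helper `count_procs` and the sample node names are hypothetical.

```shell
#!/bin/sh
# Count the processor slots granted to the job instead of hard-coding "-np 4".
# Assumption: $PBS_NODEFILE holds one line per assigned processor.
count_procs() {
    # $1 = path to a node file; empty lines are not counted
    grep -c . "$1"
}

# Outside of PBS, fall back to a small sample file (hypothetical node names)
# so that the sketch still runs.
nodefile="${PBS_NODEFILE:-}"
sample=""
if [ -z "$nodefile" ]; then
    sample=$(mktemp)
    printf 'node01\nnode01\nnode02\nnode02\n' > "$sample"
    nodefile=$sample
fi

NPROCS=$(count_procs "$nodefile")
echo "mpirun -np $NPROCS cpmd.x"

# Remove the sample file if one was created.
if [ -n "$sample" ]; then
    rm -f "$sample"
fi
```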
Use the qsub command to submit the job.
qsub jobA
PBS assigns a job a unique job identifier once it is submitted (e.g. 123.opteron). After a job has been queued, it is selected for execution based on the time it has been in the queue, wall-clock time limit, and number of processors.
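Since qsub prints the job identifier on standard output, a wrapper script can capture it and split it into its two parts. The fragment below is only a sketch: it uses the sample identifier from the text in place of a real qsub call, and the variable names are chosen here for illustration.

```shell
#!/bin/sh
# Split a PBS job identifier of the form <number>.<server>.
# The sample value is the one quoted in the text; in a real session it
# would come from qsub, for example:  JOBID=$(qsub jobA)
JOBID="123.opteron"

job_number=${JOBID%%.*}   # text before the first dot
job_server=${JOBID#*.}    # text after the first dot

echo "number=$job_number server=$job_server"
```

The numeric part is usually what commands such as qdel or checkjob are given as job.ID.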
Below are commands for monitoring and controlling a job:
| Command | Function |
| qstat -a | Checks the status of jobs, queues, and the PBS server. |
| qstat -f | Gets all the information about a job, i.e. resources requested, resource limits, owner, source, destination, queue, etc. |
| qdel job.ID | Deletes a job from the queue. |
| qhold job.ID | Holds a job if it is in the queue. |
| qrls job.ID | Releases a job from hold. |
At present the batch queues are defined as follows:
| QUEUE | N. CPU MAX | Time Limit per CPU (dd+hh:mm) | Total Time Limit (dd+hh:mm) |
| dque | - | 0+>10:00 | - |
| QUEUE | Time Limit per CPU (hh:mm:ss) | Total Time Limit (hh:mm:ss) |
| dque | - | 10:00:00 |
| egrid | 48:00:00 | 72:00:00 |
| gridats | 48:00:00 | 72:00:00 |
| stormdev | 48:00:00 | 72:00:00 |
| QUEUE | N. CPU MAX | Time Limit per CPU (dd+hh:mm) | Total Time Limit (dd+hh:mm) |
| dque | (depending on groups) | 0+>96:00 | - |
| shorttest | (depending on groups) | 0+>00:15 | - |
| QUEUE | Max running jobs | Walltime max (dd+hh:mm) | Walltime default (dd+hh:mm) | Memory limit |
| q32m | 6 | 4+00:00 | 0+12:00 | none |
| q32s_short | 4 | 0+12:00 | 0+01:00 | none |
| q32s_long | 4 | 4+00:00 | 0+12:00 | none |
| q32x | 2 | 0+12:00 | 0+01:00 | < 512Mb |
| q64s | 20 | 4+00:00 | 0+12:00 | none |
| q64x | 2 | 0+12:00 | 0+01:00 | < 512Mb |
m - for Maritan's users
s - for Sorella's users
x - for other users, or extra jobs for m and s
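Some of the tables above give limits as dd+hh:mm, while the `-l walltime=` examples in this document use hours, minutes, and seconds. As a convenience, the sketch below converts one notation into the other. The function name `to_walltime` is an assumption made here, and the code expects two-digit hour and minute fields as they appear in the tables.

```shell
#!/bin/sh
# Convert a limit written as dd+hh:mm into the hh:mm:ss form used with
# "-l walltime=". to_walltime is a hypothetical helper name.
to_walltime() {
    days=${1%%+*}        # part before the "+"
    rest=${1#*+}         # hh:mm
    hours=${rest%%:*}
    mins=${rest#*:}
    # Strip one leading zero so that values such as 08 are not read as octal.
    hours=${hours#0}
    mins=${mins#0}
    printf '%d:%02d:00\n' $(( days * 24 + hours )) "$mins"
}

to_walltime 4+00:00
to_walltime 0+12:00
```

For example, the 4+00:00 maximum of q32m corresponds to `-l walltime=96:00:00`.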
qsub -l nodes=1:ppn=2 (for two CPUs)
You are warmly invited to check whether SMP is convenient for your job!
Simple batch script for opteron
LAMSTART="lamboot $PBS_NODEFILE"
LAMSTOP="lamhalt $PBS_NODEFILE"
Note: These commands are currently mandatory for MPI jobs. They start and stop the MPI environment.
Suppose you want to run the program hello.x on 16 processors for 1 hour. If you want to specify the requests on the qsub command line, write the jobscript file as follows:
On BaCiuco and Briareo:
#!/bin/sh
cd workdir
mpiexec -n 16 hello.x
On opteron, the last line becomes:
mpirun -np 16 hello.x
and you can submit it to the queuing system with the command:
qsub -l nodes=8:ppn=2,walltime=1:00:00 jobscript
If you prefer to include the requests in the jobscript, then the jobscript should be:
On BaCiuco and Briareo:
#!/bin/sh
#PBS -l nodes=8:ppn=2,walltime=1:00:00
cd workdir
mpiexec -n 16 hello.x
On opteron, the last line becomes:
mpirun -np 16 hello.x
and you can submit it to the queuing system with the command:
qsub jobscript
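The request `nodes=8:ppn=2` follows from dividing the 16 processes by the 2 processors per node. A small sketch of that arithmetic is given below. The helper name `request_for` is hypothetical, and rounding up to whole nodes is a choice made here for process counts that do not divide evenly.

```shell
#!/bin/sh
# Build the "-l" resource string for a given number of MPI processes.
# request_for is a hypothetical helper name.
request_for() {
    # $1 = number of processes, $2 = processors per node, $3 = walltime
    nodes=$(( ($1 + $2 - 1) / $2 ))   # round up to whole nodes
    printf 'nodes=%d:ppn=%d,walltime=%s\n' "$nodes" "$2" "$3"
}

request_for 16 2 1:00:00
# The string can be passed to qsub directly, for example:
#   qsub -l "$(request_for 16 2 1:00:00)" jobscript
```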
If you want to run the program test.x interactively on four processors, you could use the following sequence of commands:
qsub -l nodes=2:ppn=2,walltime=0:30:00 -I
At this point (if there are free resources) you will enter the interactive batch session, and you can run your test as follows.
On BaCiuco and Briareo:
mpiexec -n 4 -no-shmem test.x
On opteron:
lamboot -v $PBS_NODEFILE
cd testdir
mpirun -n 4 -no-shmem test.x
Example of an interactive execution:
qsub -l nodes=2:ppn=2,walltime=0:30:00 -I
On BaCiuco and Briareo:
cd testdir
mpiexec -n 4 test.x
On opteron, the last line becomes:
mpirun -np 4 test.x

To request nodes with Myrinet, append :myri to the node specification, for example:
qsub -l walltime=12:00:00,nodes=2:ppn=2:myri
The scheduler will then avoid non-Myrinet nodes. If you do not need Myrinet (single- or dual-CPU jobs), please do not append :myri. This leaves more Myrinet nodes free, and a node whose Myrinet card is out of order can still be used for serial and two-CPU jobs.
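The rule of thumb for :myri can be written down as a short helper, sketched below. The function name `node_spec` is hypothetical, and the threshold of two CPUs is taken from the "single- or dual-CPU jobs" wording; jobs with other needs may call for a different rule.

```shell
#!/bin/sh
# Append ":myri" only for jobs that use more than two CPUs in total.
# node_spec is a hypothetical helper name.
node_spec() {
    # $1 = number of nodes, $2 = processors per node
    spec="nodes=$1:ppn=$2"
    if [ $(( $1 * $2 )) -gt 2 ]; then
        spec="$spec:myri"
    fi
    echo "$spec"
}

node_spec 2 2   # four CPUs: Myrinet requested
node_spec 1 2   # two CPUs: no Myrinet requested
```

The result can be used as part of the resource list, for example `qsub -l walltime=12:00:00,$(node_spec 2 2) jobscript`.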