Submitting jobs
PBS
The resource manager used is Torque (PBS) and the scheduler Maui.
Your environment is automatically set up to give access to PBS/Maui.
Here are the most common commands needed to get started.
- qsub - Submit a job.
- qstat - List the running/queued jobs.
- qdel - Remove a job.
A job is a script that sets up necessary parameters and then executes programs on the node(s) allocated.
There are four types of jobs that require different arguments to qsub.
- Single CPU jobs
- Multi CPU jobs with shared memory
- Multi CPU jobs using MPI
- Commercial parallel applications
NOTE! The default walltime is 10 minutes (-l walltime=0:10:00), so if you need a longer runtime you must specify it. (Specifying the walltime as accurately as possible helps the scheduler do a better job.)
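As a rough illustration of choosing a walltime, the sketch below converts a measured runtime in seconds to the H:MM:SS form that -l walltime expects. The helper name to_walltime and the 20% safety margin are assumptions made for this example, not site policy.

```shell
# Convert a duration in seconds to the H:MM:SS form used by -l walltime.
# to_walltime is a hypothetical helper, not part of Torque.
to_walltime() {
    secs=$1
    printf '%d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# A measured runtime of 1 h 10 min 5 s, padded by an assumed 20% margin.
measured=4205
padded=$((measured * 120 / 100))
echo "measured: $(to_walltime "$measured")"
echo "request:  -l walltime=$(to_walltime "$padded")"
```

A small margin keeps the job from being killed at the limit while still giving the scheduler a realistic estimate.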
Single CPU jobs
Submit with qsub. Here is an example:
qsub -l walltime=24:00:00 testscript
- -cwd (not used in the example above) will put all output in the current working directory. This is not necessary, but otherwise the output is put in the root of your home directory. Stdout goes to <jobname>.o<jobnum> and stderr to <jobname>.e<jobnum>.
- -l walltime=24:00:00 tells the scheduler that you want the resource walltime to be 24 hours. See Resources below for more information.
- testscript is the name of the file containing the job script.
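To make the output file naming concrete, here is a small sketch that derives the expected stdout and stderr file names from a job name and the job id printed by qsub. It assumes the usual Torque id form <jobnum>.<server>; the helper names and the server name "master" are made up for the example.

```shell
# Torque job ids look like "<jobnum>.<server>"; keep only the number.
job_number() {
    echo "${1%%.*}"
}

# Print the stdout and stderr file names for a job name ($1) and job id ($2).
output_files() {
    num=$(job_number "$2")
    echo "$1.o$num"
    echo "$1.e$num"
}

# Hypothetical job id as printed by qsub; "master" is a made-up server name.
output_files Test_Job 12345.master
```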
A simple script follows:
# Arguments to qsub can be submitted via the script as well by starting
# the line with #PBS
#
# Set your mail address
#PBS -M me@my.domain
#
# Mail on abort
#PBS -m a
#
# Specify time for job (here 1h 10 min 5 sec)
#PBS -l walltime=1:10:05
#
# Request 1 processor (node)
#PBS -l nodes=1:ppn=1
#
# Set the name of the job
#PBS -N Test_Job
#
# End of arguments to qsub

# Go to work submission directory (Don't remove this line)
cd $PBS_O_WORKDIR

# Do some preparing work
./prepare_data

# Time the program
time ./my_big_calc

#End of script (make sure line before this gets run)
Multi CPU jobs with shared memory
Multi CPU jobs with shared memory need to run on a single node, which currently means they cannot use more than 4 or 8 CPUs, depending on the machine. They can be any kind of job utilizing more than one CPU, including but not limited to threaded applications and MPI programs.
Specify the number of CPUs you want like this:
qsub -l nodes=1:ppn=N
where N=number of CPUs. (N=1 is the default, i.e. it need not be specified for single CPU jobs.)
An example:
qsub -l nodes=1:ppn=3 -l walltime=12:00:00 threaded_jobscript
- -l nodes=1:ppn=3 requests 3 CPUs on a single machine.
- -l walltime=12:00:00 tells the scheduler that you want the resource walltime to be 12 hours. See Resources below for more information.
- threaded_jobscript is the name of the file containing the job.
Remember that these options can be specified on the command line or in the job script.
NOTE! On Kal/Ada all machines have 4 cores (2 cores per CPU), which from a user's point of view means 4 CPUs. On Svea and Beda each node has 8 cores (4 cores per physical CPU), i.e. 8 'CPUs' from a user's point of view.
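A threaded job usually needs to know how many CPUs it was given. The following is a minimal sketch, assuming that Torque lists one line per allocated core in the file named by $PBS_NODEFILE and that the program is an OpenMP application reading OMP_NUM_THREADS. The helper count_cpus and the program name my_threaded_calc are hypothetical.

```shell
# Count the allocated CPUs: one line per allocated core in the node file.
# count_cpus is a hypothetical helper for this sketch.
count_cpus() {
    wc -l < "$1" | tr -d ' '
}

# Only act when running inside a job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    OMP_NUM_THREADS=$(count_cpus "$PBS_NODEFILE")
    export OMP_NUM_THREADS
    echo "Running with $OMP_NUM_THREADS threads"
    # time ./my_threaded_calc   (hypothetical OpenMP program)
fi
```

Deriving the thread count from the allocation avoids starting more threads than the ppn value requested with qsub.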
Multi CPU jobs using MPI
Multi CPU jobs using MPI can be run on several machines in parallel but need to support the MPI environment.
Specify the number of CPUs you want like this:
qsub -l nodes=M:ppn=N
where M = the number of nodes and N = the number of CPUs per node, i.e. you get M*N CPUs in total. For more examples, see Resources below.
NOTE! If using more than one node, you need to specify all cores on each node, e.g. on Ada/Kal you need to use -l nodes=M:ppn=4 and on Svea/Beda -l nodes=M:ppn=8.
An example:
qsub -l nodes=2:ppn=4 -l walltime=48:00:00 MPIscript
- -l nodes=2:ppn=4 requests 8 CPUs across two machines.
- -l walltime=48:00:00 tells the scheduler that you want the resource walltime to be 48 hours. See Resources below for more information.
- MPIscript is the name of the job script.
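Since multi-node jobs must request every core on each node, the ppn value follows from the cluster. The sketch below builds the nodes specification from a cluster name and a node count and prints the resulting qsub command without running it. The function names are hypothetical, and the core counts are the ones stated on this page.

```shell
# Cores per node as stated on this page: 4 on Ada/Kal, 8 on Svea/Beda.
cores_per_node() {
    case "$1" in
        ada|kal)   echo 4 ;;
        svea|beda) echo 8 ;;
        *) echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Build the -l nodes=... value for an MPI job spanning $2 whole nodes on cluster $1.
mpi_nodes_spec() {
    ppn=$(cores_per_node "$1") || return 1
    echo "nodes=$2:ppn=$ppn"
}

# Print (do not run) the submit command for two whole nodes on Ada.
spec=$(mpi_nodes_spec ada 2)
echo "qsub -l $spec -l walltime=48:00:00 MPIscript"
```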
A simple MPI job script:
# Arguments to qsub can be submitted via the script as well by starting
# the line with #PBS
#
#PBS -N MPI_test
#
# End of arguments to qsub

#Load mpi-module (for example mpich in this case)
module load mpich

#Time the program
time mpiexec ./mpi_job

#End of script (make sure line before this gets run)
Unless you are using OpenMPI 1.3 or later, do NOT use mpirun to start your MPI programs; mpiexec should always be used instead!
NOTE! It is not possible to set environment variables for the MPI program in the job script, since the script is only used to start the MPI launcher. Environment variables needed by the MPI program have to be set in your login script (.cshrc if you're using tcsh, or .bashrc if you're using bash).
Information about the mpiexec launcher
The launcher program mpiexec mentioned above is a special MPI-launcher which (besides knowing how to start an MPI-program built with mpich or mpich2) also knows about Torque and the PBS-environment. When invoked, mpiexec will automatically infer the number of MPI-ranks and which nodes to use by talking to the resource manager (Torque); hence, there is no need to explicitly pass along any such information to mpiexec (i.e. no options like --np or --machinefile are needed).
Unfortunately, the Torque- and mpich1&2-aware version of mpiexec does not work with MPI applications built with OpenMPI. (The default OpenMPI launcher already knows about Torque and therefore doesn't need any other launcher.) We have therefore made mpiexec available only when an mpich or mpich2 module is loaded, i.e. mpiexec is now tied to a corresponding MPI module and does not exist "by default".
The consequence of making the availability of mpiexec depend on whether an MPI module is loaded is that one must now load an MPI module before launching an MPI application.
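Because mpiexec only exists once an MPI module is loaded, a job script can check for it before starting the program. This is a small sketch using the standard command -v lookup; require_launcher is a hypothetical helper, and the commented lines show how it might be used in a job script.

```shell
# Return 0 if the named launcher is on PATH, otherwise print a hint and return 1.
# require_launcher is a hypothetical helper for this sketch.
require_launcher() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: $1 not found; load an MPI module first (e.g. module load mpich)" >&2
    return 1
}

# Intended use inside a job script (not executed here):
#   module load mpich
#   require_launcher mpiexec || exit 1
#   time mpiexec ./mpi_job
```

Failing early with a clear message is easier to diagnose than a "command not found" error buried in the job's stderr file.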
Commercial parallel applications
All commercial (as well as other binary-only) applications need to be modified to work with the environment at hand. Please contact support for these cases.
Queues
There are two queues on the Ada/Kal cluster: ada and kal. On Svea and Beda there are several queues. The svea and beda queues may be used by C3SE users who have been granted access to Svea or Beda; the other queues are private queues for specific research groups.
You may need to manually specify the queue with e.g.
-q ada
to qsub or in the job script.
Projects
All jobs are accounted to a project and run in a specific queue. Most users are, however, members of only a single project/queue pair, and in these cases the qsub-wrapper will automatically choose the proper project/queue combination for you. (Actually, the qsub-wrapper will automatically choose the correct values for -q and/or -A as long as there are no ambiguities.)
If you need to specify the project, this is done using the -A flag. E.g. if you are a member of the project SNIC001-2-3 you use
-A SNIC001-2-3
as argument to qsub or in your job script.
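The -q and -A options can be combined on one command line. The sketch below only prints the command it would run, adding each option when a value is given. The function qsub_cmd is a hypothetical helper; the queue and project names are the examples used on this page.

```shell
# Print the qsub command for a job script, adding -q and -A only when given.
# $1: job script, $2: queue (may be empty), $3: project (may be empty)
qsub_cmd() {
    cmd="qsub"
    if [ -n "${2:-}" ]; then cmd="$cmd -q $2"; fi
    if [ -n "${3:-}" ]; then cmd="$cmd -A $3"; fi
    echo "$cmd $1"
}

# Explicit queue and project, then the case where the wrapper chooses both.
qsub_cmd testscript ada SNIC001-2-3
qsub_cmd testscript
```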
Resources
You can request certain resources when you submit your job. To do that simply use the following argument to qsub:
-l resource=value
There are two resources that should always be specified.
- -l walltime=<hours:minutes:seconds>
- -l nodes=<M>:ppn=<N>
walltime is the running time that your job will need to complete. If the job runs longer than this, it will be killed.
The queues have an upper limit of 168:00:00, i.e. a full week. Should you request a longer running time than this, your job will simply be queued forever.
The ada queue has an upper limit of 16 nodes, i.e. -l nodes=16:ppn=4.
The general syntax for specifying nodes and cores is:
-l nodes=M:ppn=N
where
- 1 <= N <= number of cores in the node if M = 1
- N = number of cores in the node if M > 1
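The node and core rules can be checked before submitting. The sketch below is one possible reading of them: a single-node job may use from 1 core (the default) up to all cores of the node, while a multi-node job must use all cores on every node. The function names are hypothetical, and the cores-per-node value is passed in by the caller.

```shell
# Return 0 if a nodes/ppn request follows the rules, 1 otherwise.
# $1: number of nodes, $2: ppn, $3: cores per node on the target cluster
valid_request() {
    if [ "$1" -lt 1 ] || [ "$2" -lt 1 ] || [ "$2" -gt "$3" ]; then
        return 1
    fi
    # Multi-node jobs must use every core on each node.
    if [ "$1" -gt 1 ] && [ "$2" -ne "$3" ]; then
        return 1
    fi
    return 0
}

# Report whether a request would be accepted by the check above.
check() {
    if valid_request "$1" "$2" "$3"; then
        echo "nodes=$1:ppn=$2 ok"
    else
        echo "nodes=$1:ppn=$2 rejected"
    fi
}

# Three requests for a cluster with 4 cores per node.
check 1 3 4
check 2 4 4
check 2 3 4
```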
Memory limits
To make sure jobs running on the same node don't influence each other, we enforce memory limits for such jobs. Implemented rules:
- If you use all cores on one or more nodes, no limits are enforced.
- If you use fewer than the available number of cores on the node, the following holds:
- The default allocation (if you specify nothing) is: available_memory_per_core * (number_of_cores_in_your_job - 0.5)
- You may use at most: available_memory_per_core * (number_of_cores_in_your_job + 0.5)
Example: For a job on Ada (986 MB available physical memory per core) using 3 cores:
- Default allocation: 986 MB * (3 - 0.5) = 2465 MB
- Maximum allocation: 986 MB * (3 + 0.5) = 3451 MB
To specify the amount of memory you need, use the following syntax (as argument to qsub or in your job-script):
-l mem=984mb
Note! You can of course specify less than the default!
If these limits are exceeded, your code may fail with memory allocation problems or be killed by the queueing system.
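The two memory formulas are easy to evaluate in a script. This sketch computes the default and maximum allocation in MB with integer arithmetic, rewriting (cores - 0.5) as (2*cores - 1)/2. The function names are hypothetical, and the figures are the Ada example of 986 MB per core with 3 cores.

```shell
# Default and maximum memory (in MB) for a job using fewer cores than the node has.
# $1: available memory per core in MB, $2: number of cores in the job
# (cores - 0.5) is computed as (2*cores - 1)/2 to stay in integer arithmetic,
# so odd per-core values are rounded down.
default_mem_mb() {
    echo $(( $1 * (2 * $2 - 1) / 2 ))
}
max_mem_mb() {
    echo $(( $1 * (2 * $2 + 1) / 2 ))
}

# Ada example: 986 MB per core, 3 cores.
echo "default: $(default_mem_mb 986 3)mb"
echo "maximum: $(max_mem_mb 986 3)mb"
echo "request: -l mem=$(max_mem_mb 986 3)mb"
```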
Non-rerunnable jobs
To make sure that PBS won't try to restart your job in case of a system failure etc., you can either submit the job with the following argument
qsub -r n
or add the following to your job script
#PBS -r n
The default setting in PBS is that all jobs are rerunnable.
Priorities for queuing jobs
When there are jobs queuing, the priorities of the queued jobs are calculated based on a number of factors, listed here in order of importance:
- How much (compared to its share) your account-group has been running for the last 23 days
- How much your division has been running for the last 23 days
- How much you have been running for the last 23 days
- How long this job has been waiting in the queue
- The relationship between runtime and waittime for this job
Based on these factors the scheduler (Maui) calculates a priority number for each job and when there are resources available to start a new job, the one with the highest number is chosen. To look at the current priorities, use
diagnose -p
To look at the history data, use
diagnose -f
The exact factors for the different items above differ between systems.