Batch Jobs and Job Scripting¶
Batch jobs are by far the most common type of job on Summit. Batch jobs are resource provisions that run applications on nodes away from the user and do not require supervision or interaction. Batch jobs are commonly used for applications that run for long periods of time or require little to no user input.
Batch jobs are created from a job script, which provides the resource requirements and commands for the job.
Although it is possible to run jobs entirely from the command line, doing so is often tedious and disorganized. Instead, Research Computing recommends constructing a job script for your batch jobs. A job script is a set of Linux commands paired with a set of resource requirements that can be passed to the Slurm job scheduler. Slurm will then generate a job according to the parameters set in the job script. Any commands included in the job script will be run within the job.
Running a Job Script¶
Running a job script can be done with the sbatch command:
sbatch <your-job-script>
Because job scripts specify the desired resources for your job, you won’t need to specify any resources on the command line. You can, however, override or add any job parameter by providing the specific resource as a flag within the sbatch command:
sbatch --partition=sgpu <your-job-script>
Running this command would force your job to run on the sgpu partition no matter what your job script specified.
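As a rough sketch of the override behavior described above, the snippet below assembles an sbatch command line with two overriding flags and submits it only where sbatch and the script are actually present. The script name my-job.sh and the five-minute time limit are placeholders chosen for illustration, not values Research Computing prescribes.

```shell
#!/bin/bash
# Placeholder name; substitute your own job script.
job_script="my-job.sh"

# Flags given on the command line take precedence over the matching
# #SBATCH lines inside the script.
set -- --partition=sgpu --time=00:05:00 "$job_script"
echo "Command: sbatch $*"

# Submit only where Slurm is installed and the script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f "$job_script" ]; then
    sbatch "$@"
fi
```

Any directive that is not overridden on the command line keeps the value written in the job script.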
Making a Job Script¶
Although Research Computing provides a variety of different sample scripts users can utilize when running their own jobs, knowing how to draft a job script can be quite handy if you need to debug any errors in your jobs or you need to make substantial changes to a script.
A job script looks something like this:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --partition=shas-testing
#SBATCH --output=sample-%j.out

module purge
module load intel
module load mkl

echo "== This is the scripting step! =="
sleep 30
./executable.exe
echo "== End of Job =="
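The --output=sample-%j.out directive in the script above uses %j, which Slurm replaces with the job ID when it names the output file. As a small illustration, the helper below performs the same substitution so you can predict which file to look for. The name output_name is hypothetical (it is not a Slurm command), the job ID 123456 is made up, and only the %j pattern is handled.

```shell
# output_name PATTERN JOBID: replace every %j in PATTERN with JOBID,
# mimicking how Slurm names the file for --output=PATTERN.
output_name() {
    printf '%s\n' "$1" | sed "s/%j/$2/g"
}

output_name "sample-%j.out" 123456   # prints sample-123456.out
```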
Normally job scripts are divided into 3 primary parts: directives, loading software, and user scripting. Directives give the terminal and the Slurm daemon instructions on setting up the job. Loading software involves cleaning out the environment and loading specific pieces of software you need for your job. User scripting is simply the commands you wish to be executed in your job.
1. Directives¶
A directive is a specially formatted comment at the top of a job script that gives the shell or Slurm information about the script.
The first directive, the shebang directive, is always on the first line of any script. It indicates which shell should run the commands in your job. Most users employ bash as their shell, so we will specify bash by typing:
#!/bin/bash
The next directives that must be included in your job script are sbatch directives. These directives specify a batch job's resource requirements to Slurm. They must come after the shebang directive and before any commands are issued in the job script. Each directive contains a flag that requests a resource the job needs to complete execution. An sbatch directive is written as such:
#SBATCH --<resource>=<amount>
For example, if you wanted to request 2 nodes with an sbatch directive, you would write:
#SBATCH --nodes=2
A list of some useful sbatch directives can be found in the table of flags at the end of this page. A full list of options can be found in Slurm’s documentation for sbatch.
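Because sbatch stops reading directives once it reaches the first command in a script, a misplaced #SBATCH line is silently ignored. The sketch below approximates that rule so you can check a script before submitting it. The function name list_directives is hypothetical, and the awk logic is a simplified imitation of sbatch's parsing, not its actual implementation.

```shell
# list_directives FILE: print the #SBATCH lines that come before the
# first command in FILE; later ones are skipped, as sbatch would skip them.
list_directives() {
    awk '
        /^#SBATCH/         { print; next }  # a directive: report it
        /^#/ || /^[ \t]*$/ { next }         # other comment or blank line
                           { exit }         # first command: stop reading
    ' "$1"
}

# Demonstration with a throwaway script whose last directive is misplaced.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --time=00:10:00
echo "first command"
#SBATCH --ntasks=24
EOF
list_directives "$demo"   # prints the --nodes and --time lines only
rm -f "$demo"
```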
2. Loading Software¶
Because jobs run on a different node from the one where you submitted them, any shared software that is needed must be loaded via the job script. Software can be loaded in a job script just like it would be on the command line. First we will purge all software that may be left behind from your working environment on a compile node by running the command:
module purge
After this you can load whatever software you need by running the following command:
module load <software>
More information about software modules can be found here.
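If a module fails to load, the rest of the job usually fails in a less obvious way, so it can help to stop early. The sketch below wraps the purge and load steps in a helper that reports the first module that could not be loaded. The name load_modules is hypothetical, and the approach assumes your module system returns a non-zero exit status on failure, as Lmod does.

```shell
# load_modules NAME...: purge the environment, then load each named
# module in order, stopping at the first one that fails.
load_modules() {
    module purge || return 1
    for mod in "$@"; do
        if ! module load "$mod"; then
            echo "Could not load module: $mod" >&2
            return 1
        fi
    done
}
```

In a job script you might call it as `load_modules intel mkl || exit 1` before the user scripting step.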
3. User Scripting¶
The last part of a job script is the user scripting that runs when the job executes. This part of the job script includes all user commands that are needed to set up and execute the desired task. Any Linux command can be used in this step. Scripting can range from highly complex loops iterating over thousands of files to a simple call to an executable. Below is a simple example of some user scripting:
echo "== This is the scripting step! =="
touch tempFile1.in
touch tempFile2.in
sleep 30
./executable.exe tempFile1.in tempFile2.in
echo "== End of Job =="
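For the other end of that range, here is a minimal sketch of a loop that processes every .in file in a directory. The function process_file is a hypothetical stand-in for a real program such as ./executable.exe, and both the inputs directory and the INPUT_DIR variable are assumptions made for the example.

```shell
# Stand-in for a real program such as ./executable.exe (hypothetical).
process_file() {
    echo "processing $1"
}

# process_all DIR: run process_file on every .in file in DIR and
# report how many files were handled.
process_all() {
    count=0
    for f in "$1"/*.in; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        process_file "$f"
        count=$((count + 1))
    done
    echo "== Processed $count file(s) =="
}

# INPUT_DIR is a hypothetical variable; the default directory is "inputs".
process_all "${INPUT_DIR:-inputs}"
```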
Examples¶
Job script to run a 5-minute, 1-node, 1-core C++ job:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=00:05:00
#SBATCH --partition=shas-testing
#SBATCH --ntasks=1
#SBATCH --job-name=cpp-job
#SBATCH --output=cpp-job.%j.out

module purge
module load gcc

./example_cpp.exe
Job script to run a 7-minute, 1-node, 4-core C++ OpenMP job:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=00:07:00
#SBATCH --partition=shas-testing
#SBATCH --ntasks=4
#SBATCH --job-name=omp-cpp-job
#SBATCH --output=omp-cpp-job.%j.out

module purge
module load gcc

export OMP_NUM_THREADS=4
./example_omp.exe
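The OpenMP script above states the core count twice, once in --ntasks=4 and once in OMP_NUM_THREADS=4, and the two can drift apart when the script is edited. One possible way to keep them in sync is to read the value from SLURM_NTASKS, which Slurm sets inside a job when --ntasks is given. The fallback of 1 thread outside a job is an assumption made for this sketch.

```shell
# Inside a job submitted with --ntasks=4, Slurm sets SLURM_NTASKS=4.
# Fall back to 1 thread when the variable is unset (e.g. outside a job).
export OMP_NUM_THREADS="${SLURM_NTASKS:-1}"
echo "Running with $OMP_NUM_THREADS OpenMP thread(s)"
```

This follows the convention used on this page. Many sites instead request OpenMP cores with --cpus-per-task and read SLURM_CPUS_PER_TASK, so check which convention your cluster expects.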
Job script to run a 10-minute, 2-node, 24-core C++ MPI job:
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --time=00:10:00
#SBATCH --partition=shas-testing
#SBATCH --ntasks=24
#SBATCH --job-name=mpi-cpp-job
#SBATCH --output=mpi-cpp-job.%j.out

module purge
module load intel
module load impi

mpirun -np 24 ./example_mpi.exe
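In the MPI script above, the value passed to mpirun -np has to match --ntasks. A sketch of one way to avoid a mismatch is to take the rank count from SLURM_NTASKS. The default of 24 simply mirrors this example and is only used when the variable is unset.

```shell
# Number of MPI ranks: what Slurm allocated, or 24 (this example's
# value) when SLURM_NTASKS is unset.
np="${SLURM_NTASKS:-24}"
echo "Launching $np MPI rank(s)"

# Run only where mpirun and the executable are available.
if command -v mpirun >/dev/null 2>&1 && [ -x ./example_mpi.exe ]; then
    mpirun -np "$np" ./example_mpi.exe
fi
```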
Job Flags¶
The sbatch command supports many optional flags. To review all the options, please visit the Slurm sbatch page. Below are a few flags you may want to consider when running your job via sbatch:
|Type|Description|Flag|
|---|---|---|
|Allocations|Specify an allocation account if you have more than one|--account=account_no|
|Partitions|Specify a partition|--partition=partition_name|
|Sending email|Receive email when the job begins or ends|--mail-type=type|
|Email address|Email address to receive the email|--mail-user=user|
|Number of nodes|The number of nodes needed to run the job|--nodes=nodes|
|Number of tasks|The total number of tasks (by default, one core each) needed to run the job|--ntasks=processes|
|Quality of service|Specify a QOS|--qos=qos|
|Wall time|The maximum amount of time your job will run for|--time=hh:mm:ss|
|Job Name|Name your job so you can identify it in the queue|--job-name=|
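To show how several of these flags fit together, here is a sketch of a job script header that uses them as sbatch directives. The account name, email address, QOS, and job name are placeholders chosen for illustration, so replace them with values that are valid for your own allocation.

```shell
#!/bin/bash
#SBATCH --account=my-account          # placeholder allocation account
#SBATCH --partition=shas-testing
#SBATCH --qos=normal                  # assumed QOS name; check your site
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00               # hours:minutes:seconds
#SBATCH --job-name=flag-demo          # placeholder job name
#SBATCH --mail-type=END               # send email when the job ends
#SBATCH --mail-user=user@example.com  # placeholder address

# Slurm sets SLURM_JOB_NAME inside the job; the fallback covers
# running this script outside Slurm.
job_label="${SLURM_JOB_NAME:-flag-demo}"
echo "== Starting job: $job_label =="
```

Other values accepted by --mail-type include BEGIN, FAIL, and ALL.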