Scaling Up with Job Arrays


Warning

This page assumes you are already familiar with the basics of writing and submitting SLURM batch job scripts. If you are new to batch jobs, please review our documentation on batch jobs before continuing.

Job arrays in SLURM simplify running many similar jobs. Instead of creating and submitting a separate batch script for each job, you use a single template script. SLURM then automatically generates and schedules each job, or task, in the array, saving you time and effort. Job arrays are best suited to workflows where the core computational task is the same for every run, but the input data and/or parameters vary.

Common job array scenarios include:

Processing multiple data files:
Running the same analysis script on a large dataset where each file is processed as a separate, independent job.
Parameter sweeping:
Executing simulations or models with a wide range of input parameters to explore a parameter space.
Monte Carlo simulations:
Performing thousands or millions of independent trials to estimate a numerical result.
High-throughput computing:
Launching a large number of short, non-communicating jobs that can be run in parallel.

Note

In the context of a job array, SLURM refers to the individual “jobs” within the array as “tasks”. To avoid any confusion, we will use that same language here.

Elements of a Job Array

To convert a standard batch script into a job array, you will need to make three changes:

Include the --array directive
Modify the --output directive to include %A and %a
Update your code to utilize the Job Array Environment Variables

Here’s a quick example that shows how all three (--array, --output, and the environment variables) work together. You’ll find more details and examples in the sections below.

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --partition=atesting
#SBATCH --qos=testing
#SBATCH --array=1-3                # 1 - Set Array Indexes
#SBATCH --output=example-%A-%a.out # 2 - Add the Job ID and Task ID for each array task

module purge

# 3 - If necessary, update your script to use the Task ID. 
#     This ensures each task will run its assigned workflow

echo "This task's index is $SLURM_ARRAY_TASK_ID"
echo "This job array has $SLURM_ARRAY_TASK_COUNT tasks"
echo "The array's Job ID is $SLURM_ARRAY_JOB_ID"
echo "The task's Job ID is $SLURM_JOB_ID"
echo "The combined array Job ID and Task ID is ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"

Note

For SLURM to treat a batch script as a job array, the --array directive must be present. It can be added directly to the script, as in the example above, or passed as a command-line argument when submitting the job (e.g., sbatch --array=1-3 example.sh).

Also, please be aware that a job array can only be created from a batch script. You cannot use the --array directive with interactive sessions.

Important

The tasks in a job array are not guaranteed to run simultaneously or in any specific order. Therefore, you must write your batch script so that each task can run independently. If your tasks must execute in a particular sequence, you may need to include the --dependency directive.

Further information on the --dependency directive can be found in SLURM’s documentation.

Task Indexes

The --array directive provides three methods for defining the job array’s task indexes:

a range with a default step size of 1
a range with a specific step size
a list of specific index values

An example of each of these is provided below.

Default Step Size

Request 3 Tasks with Indexes 1, 2, and 3

#SBATCH --array=<START>-<STOP> 
#SBATCH --array=1-3

Specific Step Size

Request 4 Tasks with Indexes 3, 6, 9, and 12

#SBATCH --array=<START>-<STOP>:<STEP>
#SBATCH --array=3-12:3 

List of Indexes

Request 5 Tasks with Indexes 10, 45, 83, 96, and 103

#SBATCH --array=<INDEX 1>,<INDEX 2>,<INDEX N>
#SBATCH --array=10,45,83,96,103
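
To double-check which indexes a range specification will produce before submitting, the seq utility can mirror the START, STEP, and STOP values. This is only a local sanity check, not something SLURM requires, and note that seq expects its arguments in the order START STEP STOP.

```shell
# Preview the task indexes that --array=3-12:3 would request.
# seq takes its arguments in the order START STEP STOP.
indexes=$(seq 3 3 12 | tr '\n' ' ')
echo "Indexes: ${indexes}"

# Count the tasks the array would contain.
# The arithmetic expansion strips any padding that wc adds.
count=$(( $(seq 3 3 12 | wc -l) ))
echo "Number of tasks: ${count}"
```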

Limiting Concurrent Tasks

You can limit the number of tasks that run concurrently (at the same time) by adding the % modifier. This can be necessary when a job array requests a large number of tasks and/or a large amount of resources per task. Without the % modifier, SLURM will start as many tasks as the available resources and limits allow, which might cause your RC account to hit the system’s resource limits and prevent you from running other jobs on a given partition and/or the cluster itself.

To see the current resource limits for jobs submitted to the Alpine cluster, please check the Quality-of-Service table and the Partitions table.

#SBATCH --array=<INDEXES>%<# OF CONCURRENT TASKS>

# Limit to 2 concurrent tasks
#SBATCH --array=1-10%2

Important

Currently, a single job array is limited to, at most, 1,000 tasks. This limit is in place to protect the SLURM controller and ensure it isn’t overwhelmed trying to generate and track too many tasks.

Please be careful not to submit a large number of job arrays at the same time; this too can overwhelm the SLURM controller. To protect the stability of the cluster, Research Computing reserves the right to suspend and/or cancel any jobs that overwhelm the system.

If you have questions about submitting large job arrays and/or multiple job arrays, please submit a support request form.
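
When a workload has more inputs than the task limit allows, one common workaround is to have each task process a contiguous chunk of inputs. The sketch below assumes a hypothetical workload of 5,000 inputs numbered from 1 and an array submitted with --array=1-1000. TOTAL_INPUTS and the final echo are placeholders for illustration, not part of any Research Computing tooling.

```shell
# Hypothetical workload size; replace with your own count.
TOTAL_INPUTS=5000

# Inside a real job these are set by SLURM; the defaults allow a local dry run.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"
TASK_COUNT="${SLURM_ARRAY_TASK_COUNT:-1000}"

# Inputs per task, rounded up so that no input is left out.
CHUNK=$(( (TOTAL_INPUTS + TASK_COUNT - 1) / TASK_COUNT ))

# First and last input handled by this task.
# Assumes the array indexes run from 1 to TASK_COUNT with a step size of 1.
FIRST=$(( (TASK_ID - 1) * CHUNK + 1 ))
LAST=$(( TASK_ID * CHUNK ))
if [ "$LAST" -gt "$TOTAL_INPUTS" ]; then
    LAST=$TOTAL_INPUTS
fi

echo "Task ${TASK_ID} processes inputs ${FIRST} through ${LAST}"
```

In a real batch script, the echo would be replaced by a loop over the inputs from FIRST to LAST.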

Output Filenames

To differentiate the output files generated by each task, it is important to include %A (SLURM_ARRAY_JOB_ID) and %a (SLURM_ARRAY_TASK_ID) in the filename. These can be included in both the --output and --error SBATCH directives, like so:

#SBATCH --output=example-%A-%a.out
#SBATCH --error=example-%A-%a.err
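
To see how these patterns expand, the loop below prints the output filenames that a three-task array would produce. The Job ID 505 is a made-up value for illustration (SLURM assigns the real one at submission), and expand_pattern is a hypothetical helper that only imitates the substitution SLURM performs.

```shell
# Hypothetical helper that imitates how SLURM fills in the filename pattern.
# $1 = array Job ID (%A), $2 = task index (%a)
expand_pattern() {
    echo "example-$1-$2.out"
}

# Print the filename for each task of an array submitted with --array=1-3.
for task_id in 1 2 3; do
    expand_pattern 505 "$task_id"
done
```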

Environment Variables

SLURM provides a set of environment variables specific to job arrays. Those variables are listed and defined in the table below.

Variable	Description
SLURM_JOB_ID	The unique Job ID of a task in the array
SLURM_ARRAY_JOB_ID	The primary Job ID shared by all tasks in the array
SLURM_ARRAY_TASK_ID	The specific index of a task in the array
SLURM_ARRAY_TASK_COUNT	The number of tasks within the array
SLURM_ARRAY_TASK_MAX	The highest index in the array
SLURM_ARRAY_TASK_MIN	The lowest index in the array

As an example, if a job array was submitted with --array=1-3, it would result in a set of three tasks with values similar to these:

# 1
SLURM_JOB_ID=507
SLURM_ARRAY_JOB_ID=505
SLURM_ARRAY_TASK_ID=1
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1

# 2
SLURM_JOB_ID=506
SLURM_ARRAY_JOB_ID=505
SLURM_ARRAY_TASK_ID=2
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1

# 3
SLURM_JOB_ID=505
SLURM_ARRAY_JOB_ID=505
SLURM_ARRAY_TASK_ID=3
SLURM_ARRAY_TASK_COUNT=3
SLURM_ARRAY_TASK_MAX=3
SLURM_ARRAY_TASK_MIN=1
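
These variables can also drive simple control flow inside a batch script. As a rough sketch, the snippet below computes a task's offset from the lowest index and checks whether the task has the highest index in the array. The fallback values after `:-` are assumptions that only let the snippet run outside of SLURM. Keep in mind that having the highest index does not mean a task runs last.

```shell
# SLURM sets these inside an array task; the defaults are for local dry runs only.
task_id="${SLURM_ARRAY_TASK_ID:-3}"
task_min="${SLURM_ARRAY_TASK_MIN:-1}"
task_max="${SLURM_ARRAY_TASK_MAX:-3}"

# Offset of this task's index from the lowest index in the array.
offset=$(( task_id - task_min ))
echo "Task ${task_id} has offset ${offset}"

# Check whether this task has the highest index in the array.
if [ "$task_id" -eq "$task_max" ]; then
    is_highest=yes
else
    is_highest=no
fi
echo "Highest index in the array: ${is_highest}"
```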

Canceling Job Arrays and Tasks

You can use scancel to cancel all of the tasks or a specific set of tasks in a job array.

One Task

Cancel an individual task

scancel <SLURM_ARRAY_JOB_ID>_<SLURM_ARRAY_TASK_ID>
scancel 505_3

Subset of Tasks

Cancel a subset of tasks

scancel <SLURM_ARRAY_JOB_ID>_[<SLURM_ARRAY_TASK_ID>-<SLURM_ARRAY_TASK_ID>]
scancel 505_[2-3]

All Tasks

Cancel all tasks

scancel <SLURM_ARRAY_JOB_ID>
scancel 505

Example Batch Scripts

Assigning Data to Tasks

This example shows you how to organize your data so it’s “job array friendly” by giving each task its own data file to process. One way to do this is to divide your data into a set of smaller files and include a task ID in the name of each file.

Here’s how it works:

Create a job array batch script.
Name each of the data files with a task ID (e.g., DATA_1.csv, DATA_2.csv, and DATA_3.csv).
In the batch script, use the SLURM_ARRAY_TASK_ID variable so that each task only reads its assigned file.

#!/bin/bash
#SBATCH --time=00:00:10
#SBATCH --partition=amilan
#SBATCH --qos=normal
#SBATCH --nodes=1 
#SBATCH --ntasks=1 
#SBATCH --job-name=Array_Example_Multiple_Files
#SBATCH --output=files.%A_%a.out
#SBATCH --array=1-3

module purge

# Prints the contents of the assigned data file to the standard output.
cat "DATA_${SLURM_ARRAY_TASK_ID}.csv"
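
If renaming files is not practical, another option is to keep a list of file paths and let each task pick the line that matches its index. The sketch below assumes a hypothetical list file named file_list.txt with one path per line. It creates small sample files in a temporary directory so it can be run as-is; in a real job you would supply your own list and drop the setup lines.

```shell
# Setup for illustration only: create sample data files and a list of their paths.
workdir=$(mktemp -d)
printf 'a,1\n' > "${workdir}/site_north.csv"
printf 'b,2\n' > "${workdir}/site_south.csv"
printf '%s\n' "${workdir}/site_north.csv" "${workdir}/site_south.csv" \
    > "${workdir}/file_list.txt"

# SLURM sets this inside an array task; the default is for local dry runs only.
task_id="${SLURM_ARRAY_TASK_ID:-2}"

# Select the line of the list whose number equals the task index.
data_file=$(sed -n "${task_id}p" "${workdir}/file_list.txt")

# Stop early if the index has no matching file.
if [ -z "$data_file" ] || [ ! -f "$data_file" ]; then
    echo "No data file for task ${task_id}" >&2
    exit 1
fi

# Print the contents of the assigned data file.
cat "$data_file"
```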

Assigning Parameters to Tasks

This example shows how to assign unique parameters to each task in a job array. The parameters are stored in a text file called parameters.txt.

Here’s how it works:

A batch script creates a job array with five tasks.
Each task uses the sed command to read its specific parameters from parameters.txt.
These parameters are then passed to a Python script, cars_mpg.py, which prints them out.

You can use this method with any program that takes command-line arguments, not just Python. For more on the sed command, review its help page by running sed --help.

Batch Script:

#!/bin/bash
#SBATCH --time=00:00:10
#SBATCH --partition=amilan
#SBATCH --qos=normal
#SBATCH --nodes=1 
#SBATCH --ntasks=1 
#SBATCH --job-name=cars
#SBATCH --output=cars.%A_%a.out
#SBATCH --array=1-5

module purge
module load miniforge

# Note - the python script assumes two arguments (car name and mpg) 
#        are listed on each line of the parameters.txt file
python cars_mpg.py $(sed -n "${SLURM_ARRAY_TASK_ID}p" parameters.txt)

Python Script:

import sys

# The car name and its mpg are passed as command-line arguments.
car = sys.argv[1]
mpg = sys.argv[2]

print(f"The {car} gets {mpg} mpg.")

Input File (parameters.txt):

mustang 25
pinto 30
chevette 33
nova 21
cutlass 23
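
When a parameter line has several fields, it can help to read them into named shell variables and check them before launching the program. The following is a minimal sketch of that idea, not a replacement for the batch script above. It writes a two-line sample parameters.txt to a temporary directory so it can run on its own, and the variable names car and mpg are chosen to match the example.

```shell
# Setup for illustration only: a small sample parameters file.
workdir=$(mktemp -d)
printf 'mustang 25\npinto 30\n' > "${workdir}/parameters.txt"

# SLURM sets this inside an array task; the default is for local dry runs only.
task_id="${SLURM_ARRAY_TASK_ID:-2}"

# Read the line matching the task index and split it into two fields.
line=$(sed -n "${task_id}p" "${workdir}/parameters.txt")
read -r car mpg <<EOF
${line}
EOF

# Refuse to continue if either field is missing.
if [ -z "$car" ] || [ -z "$mpg" ]; then
    echo "Line ${task_id} of parameters.txt is missing a field" >&2
    exit 1
fi

echo "car=${car} mpg=${mpg}"
```

In a batch script, the final echo could be replaced with a call such as python cars_mpg.py "$car" "$mpg".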