We run a queuing system for computational jobs on the new public Maths desktops as well as certain non-clustered Maths servers. The system runs under the control of Slurm. It is best suited to serial jobs and to parallel jobs that do not require a fast interconnect (for example, jobs that can be divided into many independent tasks). If you find that you do not have access to it, please contact help@maths.cam.ac.uk.

If you are not familiar with Slurm, or workload managers / batch-queuing systems in general, you might want to have a look at this FAQ before proceeding.

Available resources

The default partition of the queuing system currently contains a little over 200 nodes. Every node makes at least 2 cores available to computational jobs, with 4 GB of memory per core.

Furthermore, a dedicated partition called beehive provides access to the eponymous Maths Linux compute server, which can be used to run number-crunching jobs which require more CPU or memory than is available on the desktops. At present Slurm jobs on Beehive can use up to 128 CPUs (i.e. 64 cores with 2 hardware threads each) and 256 GB of RAM. There is also 2.8TB of scratch data space at /local/scratch/public - but please note that it is NOT backed up.
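You can inspect the partitions and the resources of individual nodes yourself with sinfo and scontrol. A brief sketch (the node name is an example; actual output depends on the current cluster state):

```shell
# List all partitions and the states of their nodes
sinfo
# Show only the beehive partition
sinfo -p beehive
# Show CPU and memory details for one node (node name is illustrative)
scontrol show node beehive
```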

Some useful commands

squeue      - show the job queue
sinfo       - show partition and node information
sview       - graphical view of jobs, partitions and nodes
scontrol show job <jobid>      - examine the job with the given job ID
scontrol show node <nodename>  - examine the node with the given name
sbatch      - submit an executable script to the queuing system
srun        - run a command either as a new job or within an existing job
scancel     - delete a job
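A typical session using these commands might look as follows (the script name and job ID are examples; output depends on the cluster state):

```shell
# Submit a job script to the queuing system
sbatch myjob.sh
# List your own jobs ($USER expands to your username)
squeue -u $USER
# Examine a job in detail (12345 is an example job ID)
scontrol show job 12345
# Cancel the job if it is no longer needed
scancel 12345
```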

Time limit for jobs

The time limit for a job can be set by passing --time to sbatch. The default value is 10 minutes and the maximum allowed value is 3 days. Once the specified limit has been reached, the job is killed by the queuing system.
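The limit can be given either on the command line or as a directive inside the submission script; both forms below request 24 hours (the script name is an example):

```shell
# On the command line, overriding the 10-minute default:
sbatch --time=24:00:00 myjob.sh

# Or equivalently as a directive inside the submission script:
#SBATCH --time=24:00:00
```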

Weekly maintenance reservation

There is a maintenance window for the queuing system every Wednesday between midnight and 6:30am. During this period no jobs are able to run. For example, if you submit a job on Tuesday at 6pm with a time limit longer than 6 hours, it will be scheduled no earlier than the following morning, after the maintenance period has ended.

Submitting jobs

To submit a job, one first needs to create a submission script. This is a shell script containing special comment lines prefixed with

#SBATCH

which provide instructions to the queuing system about the required resources. For example:

#!/bin/bash
#! Number of required nodes (can be omitted in most cases)
#SBATCH -N 1
#! Number of tasks
#SBATCH --ntasks-per-node=2
#! Number of cores per task (for multithreaded jobs; default is 1)
#SBATCH -c 2
#! How much wallclock time will be required (HH:MM:SS)
#SBATCH --time=02:00:00