We have a faculty HPC system, called Swirles, for developing and running computationally intensive tasks.

Account

To obtain an account on Swirles, please email your request to help@maths.cam.ac.uk, stating which research group you are in. It is presently recommended that you attach your public SSH key to your request (see below).
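
If you do not yet have a key pair, one can be generated on your own computer, for example:

# Creates ~/.ssh/id_ed25519 (private) and ~/.ssh/id_ed25519.pub (the public key to attach)
ssh-keygen -t ed25519 -C "<your crsid>@cam.ac.uk"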

Access

Once you have been informed that your account has been activated, you can log in to Swirles using SSH. Use the host name "swirles.maths.cam.ac.uk", which will automatically redirect you to one of the two head nodes.

Unlike Fawcett, Swirles requires the use of SSH keys even for first-time connections. If you did not provide a key when requesting your account, you can obtain a relatively short-lived SSH certificate from one of the Maths Linux workstations (documentation TBA).

Please note that direct connections to Swirles are possible from computers connected to the Maths main network and from ssh.maths.cam.ac.uk. For ways to configure access from other computers, please look at this page.
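
For instance, assuming you have an account on ssh.maths.cam.ac.uk and a reasonably recent OpenSSH client, a connection from elsewhere can be proxied through it in a single command (user names illustrative):

ssh -J <your crsid>@ssh.maths.cam.ac.uk <your crsid>@swirles.maths.cam.ac.uk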

Head-node etiquette

Running resource-intensive computations on the head nodes (i.e. where you find yourself after logging in to Swirles over SSH) is not allowed: they are the single entry point to Swirles for all of its users, so overusing their resources is very much antisocial. Please use the job scheduler (see below) instead; it can run both batch and interactive jobs. The only exception to this rule is short-lived interactive tasks such as compiling software: if it is not expected to take more than 5-10 minutes and you do not leave it unattended, it is okay to run it on the head nodes.

If you use Visual Studio Code, please note that you are required to configure it appropriately so that it does not overload the head nodes. See the Software section below for details.

Long-running CPU-intensive processes on the head node may be terminated with no advance warning, and repeat offenders may have their Swirles accounts suspended.

Hardware configuration

  • 6 nodes with 2x AMD Genoa (EPYC 9654) 96-core CPUs and 1.5 TB of RAM each
  • 4 nodes with 2x Intel Sapphire Rapids with HBM (Xeon CPU Max 9480) 56-core CPUs and 1 TB of RAM each
  • 3 nodes with 4x Intel Ponte Vecchio GPUs (Data Center GPU Max 1550, each with 128 GB of memory; fully interconnected with Xe Link), 2x Intel Sapphire Rapids with HBM (Xeon CPU Max 9480) 56-core CPUs and 1 TB of RAM each
  • 3 nodes with 4x Nvidia Ampere GPUs (A100-SXM, each with 40 GB of memory; fully interconnected with NVLink), 2x Intel Ice Lake (Xeon Gold 6342) 24-core CPUs and 2 TB of RAM each
  • 1 node with 4x Nvidia Lovelace GPUs (L40S, each with 48 GB of memory; no NVLink), 2x AMD Genoa (EPYC 9224) 24-core CPUs and 1.5 TB of RAM
  • NDR InfiniBand interconnect
  • Approx. 150 TB of dedicated storage (not backed up)

Disk space

Home directories and data disks on Swirles are separate from those on other Maths systems. Home directories have relatively small quotas, so they should not be used to store large amounts of data generated by computing jobs. For this purpose, every user should have access to at least one of the subdirectories of /cephfs/store/*.

Note that on Swirles the command quota does not work. To see what your limit is, query the extended attribute ceph.quota.max_bytes on the relevant directory, e.g.

getfattr -n ceph.quota.max_bytes /cephfs/home/<your crsid>

to see your home quota.
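
Current usage can be queried in much the same way, assuming the standard CephFS virtual attributes are exposed; ceph.dir.rbytes reports the total size of a directory tree in bytes, e.g.

getfattr -n ceph.dir.rbytes /cephfs/home/<your crsid>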

Software

Most of the software is provided in the form of environment modules, with packages installed primarily using Spack. You can check the list of available modules with the command:

module av

A module can be activated with the command

module load <modulename>

Other useful commands:

module
   (no arguments)              print usage instructions
   list                        print list of loaded modules
   whatis                      as above with brief descriptions
   unload <modulename>         deactivate a module
   purge                       deactivate all currently loaded modules
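
A typical session might therefore look like the following (the GCC module name is purely illustrative; check module av for what is actually installed):

module av gcc      # list available GCC modules
module load gcc    # activate one of them (a specific version can also be given)
module list        # confirm which modules are now loaded
module purge       # deactivate everything again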

New modules can be requested by sending email to help@maths.cam.ac.uk. Note that the Swirles software stack is currently still under development and certain packages (e.g. Julia) might take a while to become available.

Intel toolkits

Release-level Spack packages for Intel toolkits tend to be quite old and do not presently cover all available tools, so we have our own installation of Intel oneAPI HPC Toolkit 2025.0. Modules are available for this installation as well; note, however, that they are named differently from the older versions provided by Spack. The easiest way to tell the two apart is that upstream Intel modules have names starting with intel/ whereas Spack modules have names starting with intel-oneapi-.
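
For example, the following should list both families side by side:

# Upstream oneAPI modules appear as intel/..., Spack-built ones as intel-oneapi-...
module av intel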

Python packages

Unlike Fawcett, on Swirles we do recommend the use of environment modules for Python packages as well. Run module av 'py-' to see what is available.
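
For example, to make NumPy available (the exact module name and version on Swirles may differ):

module av 'py-numpy'   # check which versions, if any, are installed
module load py-numpy   # load the default one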

That said, we do provide Conda in case you need a package that is unavailable or difficult to install as a module; one example of such a package, as of March 2025, is TensorFlow with CUDA support. To activate Conda, run the command

module load miniforge3

Note that Conda environments sometimes interact badly with software installed with environment modules. It is possible to unload all loaded environment modules (yes, this does include miniforge3; you will have to load it again afterwards) with the command:

module purge

By default Conda on Swirles saves both the environments created by users and the downloaded packages in the subdirectory "Conda" of your space in /cephfs/store; see the file .condarc in your Swirles home directory for the exact paths. As both of these directories can grow quite large, it is recommended to clean the Conda caches regularly to avoid running out of space. The details can be found in the Conda documentation:

https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#specify-environment-directories-envs-dirs

https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#specify-pkg-directories
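
If space does become tight, a minimal cleanup sketch (the environment name myenv is purely illustrative):

conda env list                   # list your environments and where they are stored
conda clean --all                # remove cached package archives and index caches
conda remove --name myenv --all  # delete an environment that is no longer needed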

Visual Studio Code

In its default configuration, Visual Studio Code spawns multiple copies of the JavaScript server Node.JS on remote hosts. On multi-user systems such as the Swirles head nodes this can, and has been observed to, quickly exhaust the available resources and render them virtually unusable. Therefore, users wishing to use Visual Studio Code to work on Swirles are now required to adjust their configuration as follows:

  1. Hit the Extensions button (on the left toolbar, looks like building blocks)
  2. Locate the extension "TypeScript and JavaScript Language Features"; searching for "@builtin TypeScript" ought to do it
  3. Disable that extension
  4. Reload VS Code

Job Scheduler

Swirles uses the Slurm workload manager to manage its resources. If you are not familiar with Slurm, or with workload managers / batch-queuing systems in general, you might want to have a look at this FAQ before proceeding.

Some useful commands:

spartition  - show global cluster information
sinfo       - show information about partitions and nodes
sview       - graphical overview of the cluster, its partitions, and jobs
scontrol show job <job_number>  - examine the job with the given job ID
scontrol show node <nodename>   - examine the node with the given name
sbatch      - submit a batch script to the queuing system
srun        - run a command either as a new job or within an existing job
scancel     - delete a job

Submitting jobs

To submit a job, one first needs to create a submission script. This is a shell script containing special comment lines prefixed with

#SBATCH

which provide instructions to the queuing system about the required resources. For example:

#!/bin/bash
#! Which partition should be used
#SBATCH -p ampere
#! Number of required nodes
#SBATCH -N 1
#! Number of MPI ranks running per node
#SBATCH --ntasks-per-node=2
#! Number of GPUs per node if required
#SBATCH --gres=gpu:2
#! How much wallclock time will be required (HH:MM:SS)
#SBATCH --time=02:00:00

srun ./a.out

To submit the script to the queuing system, use the command:

sbatch <scriptname>
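
For example, assuming the script above was saved as job.sh (the file name is illustrative):

sbatch job.sh        # prints "Submitted batch job <jobid>" on success
squeue -u $USER      # list your queued and running jobs
scancel <jobid>      # cancel the job if necessary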

Interactive jobs

It is possible to request an interactive job with the command srun. For example:

srun --pty -p genoa -n 2 --time=02:00:00 bash

would reserve two cores in the partition "genoa" for two hours and run bash there.
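
Along the same lines, an interactive session on a GPU node might be requested as follows (the resource numbers are purely illustrative):

srun --pty -p ampere --gres=gpu:1 -n 4 --time=01:00:00 bash

This would reserve four cores and one A100 GPU for one hour.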

Available partitions

The following partitions are available to all users:

  • genoa - AMD Genoa CPU nodes. This is presently the default partition.
  • spr - Intel Sapphire Rapids CPU nodes
  • ampere - Nvidia A100 GPU nodes
  • lovelace - Nvidia L40S GPU node
  • pvc - Intel Ponte Vecchio GPU nodes

The default wall time for all of these partitions is 10 minutes; the maximum is 12 hours.

Furthermore, each of the aforementioned partitions has a "long" counterpart with a maximum wall time of 72 hours. Note that access to the long partitions is solely at the discretion of the PI concerned and must be explicitly requested. Furthermore, in order to facilitate higher throughput of jobs and better utilisation of the system, jobs in the long partitions can use no more than 25% of the resources.

Notes and comments

  • Jobs submitted to any of the GPU partitions must request GPU resources, and Slurm should immediately reject jobs which do not.
  • MPI jobs should be launched the same way as non-MPI ones, i.e. with srun; a minimal sketch is given after this list. In the case of Intel MPI, srun provides better integration with Slurm than mpirun or mpiexec, and programs linked against OpenMPI might downright refuse to start in Slurm jobs if one of the latter two commands is used. For details, see the Slurm MPI Guide.
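
A minimal MPI batch sketch along these lines (the module name openmpi and the executable my_mpi_app are purely illustrative):

#!/bin/bash
#SBATCH -p genoa
#SBATCH -N 2
#SBATCH --ntasks-per-node=96
#SBATCH --time=01:00:00

#! Load an MPI implementation (module name illustrative)
module load openmpi

#! srun starts one MPI rank per task; no mpirun or mpiexec needed
srun ./my_mpi_app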

Memory available to jobs

By default, the memory available to a job is proportional to the number of allocated cores, as follows:

  • 16 GiB per core on Ampere and Lovelace nodes
  • 8 GiB per core on Ponte Vecchio and Sapphire Rapids nodes
  • 4 GiB per core on Genoa nodes

It is possible to request more memory by using the --mem=size[units] option, where units can be K, M, G, or T (M is the default).
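
For example, to give a single-task job 64 GiB of memory (values purely illustrative), the submission script could include:

#SBATCH --ntasks=1
#SBATCH --mem=64G

The same option can also be passed directly to srun for interactive jobs.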