
We have a faculty HPC system, called fawcett, for developing and running computationally intensive tasks.

Access

In order to get access to fawcett please send an email with a request to help@maths.cam.ac.uk. After receiving confirmation that the account has been activated you can log in to fawcett using ssh. Please note that connections to fawcett are possible from Maths desktop machines and from ssh.maths.cam.ac.uk. It is also possible to connect to fawcett from computers connected to the eduroam or UniOfCam networks if you have your ssh key added to your authorized keys on fawcett.
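
For example, a login might look like the sketch below. The short hostname fawcett and the username placeholder are assumptions; use the hostname and account name given when your account is activated.

# log in directly from a Maths desktop machine
ssh <username>@fawcett

# from elsewhere, one possible approach is to hop via the Maths ssh gateway
ssh -J <username>@ssh.maths.cam.ac.uk <username>@fawcett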

Hardware configuration

  1. A shared memory node with 32x SkyLake 6154 3GHz 18-core processors and 6TB of RAM
  2. 4 nodes with 2x SkyLake 6140 18-core 2.3GHz processors and 384GB of RAM each
  3. 2 nodes with 2x NVidia Pascal P100 GPUs, 2x SkyLake 6140 18-core 2.3GHz processors and 384GB of RAM each
  4. 24 nodes with 1x Intel Xeon Phi 7210 (KNL) 64-core 1.3GHz processor and 96GB of RAM each
  5. Intel Omni-Path HPC interconnect
  6. 100TB of dedicated storage (not backed up)

Disk space

Home directories and data disks on fawcett are separate from the ones on other Maths systems. Home directories have relatively small quotas on them, so they should not be used to keep big data generated by computing jobs. For this purpose every user should have access to at least one of the subdirectories of /nfs/st01/hpc-*.
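
As a sketch, job data can be kept on the data disk rather than in the home directory. The project directory name hpc-myproject below is a placeholder; check which /nfs/st01/hpc-* directory you have actually been given access to.

# see which project data directories exist
ls -d /nfs/st01/hpc-*

# create a personal working directory on the data disk and work from there
mkdir -p /nfs/st01/hpc-myproject/$USER
cd /nfs/st01/hpc-myproject/$USER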

Software

Most of the software is provided in the form of environment modules. One can check the list of available modules with the command:

module av

A module can be loaded with the command:

module load <modulename>

Other useful commands:

module      
   (no arguments)              print usage instructions
   list                        print list of loaded modules
   whatis                      as above with brief descriptions
   unload <modulename>         remove a module
   purge                       remove all modules
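
A typical session might therefore look like the following sketch; the gcc module is taken from the list of useful modules below, but the exact names and versions available may differ.

module av           # see what is available
module load gcc     # load a compiler
module list         # confirm what is currently loaded
module unload gcc   # remove it again
module purge        # or remove all loaded modules at once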

Most useful modules:

gcc
openmpi/3.0.1/gcc-7.3.0-3qmiso3
openmpi/3.0.1/intel-18.0.3-4ptj5cp
python
intel/compilers

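As a sketch of how these fit together with the MPI example further down, an MPI program could be compiled as follows. Assumptions: the unversioned gcc module matches the gcc-7.3.0 build of openmpi, and hello.c is a hypothetical source file.

# load a compiler and a matching MPI implementation
module load gcc
module load openmpi/3.0.1/gcc-7.3.0-3qmiso3

# compile a hypothetical MPI program into a.out
mpicc hello.c -o a.out
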
New modules can be requested by sending email to help@maths.cam.ac.uk.

Python modules

Names of Python modules usually begin with the prefix py- for Python 2 modules and py3- for Python 3 modules.
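
For example, to see which Python 3 modules are on offer and load one of them (py3-numpy is an illustrative name; the exact names and versions will differ):

module av py3-
module load py3-numpy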

Queuing system

Fawcett operates the SLURM batch queuing system for managing resources. Some useful commands:

squeue      - show the state of jobs in the queue
sinfo       - show the state of partitions and nodes
sview       - graphical overview of jobs, partitions and nodes
scontrol show job <job_number> - examine the job with the given job number
scontrol show node <nodename>  - examine the node with the given name
sbatch      - submit an executable script to the queuing system
srun        - run a command either as a new job or within an existing job
scancel     - delete a job
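
For instance, to look only at your own jobs and inspect or cancel one of them (the job id 123456 is a placeholder):

squeue -u $USER              # show only your own jobs
scontrol show job 123456     # detailed information about one job
scancel 123456               # cancel it if necessary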

Submitting jobs

To submit a job one first needs to create a submission script. This is a shell script with special comment lines with the prefix

#SBATCH

which provide instructions to the queuing system about the required resources. For example, the script:

#!/bin/bash
#! Which partition (queue) should be used
#SBATCH -p cosmosx
#! Number of required nodes
#SBATCH -N 1
#! Number of MPI ranks running per node
#SBATCH --ntasks-per-node=2
#! Number of GPUs per node if required
#SBATCH --gres=gpu:2
#! How much wallclock time will be required (HH:MM:SS)
#SBATCH --time=02:00:00

mpiexec a.out

Note that it is usually not necessary to specify the -np argument to mpiexec. The correct value should be deduced by the installed MPI implementation based on the requested resources. To submit a script to the queuing system use the command:

sbatch <scriptname>
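
Assuming the script above is saved as submit.sh (a hypothetical name), a submission might look like this. Unless the script redirects output itself, SLURM writes the job's standard output to a file named slurm-<jobid>.out in the submission directory.

sbatch submit.sh
# Submitted batch job 123456   (example job id)

less slurm-123456.out         # inspect the job's output once it has run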

Interactive jobs

It is possible to request an interactive job with the command srun. For example:

srun --pty -p skylake -n 2 --time=02:00:00 bash

would reserve two cores in the skylake partition for two hours and run bash there.
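
Similarly, an interactive session on a GPU node could be requested along the following lines; the exact gres syntax is an assumption based on the batch example above.

srun --pty -p gpu --gres=gpu:1 -n 2 --time=01:00:00 bash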

Available queues

The following queues are available:

  1. cosmosx - the shared memory node, although it is also possible to run MPI jobs on it
  2. skylake - two-socket SkyLake nodes
  3. gpu - GPU nodes
  4. knl - KNL nodes
  5. knl-long - queue for longer jobs on KNL nodes
  6. skylake-long - queue for longer jobs on SkyLake nodes
  7. cosmosx-long - queue for longer jobs on the shared memory node

The maximum wall time is 12 hours for normal queues and 36 hours for long queues. Access to the long queues is at the discretion of the PI concerned. In order to facilitate higher throughput of jobs and better utilisation of the system, jobs in the long queues can use no more than 25% of resources.
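
If you have been granted access to a long queue, the only changes needed in a submission script are the partition name and a longer wall time, for example (a sketch; the other resource lines stay the same as in the batch example above):

#SBATCH -p skylake-long
#SBATCH --time=36:00:00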