Slurm and Singularity

Singularity

Singularity is a container manager commonly used on scientific HPC clusters. Below are some commonly used commands, mainly run and build.

Run

# --nv: enable NVIDIA GPU support inside the container
# -B:   bind (map) a host folder to a path inside the container
singularity run --nv \
    -B /local/path:/container/path \
    /path/to/singularity-image.sif \
    sh /container/path/run.sh  # command to run inside the container (sh, python, etc.)
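
A filled-in sketch following the same pattern (the paths and image name below are hypothetical):

singularity run --nv \
    -B /share_zeta/my_project/data:/input_dir \
    -B /share_zeta/my_project/results:/output_dir \
    /share_zeta/my_project/images/my-image.sif \
    sh /input_dir/run.sh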

Build

There is a script that will build the .sif image file for you; call it as follows:

sbatch /share_zeta/UnderTheSea/singularities/build_sif.srm <sif_name> <docker_hub_name>

This command will run singularity build $SIF_NAME.sif docker://$HUB_NAME, building a Singularity image from Docker Hub and saving it in the current directory. For more options and instructions, check the build_a_container guide.
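
For example, to build an image named python3.sif from the official Python image on Docker Hub (image and tag chosen only for illustration):

sbatch /share_zeta/UnderTheSea/singularities/build_sif.srm python3 python:3.11-slim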

Slurm

You can't run the singularity command above directly on the cluster, though; you must use Slurm to schedule a job for you. This is why we don't call singularity build directly, for example, and instead wrap it in a .srm file and submit it with Slurm's sbatch command.

sbatch

This command schedules a job to be run, and accepts options either directly on the command line or from within a special file with the .srm extension. These files have a header with the sbatch options, followed by the script or commands to be run. Check the example below; a submission example follows the option list.

#!/bin/sh
#SBATCH --nodes=1
#SBATCH --partition=gpu
#SBATCH --ntasks=1
#SBATCH --mem=7500
#SBATCH --gres=gpu:1
#SBATCH --oversubscribe
#SBATCH --ntasks-per-node=1
#SBATCH -J waifu_inference.srm
#SBATCH -o outputs/inf_output.txt

WAIFU_ROOT=/share_zeta/UnderTheSea/waifu2x

singularity run --nv \
        -B /share_zeta/UnderTheSea/data/test/ciab-23/out_stream:/input_dir \
        -B /share_zeta/UnderTheSea/data/results/waifu2x/out_stream_noise_scale4x:/output_dir \
        -B $WAIFU_ROOT/nunif:/root/nunif \
        -B $WAIFU_ROOT/scripts:/scripts \
        /share_zeta/UnderTheSea/singularities/nunif.sif \
        sh /scripts/entrypoints/run_waifu_inference.sh

The #SBATCH options used in the header above are:

  • -N, --nodes: Request a minimum and maximum number of nodes to be allocated to the job. If only one number is specified, this is used as both the minimum and maximum node count.
  • -p, --partition: Request a specific partition for the resource allocation. If not specified, the default partition designated by the system administrator will be selected.
  • -n, --ntasks: sbatch does not launch tasks; it requests an allocation of resources and submits a batch script. This option advises the Slurm controller that job steps run within the allocation will launch at most this many tasks, and to provide sufficient resources for them.
  • --mem: Specify the real memory required per node. Default units are megabytes. Different units can be specified using the suffix [K|M|G|T].
  • --gres: Specifies a comma-delimited list of generic consumable resources. The format for each entry in the list is "name[[:type]:count]". The name is the type of consumable resource (e.g. gpu). The type is an optional classification for the resource (e.g. a100). The count is the number of those resources with a default value of 1.
  • -s, --oversubscribe: The job allocation can over-subscribe resources with other running jobs. The resources to be over-subscribed can be nodes, sockets, cores, and/or hyperthreads depending upon configuration. This option may result in the allocation being granted sooner than if the --oversubscribe option was not set and allow higher system utilization, but application performance will likely suffer due to competition for resources.
  • --ntasks-per-node: Request that ntasks be invoked on each node. If used with the --ntasks option, the --ntasks option will take precedence and the --ntasks-per-node will be treated as a maximum count of tasks per node.
  • -J, --job-name: Specify a name for the job allocation. The specified name will appear along with the job id number when querying running jobs on the system.
  • -o, --output: Instruct Slurm to connect the batch script's standard output directly to the file name specified in the "filename pattern". By default both standard output and standard error are directed to the same file.
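
Assuming the script above is saved as waifu_inference.srm, it is submitted with:

sbatch waifu_inference.srm

sbatch prints the id of the scheduled job (e.g. Submitted batch job 17727), which can then be tracked with the auxiliary commands below.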

Auxiliary commands

squeue

Shows information about jobs located in the Slurm scheduling queue.

$ squeue
 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 17713       cpu proxy_vs   felipe  R    1:33:25      1 cpunode-2-0
 17705       gpu tornado-   heitor  R   18:50:56      1 gpunode-1-7
 17712       gpu deepoil_  abdigal  R   15:26:23      1 gpunode-1-4
 17727       gpu waifu_in lucasces  R       0:01      1 gpunode-1-0

Add the --me option to display only your jobs.

$ squeue --me
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
17727       gpu waifu_in lucasces  R       1:09      1 gpunode-1-0

scancel

This command is used to signal or cancel jobs.

scancel [OPTIONS...] job_id

Add the --me option to restrict the operation to your own jobs.
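
For example, to cancel the waifu_inference job shown in the squeue output above:

scancel 17727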

sinfo

Shows information about partitions: their availability, time limit, node count, node states, and node list.

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
gpu*         up 3-00:00:00      2    mix gpunode-1-[4,7]
gpu*         up 3-00:00:00      5   idle gpunode-1-[0-1,3,5-6]
cpu          up 3-00:00:00      1    mix cpunode-2-0

scontrol show node

Shows information about each available node, including its CPU count, GPU model, partition, available features, node address, and more.

...
NodeName=gpunode-1-7 Arch=x86_64 CoresPerSocket=6 
   CPUAlloc=2 CPUEfctv=24 CPUTot=24 CPULoad=0.00
   AvailableFeatures=gpu
   ActiveFeatures=gpu
   Gres=gpu:gtx1080:2
   NodeAddr=gpunode-1-7 NodeHostName=gpunode-1-7 Version=23.02.5
   OS=Linux 5.15.0-125-generic #135-Ubuntu SMP Fri Sep 27 13:53:58 UTC 2024 
   RealMemory=22000 AllocMem=6000 FreeMem=22516 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=gpu 
   BootTime=2024-12-02T16:14:18 SlurmdStartTime=2024-12-02T16:15:00
   LastBusyTime=2024-12-02T16:15:00 ResumeAfterTime=None
   CfgTRES=cpu=24,mem=22000M,billing=24,gres/gpu=4
   AllocTRES=cpu=2,mem=6000M,gres/gpu=2
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
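
To inspect a single node instead of the whole list, pass the node name (here the GPU node shown above):

scontrol show node gpunode-1-7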