
Kong to Lochness or Stheno Migration Quick Start Guide (SGE to SLURM)

Cluster jobs can be run interactively or in batch mode. Many researchers first use interactive access to learn how to run their research in batches, then develop a batch process so that the computation can be run many times without the researcher needing to interact with the cluster.

A simple command-line login to a compute node could be:

srun -p datasci -n 1 --mem=4G --pty bash

To use a GPU interactively, use the following:

srun -p datasci -n 1 --mem=4G --gres=gpu:1 --pty bash

This requests one task (and therefore one node) with 4 GB of memory and one GPU. If you want to use multiple cores, use the following:

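srun -p datasci -n 1 --cpus-per-task=32 --mem=4G --gres=gpu:1 --pty bash

Here --cpus-per-task=32 gives the single task 32 cores. Combining "-n 1" with "--ntasks-per-node=32" does not, because -n takes precedence and the job gets one task with the default of one core.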
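Once the interactive session opens, you can check what SLURM actually granted. The sketch below defines a hypothetical helper, show_allocation, that prints standard SLURM environment variables. It assumes that SLURM sets SLURM_CPUS_PER_TASK only when --cpus-per-task was requested, sets SLURM_MEM_PER_NODE (in MB) when --mem was given, and that the GPU plugin exports CUDA_VISIBLE_DEVICES; anything not set is reported as unset.

```bash
# Hypothetical helper: print what SLURM granted to the current shell.
# Variables SLURM did not set are reported as "unset" (or "none" for GPUs).
show_allocation() {
  echo "job_id=${SLURM_JOB_ID:-unset}"
  echo "partition=${SLURM_JOB_PARTITION:-unset}"
  echo "cpus_per_task=${SLURM_CPUS_PER_TASK:-unset}"
  echo "mem_per_node_mb=${SLURM_MEM_PER_NODE:-unset}"
  echo "gpus=${CUDA_VISIBLE_DEVICES:-none}"
  # nproc counts the cores this process may run on, which usually
  # matches the cores SLURM bound to the session.
  echo "usable_cores=$(nproc)"
}
```

Run show_allocation inside the session; for the datasci examples the output should include partition=datasci.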

The "datasci" part of the above commands refers to what SLURM calls a "partition"; on Kong this was called a "queue". We use the SLURM terminology from here on, but for everyone's convenience we kept the same queue (now partition) names and node names when we moved the hardware to Lochness and Stheno[1].

The SLURM scheduler on Lochness and Stheno reserves the resources you request, so the 4 GB of RAM in the examples above is unavailable to other jobs even if your session uses only a small fraction of it. If you do not specify memory with --mem in srun or sbatch, you get the default of 1 GB.

For "qsub" batch jobs (on Kong), the corresponding SLURM command is "sbatch". SLURM needs the same information as SGE did, but in a different format; the "Job specification" section of SGE To SLURM lists the equivalent options.

For example, in SGE/qsub you might have:

#!/bin/bash
# The above line must always be first
#$ -N curecancer
#$ -q datasci

The SLURM/sbatch versions are:

#!/bin/bash -l
# The above line must always be first, and must have "-l"
#SBATCH -J curecancer
#SBATCH -p datasci
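Building on the header above, a minimal complete job script might look like the sketch below. The task count, the 4 GB memory request (without --mem the job would get the 1 GB default), the output-file pattern and the file name curecancer.sbatch are illustrative assumptions; in the -o file name, SLURM replaces %j with the job ID. sbatch stops reading #SBATCH lines at the first command in the script, so keep all directives at the top.

```bash
# Write a minimal SLURM job script (curecancer.sbatch is a hypothetical name).
cat > curecancer.sbatch <<'EOF'
#!/bin/bash -l
# The above line must always be first, and must have "-l"
#SBATCH -J curecancer
#SBATCH -p datasci
#SBATCH -n 1
#SBATCH --mem=4G
#SBATCH -o curecancer.%j.out

# Illustrative body: replace with the commands from your old qsub script.
echo "Job ${SLURM_JOB_ID} running on $(hostname)"
EOF
```

Submit it with "sbatch curecancer.sbatch"; sbatch prints the new job ID.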

Apart from the first line and the change from "#$" to "#SBATCH" directives, the rest of the script is usually the same (module names are a possible exception).
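Because these header changes are largely mechanical, they can be scripted. The sketch below is a hypothetical awk helper, sge_to_slurm, that handles only the cases shown above: it sets the first line to #!/bin/bash -l, maps -N to -J and -q to -p, and marks every other #$ line for manual conversion using the "Job specification" section of SGE To SLURM. Module names and the rest of the script are left untouched, so review the output before submitting.

```bash
# Hypothetical helper: convert the SGE header lines shown above to SLURM.
# Reads a qsub script from the given file(s) or stdin, writes to stdout.
sge_to_slurm() {
  awk '
    NR == 1 {
      # SLURM scripts here should start with a bash login shell.
      print "#!/bin/bash -l"
      if ($0 ~ /^#!/) next          # drop the old shebang line
    }
    /^#[$][ \t]+-N[ \t]+/ {         # job name: -N becomes -J
      sub(/^#[$][ \t]+-N[ \t]+/, "#SBATCH -J "); print; next
    }
    /^#[$][ \t]+-q[ \t]+/ {         # queue becomes partition: -q becomes -p
      sub(/^#[$][ \t]+-q[ \t]+/, "#SBATCH -p "); print; next
    }
    /^#[$]/ {                       # any other SGE directive is flagged
      print "# TODO convert manually: " $0; next
    }
    { print }                       # everything else passes through unchanged
  ' "$@"
}
```

For example, "sge_to_slurm old_job.sh > new_job.sbatch" (both file names hypothetical) writes the converted script to a new file and leaves the original intact.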

If you're having trouble with this, we'd like to help. Please pick a qsub script of your own that ran successfully before, copy it, and make the SLURM changes. If it doesn't work, just send us the script's file name and the output file name (if there is one), and we'll be happy to help.
Schedule an appointment to get help

[1] All of Stheno was moved to SLURM. The Kong short, medium, and long queues, and a few others, were not moved. Details