
SGE to SLURM Migration Guide

Start with the SLURM Quick Start User Guide. On that page, see:

Tutorials
Documentation
FAQ


Man pages exist for all SLURM daemons, commands, and API functions.

Example: man squeue

The command option "--help" also provides a brief summary of options.

Example: squeue --help

Command options are all case-sensitive.


Some common commands and flags in SGE and SLURM with their respective equivalents

                               SGE                              SLURM

User Commands
Interactive login              qlogin                           srun --pty bash
Job submission                 qsub [script_file]               sbatch [script_file]
Job deletion                   qdel [job_id]                    scancel [job_id]
Job status by job              qstat -u \* [-j job_id]          squeue -j [job_id]
Job status by user             qstat [-u user_name]             squeue -u [user_name]
Job hold                       qhold [job_id]                   scontrol hold [job_id]
Job release                    qrls [job_id]                    scontrol release [job_id]
List enqueued jobs             qconf -sql                       squeue
List nodes                     qhost                            sinfo -N OR scontrol show nodes
Cluster status                 qhost -q                         sinfo

Environment Variables
Job ID                         $JOB_ID                          $SLURM_JOB_ID
Submit directory               $SGE_O_WORKDIR                   $SLURM_SUBMIT_DIR
Submit host                    $SGE_O_HOST                      $SLURM_SUBMIT_HOST
Node list                      $PE_HOSTFILE                     $SLURM_JOB_NODELIST
Job array index                $SGE_TASK_ID                     $SLURM_ARRAY_TASK_ID

Job Specification
Script directive               #$                               #SBATCH
Queue (called partition
in SLURM)                      -q [queue]                       -p [partition]
Count of nodes                 N/A                              -N [min[-max]]
CPU count                      -pe [PE] [count]                 -n [count]
Wall clock limit               -l h_rt=[seconds]                -t [min] OR -t [days-hh:mm:ss]
Standard out file              -o [file_name]                   -o [file_name]
Standard error file            -e [file_name]                   -e [file_name]
Combine STDOUT and STDERR      -j yes                           use "-o" without "-e"
Copy environment               -V                               --export=[ALL | NONE | variables]
Event notification             -m abe                           --mail-type=[events]
Notification email address     -M [address]                     --mail-user=[address]
Job name                       -N [name]                        --job-name=[name]
Restart job                    -r [yes|no]                      --requeue OR --no-requeue (default is configurable)
Set working directory          -wd [directory]                  --workdir=[dir_name]
Resource sharing               -l exclusive                     --exclusive OR --shared
Memory size                    -l mem_free=[memory][K|M|G]      --mem=[mem][M|G|T] OR --mem-per-cpu=[mem][M|G|T]
Charge to an account           -A [account]                     --account=[account]
Tasks per node                 (fixed allocation_rule in PE)    --ntasks-per-node=[count]
CPUs per task                  N/A                              --cpus-per-task=[count]
Job dependency                 -hold_jid [job_id | job_name]    --depend=[state:job_id]
Job project                    -P [name]                        --wckey=[name]
Job host preference            -q [queue]@[node] OR             --nodelist=[nodes] AND/OR
                               -q [queue]@@[hostgroup]          --exclude=[nodes]
Quality of service             N/A                              --qos=[name]
Job arrays                     -t [array_spec]                  --array=[array_spec] (SLURM 2.6+)
Generic resources              -l [resource]=[value]            --gres=[resource_spec]
Licenses                       -l [license]=[count]             --licenses=[license_spec]
Begin time                     -a [YYMMDDhhmm]                  --begin=YYYY-MM-DD[THH:MM[:SS]]
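
To see these commands side by side, here is a minimal sketch of a common session in both schedulers. The script name myjob.sh and the job ID 12345 are placeholders:

# --- SGE ---
qsub myjob.sh              # submit; prints a job ID, e.g. 12345
qstat -u $USER             # show your pending and running jobs
qhold 12345                # place the job on hold
qrls 12345                 # release the hold
qdel 12345                 # delete the job

# --- SLURM ---
sbatch myjob.sh            # submit; prints "Submitted batch job 12345"
squeue -u $USER            # show your pending and running jobs
scontrol hold 12345        # place the job on hold
scontrol release 12345     # release the hold
scancel 12345              # cancel the job

For an interactive session, instead of qlogin run:

srun --pty bash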


Detailed comparison of common SGE commands and their SLURM equivalents

SGE                           SLURM
qstat                         squeue
qstat -u username             squeue -u username
qstat -f                      squeue -al
qsub                          sbatch
qsub -N jobname               sbatch -J jobname
qsub -q datasci               sbatch -p datasci
qsub -m beas                  sbatch --mail-type=ALL
qsub -M ucid@njit.edu         sbatch --mail-user=ucid@njit.edu
qsub -l h_rt=24:00:00         sbatch -t 24:00:00
qsub -pe dmp4 16              sbatch -n 16
qsub -l mem=4G                sbatch --mem=4G
qsub -P projectname           sbatch -A projectname
qsub -o filename              sbatch -o filename
qsub -e filename              sbatch -e filename
qsub -l scratch_free=20G      sbatch --tmp=20480
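
Putting several rows of this table together, a complete SGE submission line translates flag by flag. This is a sketch built from the mappings above; script.sh is a placeholder:

# SGE: named job, datasci queue, 16 slots, 24-hour limit, 4 GB memory, mail to user
qsub -N testjobname -q datasci -pe dmp4 16 -l h_rt=24:00:00 -l mem=4G \
     -M ucid@njit.edu -m beas script.sh

# SLURM equivalent
sbatch -J testjobname -p datasci -n 16 -t 24:00:00 --mem=4G \
       --mail-user=ucid@njit.edu --mail-type=ALL script.sh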

Comparison between job scripts in SGE and SLURM

SGE for a single-core application:

#!/bin/bash
#
#$ -N testjobname
#$ -q datasci
#$ -j y
#$ -o test.output
#$ -cwd
#$ -M ucid@njit.edu
#$ -m bea
# Request 5 hours run time
#$ -l h_rt=5:0:0
#$ -P your_project_ID
#
#$ -l mem=4G
#
your_application

SLURM for a single-core application:

#!/bin/bash -l
# NOTE the -l (login) flag!
#
#SBATCH -J testjobname
#SBATCH -p datasci
#SBATCH -o test.output
#SBATCH -e test.output
# Sending STDOUT and STDERR to one file mimics SGE's "-j y" (the default in SLURM)
#SBATCH --mail-user=ucid@njit.edu
#SBATCH --mail-type=ALL
# Request 5 hours run time
#SBATCH -t 5:0:0
#SBATCH -A your_project_ID
#SBATCH --mem=4G
your_application
SGE for an MPI application:

#!/bin/bash
#
#$ -N testjobname
#$ -q datasci
#$ -j y
#$ -o test.output
#$ -cwd
#$ -M ucid@njit.edu
#$ -m bea
# Request 5 hours run time
#$ -l h_rt=5:0:0
#$ -P your_project_id
#$ -R y
#$ -pe dmp4 16
# Memory is counted per process on the node
#$ -l mem=2G
module load module1 module2 ...
mpirun your_application

SLURM for an MPI application:

#!/bin/bash -l
# NOTE the -l (login) flag!
#
#SBATCH -J testjobname
#SBATCH -p datasci
#SBATCH -o test.output
#SBATCH -e test.output
# Sending STDOUT and STDERR to one file mimics SGE's "-j y" (the default in SLURM)
#SBATCH --mail-user=ucid@njit.edu
#SBATCH --mail-type=ALL
# Request 5 hours run time
#SBATCH -t 5:0:0
#SBATCH -A your_project_id
#
# Memory is requested per CPU here, matching SGE's per-process accounting
#SBATCH --mem-per-cpu=2G
#SBATCH -n 16
#
module load module1 module2 ...
mpirun your_application
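
SGE's -hold_jid maps to SLURM's --depend (see the job specification table above). As a sketch, assuming the MPI script above is saved as mpi_job.sh and a hypothetical postprocess.sh should run only after it succeeds:

# --parsable makes sbatch print only the job ID, which is easy to capture
jobid=$(sbatch --parsable mpi_job.sh)

# Start post-processing only if the MPI job completes successfully
sbatch --depend=afterok:$jobid postprocess.sh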

Comparison of some parallel environments set by SGE and SLURM

SGE            SLURM
$JOB_ID        $SLURM_JOB_ID
$NSLOTS        $SLURM_NPROCS
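
A quick way to check these mappings is a short test job that prints the SLURM variables; the SGE equivalents are noted in the comments. The job name and task count here are arbitrary:

#!/bin/bash -l
#SBATCH -J envtest
#SBATCH -n 4
echo "Job ID:           $SLURM_JOB_ID"        # SGE: $JOB_ID
echo "Process count:    $SLURM_NPROCS"        # SGE: $NSLOTS
echo "Submit directory: $SLURM_SUBMIT_DIR"    # SGE: $SGE_O_WORKDIR
echo "Node list:        $SLURM_JOB_NODELIST"  # SGE: $PE_HOSTFILE points to a file instead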