KongQueues

From NJIT-ARCS HPC Wiki

The Kong queues were restructured in mid-May 2015. The new structure introduces three queues that segregate jobs by anticipated wall clock time (as opposed to CPU time) limits. Jobs that run past their wall clock limit are automatically terminated. Each queue is allocated an initial number of nodes, but the node counts will be adjusted depending on demand, possibly even dynamically.
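To illustrate the difference between wall clock time and CPU time, here is a minimal sketch that times a command by the wall clock. The function name `wall_seconds` is hypothetical and not part of Kong's tooling. A job that sleeps or waits on I/O uses almost no CPU time, yet the time still counts against the queue limit.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print the elapsed wall clock seconds.
wall_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# "sleep" consumes almost no CPU time, yet the wall clock keeps running.
elapsed=$(wall_seconds sleep 1)
echo "wall clock seconds used: $elapsed"
```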

The new queue structure is as follows.

short

All users have access to the "short" queue. This is the default queue: if no queue is specified in the job submission script, the job will run in this queue. This queue has a 48-hour wall time limit: jobs running in this queue are terminated after 48 hours.
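As a rough sketch, a submit script for the default queue simply omits the "-q" line. The example assumes a Grid Engine style scheduler, as the "#$" directive prefix used on this page suggests. The job name `short_demo` and the function `describe_job` are hypothetical, and the `JOB_ID` and `QUEUE` variables are assumed to be set by the scheduler inside a running job.

```shell
#!/bin/sh
# Sketch of a job script for the default "short" queue: there is no "-q" line.
# Run the job from the directory it was submitted from.
#$ -cwd
# Hypothetical job name.
#$ -N short_demo
# Merge standard error into standard output.
#$ -j y

# Report where the job landed. JOB_ID and QUEUE are assumed to be set by the
# scheduler inside a running job; the fallbacks apply when run by hand.
describe_job() {
    echo "job ${JOB_ID:-local} in queue ${QUEUE:-none}"
}

describe_job
```

Because the directives are ordinary shell comments, the script can also be run by hand to check the workload before submitting it with qsub.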

medium

All users have access to the "medium" queue. This queue has a 168-hour (7-day) wall time limit: jobs running in this queue are terminated after 168 hours. Users must specify this queue with "-q medium" on a qsub or qlogin command, or in the qsub submit script by including the following line:

#$ -q medium
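Because a job in this queue is terminated at 168 hours, a long-running script may want to track how much of its allowance remains, for example to write a checkpoint before the limit is reached. The following is only a sketch: the function `seconds_left` is hypothetical, and the one-hour safety margin is an arbitrary assumption.

```shell
#!/bin/sh
#$ -q medium

# Wall time limit of the "medium" queue, in hours (from the text above).
LIMIT_HOURS=168

# Hypothetical helper: seconds remaining before the limit, given the job's
# start time and the current time (both in seconds since the epoch).
seconds_left() {
    echo $(( LIMIT_HOURS * 3600 - ($2 - $1) ))
}

job_start=$(date +%s)
# ... the real work would go here ...
now=$(date +%s)

# Assumed policy: stop starting new work when under one hour remains.
if [ "$(seconds_left "$job_start" "$now")" -lt 3600 ]; then
    echo "less than one hour left; write a checkpoint now"
fi
```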

long

The "long" queue has no wall time limit, but jobs running in it are still affected by the monthly maintenance cycle.

To run in this queue, a user must first send a request to arcs@njit.edu.

Users must specify this queue with "-q long" on a qsub or qlogin command, or in the qsub submit script by including the following line:

#$ -q long

Note: A queue can also be specified from within MATLAB, e.g.:

ClusterInfo.setQueue('medium')
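The limits of the three time-based queues (48 hours, 168 hours, none) can be turned into a small selection helper. This is a sketch only: the function name `pick_queue` is hypothetical, and it takes the expected run time in whole hours.

```shell
#!/bin/sh
# Hypothetical helper: print the queue whose wall time limit covers a job
# expected to run for the given number of whole hours.
pick_queue() {
    hours=$1
    if [ "$hours" -le 48 ]; then
        echo short
    elif [ "$hours" -le 168 ]; then
        echo medium
    else
        # No wall time limit, but access must be requested first.
        echo long
    fi
}

pick_queue 36    # short
pick_queue 100   # medium
pick_queue 400   # long
```

The result could then be passed on the command line, e.g. `qsub -q "$(pick_queue 100)" myjob.sh`, where `myjob.sh` is a placeholder script name. Since jobs that exceed their limit are terminated, it is safer to overestimate the run time than to underestimate it.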

gpu

This queue has a 168-hour (7-day) wall time limit: jobs running in this queue are terminated after 168 hours.

Two of Kong's nodes contain two GPUs and 20 CPU cores each. Both GPU and SMP jobs compete for these nodes, so we are still observing their usage in order to devise a fair-use policy. Currently:

  • Anyone may use this queue.
  • You can run jobs on two of the four GPUs simultaneously (intended for GPU jobs).
  • You are limited to 20 CPU cores simultaneously (intended for SMP jobs).
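The current limits amount to half of the queue's resources per user (2 of 4 GPUs, 20 of 40 cores). The sketch below checks a planned total against both limits before submission. The function `within_gpu_queue_limits` is hypothetical, and its arguments are assumed to be the totals across all of a user's simultaneous jobs, not a single request.

```shell
#!/bin/sh
# Current per-user limits on the gpu queue (from the list above).
MAX_GPUS=2
MAX_CORES=20

# Hypothetical pre-submission check: succeeds only if the planned totals
# of GPUs ($1) and CPU cores ($2) stay within both limits.
within_gpu_queue_limits() {
    [ "$1" -le "$MAX_GPUS" ] && [ "$2" -le "$MAX_CORES" ]
}

if within_gpu_queue_limits 2 8; then
    echo "2 GPUs and 8 cores fit the current policy"
fi
if ! within_gpu_queue_limits 3 8; then
    echo "3 GPUs and 8 cores exceed the current policy"
fi
```

Since the policy is still under observation, the two constants would need to be updated whenever the limits change.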

Please refer to this page for updates on gpu queue policy.

Please refer to Running CUDA Samples on Kong for examples of how to specify the GPU queue and number of GPUs desired.

smp

All users have access to the Symmetric Multiprocessing (SMP) queue. This queue has a 168-hour (7-day) wall time limit: jobs running in this queue are terminated after 168 hours. Users must specify this queue with "-q smp" on a qsub or qlogin command, or in the qsub submit script by including the following line:

#$ -q smp

There is one SMP node, with eight 4-core processors (AMD Opteron 8384), for a total of 32 cores, and 128 GB of RAM.
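The hardware figures above give a quick way to estimate a fair memory budget for a job on this node. The sketch below is an illustration only: the variable names are hypothetical, `NSLOTS` is assumed to be set by the scheduler inside a running job, and the per-core figure is a simple average, not an enforced limit.

```shell
#!/bin/sh
#$ -q smp

# Hardware figures for the SMP node (from the text above).
SOCKETS=8
CORES_PER_SOCKET=4
RAM_GB=128

total_cores=$(( SOCKETS * CORES_PER_SOCKET ))
ram_per_core_gb=$(( RAM_GB / total_cores ))
echo "cores: $total_cores, average RAM per core: ${ram_per_core_gb} GB"

# NSLOTS is assumed to be set by the scheduler inside a job; fall back to 1
# when the script is run by hand.
slots=${NSLOTS:-1}
echo "memory budget for $slots slot(s): $(( slots * ram_per_core_gb )) GB"
```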