From HPC Wiki

This page provides information on the scheduler used for NJIT's HPC clusters.


About SGE

"SGE" stands for Son of Grid Engine, an open-source scheduler for HPC clusters. SGE is the successor to Sun Grid Engine, a project that Oracle dropped after its purchase of Sun Microsystems.

Using SGE

SGE can be used for both serial and parallel scheduling.

Jobs are submitted to SGE with the "qsub" command and a submit script :

qsub some.submit.script
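
As a rough illustration of what a serial submit script can contain, the sketch below writes a minimal script and submits it only when "qsub" is available. The file name "serial.submit.script" and the job name "serial_example" are hypothetical, and the directives shown are common SGE options rather than this site's required settings; the example pages linked from this wiki remain the reference.

```shell
#!/bin/bash
# Write a minimal serial submit script.
# The file name and the job name are hypothetical placeholders.
cat > serial.submit.script <<'EOF'
#!/bin/bash
# Job name shown in the queue listing
#$ -N serial_example
# Run the job from the directory it was submitted from
#$ -cwd
# Merge standard error into standard output
#$ -j y
# Shell used to interpret this script
#$ -S /bin/bash

echo "Running on $(hostname)"
EOF

# Submit only when qsub exists on this machine.
if command -v qsub >/dev/null 2>&1; then
    qsub serial.submit.script
else
    echo "qsub not found; run this on a cluster login node"
fi
```

Lines beginning with "#$" are read by qsub as options and ignored by the shell, so the same file can also be run directly with bash for a quick local check.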

Submit Script Example : Serial

Submit Script Example : Parallel

After running the qsub command, users will see a message similar to :

Your job 132 ("IMB-MPI1") has been submitted

"132" is the SGE job number and "IMB-MPI1" is the name of the job that is being submitted to the job queue.

SGE info, including example scripts
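
When a script needs the job number later (for example to pass it to "qdel"), it can be pulled out of the confirmation message shown above. This is a minimal sketch that assumes the message has exactly the form "Your job NUMBER ..."; the function name "job_id_from_message" is made up for illustration.

```shell
#!/bin/bash
# Extract the job number from a qsub confirmation message.
# Assumes the message looks like: Your job 132 ("IMB-MPI1") has been submitted
job_id_from_message() {
    # The job number is the third whitespace-separated field.
    printf '%s\n' "$1" | awk '/^Your job / { print $3 }'
}

msg='Your job 132 ("IMB-MPI1") has been submitted'
job_id=$(job_id_from_message "$msg")
echo "$job_id"   # prints 132
```

If the installed version of qsub supports the "-terse" option, it prints only the job number and makes this parsing unnecessary; check "man qsub" on the cluster before relying on it.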

Getting Queue Status

Use the SGE command "qstat" to get the status of jobs in the SGE queue.

  • qstat -g c
    Show activity for all queues
  • qstat -f
    Show summary information for all queues
  • qstat -f -u '*'
    Show jobs and queue information for all users
  • qstat -f -u '*' | grep ucid | sort -n
    Show jobs and sorted queue information for ucid
  • qstat -u ucid
    Show jobs and queue information for user "ucid"
  • /afs/cad/hpc/site/bin/qsummary
    Show expanded queue summary information
  • /afs/cad/hpc/site/bin/qmeminfo
    Show RAM information for queues
  • /var/tmp/run.viewstat.out
    Queue summary history

Detailed qstat usage :

man qstat
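
The qstat output can also be summarized with standard tools. The sketch below counts jobs per state, under the assumption that the listing has two header lines and that the job state (such as "r" or "qw") is the fifth column; verify this against the output on your cluster. The function name "count_job_states" is hypothetical and "ucid" stands for your own user name.

```shell
#!/bin/bash
# Count jobs per state from qstat output read on standard input.
# Assumes two header lines and the state in column 5.
count_job_states() {
    awk 'NR > 2 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Run against the live queue only when qstat is available.
if command -v qstat >/dev/null 2>&1; then
    qstat -u ucid | count_job_states
fi
```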

Getting Host Status

As of March 2015, the commands below work on kong only; the version of SGE on stheno is too old.

  • qhost
  • qhost -j
    All jobs, split by host

userstat (uses output from qstat and qhost) :


Deleting Jobs

Use "qdel" to delete a running job :

qdel 132

The above command will print a message similar to the following :

ucid has registered the job 132 for deletion

Detailed qdel usage :

man qdel

Array Jobs

Useful for running a large number of jobs that use the same command.
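
A minimal sketch of an array job script follows. It relies on the "-t" option, which requests a range of task numbers, and on the SGE_TASK_ID environment variable, which SGE sets for each task. The file name, job name, task range and "input.N.dat" naming are all hypothetical.

```shell
#!/bin/bash
# Write a minimal array job script (names and range are placeholders).
cat > array.submit.script <<'EOF'
#!/bin/bash
# Job name shown in the queue listing
#$ -N array_example
# Run each task from the submission directory
#$ -cwd
# Run tasks numbered 1 through 10
#$ -t 1-10

# SGE sets SGE_TASK_ID per task; default to 1 when run outside SGE.
task="${SGE_TASK_ID:-1}"
echo "Task ${task} would process input.${task}.dat"
EOF

# Local dry run of a single task, without the scheduler.
SGE_TASK_ID=3 bash array.submit.script   # prints: Task 3 would process input.3.dat
```

Submitting the file with "qsub array.submit.script" would then queue one task per number in the range, each running the same command on its own input.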

Array Jobs