SonOfGridEngine

This page provides information on the scheduler used for NJIT's HPC clusters.
About SGE
"SGE" stands for Son of Grid Engine, an open-source scheduler for HPC clusters. SGE is the successor to Sun Grid Engine, a project that Oracle dropped after its purchase of Sun Microsystems.
Using SGE
SGE can be used for both serial and parallel scheduling.
Jobs are submitted to SGE with the "qsub" command and a submit script :
qsub some.submit.script
Submit Script Example : Serial
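The following is a minimal sketch of a serial submit script, not the site's official example. The job name "serial_example" and the placeholder workload are assumptions made for illustration; the "#$" options shown (-N, -cwd, -j, -S) are standard qsub options.

```shell
#!/bin/bash
# Lines starting with "#$" are read by qsub as options.
# Job name shown by qstat (hypothetical name)
#$ -N serial_example
# Run the job from the directory where qsub was invoked
#$ -cwd
# Merge standard error into standard output
#$ -j y
# Interpret the script with bash
#$ -S /bin/bash

# JOB_ID is set by SGE at run time; the fallback lets the script run by hand
job_label="${JOB_ID:-interactive}"
echo "Job $job_label running on $(uname -n)"

# Placeholder workload: replace this loop with the real serial program
total=0
for i in 1 2 3 4 5; do
    total=$((total + i))
done
echo "Sum of 1..5 is $total"
```

Saved as, say, serial.submit.script, it would be submitted with "qsub serial.submit.script".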
Submit Script Example : Parallel
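The following is a rough sketch of a parallel submit script under stated assumptions. The parallel environment name "mpi", the slot count of 8, and the program name ./my_mpi_program are all hypothetical; the parallel environments actually defined on a cluster can be listed with "qconf -spl". The script only prints the command it would launch, so it can be tried without an MPI installation.

```shell
#!/bin/bash
# Job name shown by qstat (hypothetical name)
#$ -N parallel_example
# Run the job from the directory where qsub was invoked
#$ -cwd
# Merge standard error into standard output
#$ -j y
# Request 8 slots in a parallel environment named "mpi" (assumed name)
#$ -pe mpi 8

# SGE sets NSLOTS to the number of slots granted; default to 1 outside SGE
slots="${NSLOTS:-1}"

# Build the launch command; ./my_mpi_program is a placeholder
cmd="mpirun -np $slots ./my_mpi_program"
echo "Would run: $cmd"

# In a real job, replace the echo above with the command itself:
#   mpirun -np "$slots" ./my_mpi_program
```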
After running the qsub command, users will see a message similar to :
Your job 132 ("IMB-MPI1") has been submitted
"132" is the SGE job number and "IMB-MPI1" is the name of the job that is being submitted to the job queue.
SGE info, including example scripts
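When a wrapper script needs the job number, it can be pulled out of the submission message. The sketch below parses the sample message quoted in this section; the variable names are illustrative only.

```shell
# Sample message copied from the text above
msg='Your job 132 ("IMB-MPI1") has been submitted'

# The job number is the third whitespace-separated field
job_id=$(printf '%s\n' "$msg" | awk '{print $3}')

# The job name sits between the double quotes
job_name=$(printf '%s\n' "$msg" | sed 's/.*("\(.*\)").*/\1/')

echo "job_id=$job_id job_name=$job_name"
```

In a real script, msg would be set with msg=$(qsub some.submit.script). The qsub option "-terse" prints only the job number, which avoids the parsing altogether.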
Getting Queue Status
Use the SGE command "qstat" to get the status of jobs in the SGE queue.
- qstat -g c
  Show activity for all queues
- qstat -f
  Show summary information for all queues
- qstat -f -u '*'
  Show jobs and queue information for all users
- qstat -f -u '*' | grep ucid | sort -n
  Show jobs and sorted queue information for ucid
- qstat -u ucid
  Show jobs and queue information for user "ucid"
- /afs/cad/hpc/site/bin/qsummary
  Show expanded queue summary information
- /afs/cad/hpc/site/bin/qmeminfo
  Show RAM information for queues
- /var/tmp/run.viewstat.out
  Queue summary history
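The plain (non "-f") qstat listing can be summarized with standard text tools. The sketch below counts jobs per state, assuming the usual layout of two header lines followed by one line per job with the state in the fifth column. The sample listing is made up for illustration (job names, dates, and the queue name all.q@node01 are hypothetical), and count_job_states is not an SGE command.

```shell
# Count jobs per state from plain qstat output read on standard input
count_job_states() {
    awk 'NR > 2 && NF > 0 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# Illustrative sample only, not real cluster output
sample=$(cat <<'EOF'
job-ID  prior   name       user         state submit/start at     queue          slots
--------------------------------------------------------------------------------------
    132 0.55500 IMB-MPI1   ucid         r     03/10/2015 10:15:02 all.q@node01       8
    133 0.00000 example2   ucid         qw    03/10/2015 10:16:40                    1
    134 0.00000 example3   ucid         qw    03/10/2015 10:17:05                    1
EOF
)

summary=$(printf '%s\n' "$sample" | count_job_states)
echo "$summary"
```

On a cluster, the same function would be used as "qstat -u ucid | count_job_states".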
Detailed qstat usage :
man qstat
Getting Host Status
As of March 2015, qhost works on kong only; the SGE version on stheno is too old.
- qhost
- qhost -j
  All jobs, split by host
userstat (uses output from qstat and qhost) :
userstat
Deleting Jobs
Use "qdel" to delete a queued or running job :
qdel 132
The above command will print a message similar to the following :
ucid has registered the job 132 for deletion
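Because qdel acts immediately, a small wrapper that checks its argument can help avoid mistakes. The sketch below is one possible approach; delete_job and the DRY_RUN variable are hypothetical names, not part of SGE.

```shell
# Delete a job by number, refusing anything that is not purely numeric
delete_job() {
    case "$1" in
        ''|*[!0-9]*)
            echo "delete_job: expected a numeric job number, got '$1'" >&2
            return 2
            ;;
    esac
    # With DRY_RUN=1, print the command instead of running it
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "qdel $1"
    else
        qdel "$1"
    fi
}

# Dry run: shows the command that would be issued for job 132
DRY_RUN=1
delete_job 132
```

To remove all jobs belonging to one user in a single step, qdel also accepts "-u", as in "qdel -u ucid".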
Detailed qdel usage :
man qdel
Array Jobs
Array jobs are useful for running a large number of jobs that use the same command.
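A minimal sketch of an array job submit script follows. The "-t" option and the SGE_TASK_ID variable are standard SGE features; the job name, the task range 1-10, and the input file naming scheme are assumptions made for illustration.

```shell
#!/bin/bash
# Job name shown by qstat (hypothetical name)
#$ -N array_example
# Run the job from the directory where qsub was invoked
#$ -cwd
# Merge standard error into standard output
#$ -j y
# Run tasks numbered 1 through 10; each task gets its own SGE_TASK_ID
#$ -t 1-10

# Default to task 1 so the script can also be run by hand outside SGE
task_id="${SGE_TASK_ID:-1}"

# Each task selects its own input file (hypothetical naming scheme)
input_file="input.${task_id}.txt"
echo "Task $task_id would process $input_file"
```

A single "qsub" of this script queues all ten tasks under one job number, and each task runs the same commands with a different value of SGE_TASK_ID.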