UsingKsub

From HPC Wiki

To securely read or write files stored under /afs/cad/... you need a Kerberos ticket. This ticket is created automatically when you log in to the Kong or Stheno headnode.

Currently (Sept 2016) the scheduler, via qsub, cannot pass your Kerberos ticket to your job on the compute nodes, so the job cannot acquire your AFS token. As a workaround, ARCS has provided "ksub", which unfortunately has these limitations:

  1. You must create a file named $HOME/.ksub_user, e.g. echo anything >$HOME/.ksub_user
  2. After ksub is enabled you can't use qsub until you remove the above file, e.g. rm $HOME/.ksub_user
  3. Tickets used by ksub expire 30 days after you logged in to the cluster, regardless of when you submit your job. For example, if you have been logged in for 20 days when you submit a job, that job's tickets will expire in 10 days (including time spent waiting in the queue and running). You can get a new 30-day ticket by running kinit before running ksub.
  4. The submit script for the job must be located under /home, not under /afs. The default output of the job will also be under the same /home directory. However, your job can read/write files under /afs or can even start with a "cd /afs/cad/some/directory". Please email us at ARCS@njit.edu for assistance with writing your submit scripts if you have trouble with this limitation of ksub.
  5. This procedure only works for serial jobs. Please email us at ARCS@njit.edu if you need to perform I/O on /afs/cad/ files in parallel environment jobs.
  6. Just before running ksub, be sure you have your Kerberos ticket and AFS token by running:
    kinit && aklog
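
The steps above can be put together roughly as follows. This is only a sketch under stated assumptions: the helper names ticket_days_left and write_submit_script, the file name myjob.sh, and the job body (a simple ls) are hypothetical, and ksub is assumed to take the submit script as its argument the way qsub does, which this page does not state.

```shell
#!/bin/sh
# Sketch of the ksub workflow described above. Names marked "hypothetical"
# are illustrations only and are not part of the ARCS documentation.

# Days of ticket lifetime left for a job, given the number of days since
# the ticket was issued (item 3: tickets last 30 days from login or kinit).
ticket_days_left() {
    left=$((30 - $1))
    if [ "$left" -lt 0 ]; then
        left=0
    fi
    echo "$left"
}

# Write a minimal serial submit script (item 4: it must live under /home).
# $1: path of the script to create; $2: AFS directory the job works in.
write_submit_script() {
    cat > "$1" <<EOF
#!/bin/sh
# The script is stored under /home, but the job may work under /afs.
cd "$2" || exit 1
ls -l > listing.txt   # hypothetical job body
EOF
}

# Typical session, to be run by hand on the headnode:
#   echo anything > "$HOME/.ksub_user"   # item 1: enable ksub
#   kinit && aklog                       # item 6: fresh ticket and AFS token
#   write_submit_script "$HOME/myjob.sh" /afs/cad/some/directory
#   ksub "$HOME/myjob.sh"                # assumption: same usage as qsub
#   rm "$HOME/.ksub_user"                # item 2: allow qsub again
```

For the example in item 3, ticket_days_left 20 prints 10. Running kinit first resets the count, so the job gets the full 30 days.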

A proper fix that lets the scheduler run jobs on the compute nodes with the user's AFS token is anticipated by January 2017.