Running CUDA Samples on Kong


This tutorial demonstrates how to compile and run a GPU job using CUDA sample code.

Make a directory to hold the samples
kong-41 ~>: mkdir gpu
kong-42 ~>: cd gpu

Copy the sample files from AFS. Make sure to copy all of the files.
kong-43 gpu>: cp -r /afs/cad/linux/cuda/9.0.176/samples/ .
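
To confirm that everything copied, list the top-level sample categories; subdirectories such as 0_Simple should be present:
ls samples/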

Go to the matrixMul directory
kong-44 gpu>: cd samples/0_Simple/matrixMul

Load the cuda module
kong-45 matrixMul>: module load cuda
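
To verify that the module loaded and that nvcc is on your PATH (the reported release should match 9.0.176):
which nvcc
nvcc --version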

Build the binary
kong-46 matrixMul>: make
"/afs/cad/linux/cuda/9.0.176"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o matrixMul.o -c matrixMul.cu "/afs/cad/linux/cuda/9.0.176"/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o matrixMul matrixMul.o mkdir -p ../../bin/x86_64/linux/release cp matrixMul ../../bin/x86_64/linux/release
Create a submit script, gpusubmit.sh

#!/bin/sh
#
# Usage: gpusubmit.sh
# Change job name and email address as needed 
#        
 
# -- our name ---
#$ -N matrixMul
#$ -S /bin/sh
# Make sure that the .e and .o files arrive in the
# working directory
#$ -cwd
# Merge the standard output and standard error into one file
#$ -j y
# Send mail at submission and completion of script
#$ -m be
#$ -M UCID@njit.edu
# Specify GPU queue
#$ -q gpu
# Request one gpu (max two)
#$ -l gpu=1
/bin/echo Running on host: `hostname`.
/bin/echo In directory: `pwd`
/bin/echo Starting on: `date`
 
# Load CUDA module
. /opt/modules/init/bash
module load cuda
# Full path to the executable
/home/g/UCID/gpu/samples/0_Simple/matrixMul/matrixMul

Submit the job
kong-47 matrixMul>: qsub gpusubmit.sh
Your job 390030 ("matrixMul") has been submitted
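
While the job is waiting or running, you can check its status with the standard Grid Engine command (no output means the job has finished):
qstat -u $USER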

View the output
kong-48 matrixMul>: cat matrixMul.o390030
Running on host: node151.
In directory: /home/g/UCID/gpu/samples/0_Simple/matrixMul
Starting on: Wed Nov 5 14:46:48 EST 2014
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Tesla K20Xm" with compute capability 3.5
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 274.18 GFlop/s, Time= 0.478 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
Note: For peak performance, please refer to the matrixMulCUBLAS example.


Note: Users should not log in directly to a GPU node; use qsub or qlogin instead.
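
For interactive testing on a GPU node, a qlogin request that mirrors the resource options in the submit script should work; the exact flags are a sketch and may need adjusting for your site:
qlogin -q gpu -l gpu=1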

Users can debug their GPU code on any of the oslN.njit.edu workstations, N=1-30 and 51-84. These workstations have (very slow) GPUs and the necessary drivers. All AFS accounts have logins on the oslN machines.
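
A typical debug cycle on one of those workstations might look like the following; this is a sketch that assumes the cuda module and your home directory (with the gpu/ copy of the samples) are also available there:
ssh UCID@osl15.njit.edu
module load cuda
cd ~/gpu/samples/0_Simple/matrixMul
make
./matrixMul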