This site is deprecated and will be decommissioned shortly. For current information regarding HPC visit our new site: hpc.njit.edu
Difference between revisions of "MinicondaUserMaintainedEnvs"
Revision as of 17:43, 22 August 2023
Miniconda is an easy-to-install, minimal Python distribution. Users can use Miniconda to create virtual Python environments to manage Python modules. The instructions that follow are for Linux.
Installation
Download miniconda
login-1-95 ~ >: wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
--2020-07-29 16:24:59-- https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.131.3, 104.16.130.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.131.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 93052469 (89M) [application/x-sh]
Saving to: ‘Miniconda3-latest-Linux-x86_64.sh’

100%[========================================================================================>] 93,052,469 31.6MB/s in 2.8s

2020-07-29 16:25:03 (31.6 MB/s) - ‘Miniconda3-latest-Linux-x86_64.sh’ saved [93052469/93052469]
Run the installation
login-1-96 ~ >: chmod +x Miniconda3-latest-Linux-x86_64.sh
login-1-96 ~ >: ./Miniconda3-latest-Linux-x86_64.sh
Accept the license and the default location. After Python and some packages are installed, you will be prompted to run conda init. Enter 'yes' at the prompt.
When the installation is complete the following appears:
> For changes to take effect, close and re-open your current shell. <

If you'd prefer that conda's base environment not be activated on startup,
set the auto_activate_base parameter to false:

conda config --set auto_activate_base false

Thank you for installing Miniconda3!
Since you will likely be maintaining your own virtual environments, it is recommended not to activate the base environment on startup.
login-1-101 ~ >: conda config --set auto_activate_base false
Log off and log in again.
Create and Activate a Conda Virtual Environment
The following example creates a new conda environment based on Python 3.7 and installs TensorFlow in that environment.
login-1-105 ~ >: conda create --name tf python=3.7
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/g/guest24/miniconda3/envs/tf

  added / updated specs:
    - python=3.7

The following packages will be downloaded:
<output snipped>
Proceed ([y]/n)?y
<output snipped>
#
# To activate this environment, use
#
#     $ conda activate tf
#
# To deactivate an active environment, use
#
#     $ conda deactivate
Activate the new 'tf' environment
login-1-106 ~ >: conda activate tf
(tf) login-1-107 ~ >:
Install tensorflow-gpu
(tf) login-1-107 ~ >: conda install -c anaconda tensorflow-gpu
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/g/guest24/miniconda3/envs/tf

  added / updated specs:
    - tensorflow-gpu

<output snipped>
The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    pkgs/main --> anaconda
  certifi            pkgs/main --> anaconda
  openssl            pkgs/main --> anaconda

Proceed ([y]/n)?y
<output snipped>
mkl_fft-1.1.0        | 143 KB   | ######################################## | 100%
urllib3-1.25.9       | 98 KB    | ######################################## | 100%
cudatoolkit-10.1.243 | 513.2 MB | ######################################## | 100%
protobuf-3.12.3      | 711 KB   | ######################################## | 100%
blinker-1.4          | 21 KB    | ######################################## | 100%
requests-2.24.0      | 54 KB    | ######################################## | 100%
werkzeug-1.0.1       | 243 KB   | ######################################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Check to see if tensorflow can be loaded
(tf) login-1-108 ~ >: python
Python 3.7.7 (default, May 7 2020, 21:25:33)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>>
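The same check can be scripted instead of typed at the interactive prompt. The following is a minimal sketch that uses only the Python standard library; the helper name module_available is hypothetical and not part of the original instructions. It reports whether a module can be located by the active interpreter without actually importing it.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be located by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # sys.prefix shows which environment the interpreter belongs to.
    print("Interpreter prefix:", sys.prefix)
    print("tensorflow available:", module_available("tensorflow"))
```

Run it with the 'tf' environment activated; if the prefix printed is not the environment you expect, the wrong interpreter is on your PATH.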
The following simple TensorFlow test program, called "tf.gpu.test.py", checks that the virtual environment can access a GPU.
import tensorflow as tf

# gpu_device_name() returns an empty string when no GPU is available
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")
Slurm script to submit the job
#!/bin/bash -l
#SBATCH --job-name=tf_test
#SBATCH --output=%x.%j.out # %x.%j expands to JobName.JobID
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --partition=datasci
#SBATCH --gres=gpu:1
#SBATCH --mem=4G

# Purge any module loaded by default
module purge > /dev/null 2>&1
conda activate tf
srun python tf.gpu.test.py
Result:
Starting /home/g/guest24/.bash_profile ... standard AFS bash profile
Home directory : /home/g/guest24 is not in AFS -- skipping quota check
On host node430 :
17:14:13 up 1 day, 1:17, 0 users, load average: 0.01, 0.07, 0.06
Your Kerberos ticket and AFS token status
klist: No credentials cache found (filename: /tmp/krb5cc_22967_HvCVvuvMMX)
Kerberos :
AFS :
Loading default modules ...
Create file : "/home/g/guest24/.modules" to customize.
No modules loaded
2020-07-29 17:14:19.047276: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-07-29 17:14:19.059941: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2200070000 Hz
2020-07-29 17:14:19.060093: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55ea8ebfdb90 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-29 17:14:19.060136: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-07-29 17:14:19.061484: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
<output snipped>
2020-07-29 17:14:19.817386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-29 17:14:19.817392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-07-29 17:14:19.817397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-07-29 17:14:19.819082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 15064 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:02:00.0, compute capability: 6.0)
Default GPU Device: /device:GPU:0
GPU was recognized.
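Because the job output mixes login-profile messages, TensorFlow log lines, and the program's own output, it can be easier to search for the one line that matters. The following is a minimal sketch, not part of the original instructions: the function name find_gpu_device is hypothetical, and the sample text is a shortened stand-in for the real output file.

```python
import re
from typing import Optional

# Matches the line printed by tf.gpu.test.py when a GPU is found.
GPU_LINE = re.compile(r"^Default GPU Device: (\S+)$", re.MULTILINE)


def find_gpu_device(job_output: str) -> Optional[str]:
    """Return the reported GPU device name, or None if the line is absent."""
    match = GPU_LINE.search(job_output)
    return match.group(1) if match else None


# Shortened sample modeled on the result shown above.
sample_output = """No modules loaded
Default GPU Device: /device:GPU:0
"""

if __name__ == "__main__":
    print(find_gpu_device(sample_output))
```

A return value of None means the test program took its "Please install GPU version of TF" branch or did not run to completion.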
Install Jupyter Notebook
Download and install Miniconda as described earlier. Create a new environment and install Jupyter Notebook
login-1-105 ~ >: conda create --name jupyter python=3.7
Activate the new 'jupyter' environment
login-1-106 ~ >: conda activate jupyter
(jupyter) login-1-107 ~ >:
Next, install Jupyter Notebook
(jupyter) login-1-107 ~ >: conda install jupyter notebook
Create the following script (jupyter.sh)
#!/bin/bash -l
conda activate jupyter

# Pick a random port between 6000 and 9999
port=$(shuf -i 6000-9999 -n 1)

cat<<EOF

Jupyter server is running on: $(hostname)
Job starts at: $(date)

Step 1: Create SSH tunnel

Open new terminal window, and run:
(If you are off campus you will need VPN running)

ssh -L $port:localhost:$port $USER@phi.njit.edu

Step 2: Connect to Jupyter

Keep the terminal in the previous step open. Now open browser, find the line with

Or copy and paste one of these URLs:

the URL will be something like:

http://localhost:${port}/?token=XXXXXXXX

EOF

jupyter notebook --no-browser --port $port --notebook-dir=$(pwd)
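The script above picks its port at random with shuf and does not check whether the port is already in use on the login node. The following is a minimal sketch of one way to check first, assuming only the Python standard library; the function name pick_free_port and the retry count are hypothetical and not part of the original script.

```python
import random
import socket


def pick_free_port(low: int = 6000, high: int = 9999, attempts: int = 50) -> int:
    """Return a port in [low, high] that could be bound on localhost at call time."""
    for _ in range(attempts):
        port = random.randint(low, high)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port is in use (or not bindable); try another one
            return port  # the socket is closed when the with-block exits
    raise RuntimeError("no free port found in the requested range")


if __name__ == "__main__":
    print(pick_free_port())
```

The check is only a snapshot: another process can still claim the port between this test and the moment Jupyter starts, so treat it as reducing collisions rather than preventing them.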
Next, create a script (krenew.sh) that runs krenew. krenew renews the tokens automatically, which is required to keep Jupyter Notebook running in the background. For details see https://wiki.hpc.arcs.njit.edu/index.php/UsingKrenew
krenew -t -b -K 60 -- bash -c "$PWD/jupyter.sh >> $PWD/output.log 2>&1"
To make the file krenew.sh executable, use
chmod +x krenew.sh
Then execute krenew.sh
./krenew.sh
This will generate an output file output.log. Now open the log file and copy the URL. The URL will be in the following format:
http://localhost:${port}/?token=XXXXXXXX
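Rather than reading through the log by eye, the URL can be pulled out with a short script. The following is a minimal sketch; the function name find_notebook_urls is hypothetical, the sample log text is made up for illustration, and the pattern assumes the token consists of lowercase hexadecimal characters. That assumption also skips the XXXXXXXX placeholder line that jupyter.sh itself writes to the log.

```python
import re
from pathlib import Path
from typing import List

# Assumption: real tokens are lowercase hex, so the XXXXXXXX placeholder is not matched.
URL_PATTERN = re.compile(r"http://localhost:\d+/\?token=[0-9a-f]+")


def find_notebook_urls(log_text: str) -> List[str]:
    """Return every localhost notebook URL with a token found in the log text."""
    return URL_PATTERN.findall(log_text)


# Made-up sample; the port and token are illustrative only.
sample_log = (
    "http://localhost:7421/?token=XXXXXXXX\n"
    "    Or copy and paste one of these URLs:\n"
    "        http://localhost:7421/?token=0123abcd4567ef\n"
)

if __name__ == "__main__":
    log_path = Path("output.log")
    text = log_path.read_text() if log_path.exists() else sample_log
    for url in find_notebook_urls(text):
        print(url)
```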
To kill the Jupyter Notebook process, first list your currently running processes with the following command.
login-1-106 ~ >: top -u guest
Replace guest with your NJIT UCID. Once you execute the command, you will see output similar to the following:
  PID USER   PR NI   VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
20653 guest  20  0  33132  1440  1072 S  0.0  0.0 0:00.04 krenew
20654 guest  20  0 113284  1216  1040 S  0.0  0.0 0:00.00 bash
20655 guest  20  0 113288  1624  1368 S  0.0  0.0 0:00.00 jupyter.sh
20693 guest  20  0 482688 89112 13024 S  0.0  0.0 1:23.88 jupyter-noteboo
21752 guest  20  0 862064 56588  9084 S  0.0  0.0 0:33.90 python
21772 guest  20  0 126384  2164  1684 S  0.0  0.0 0:00.00 bash
26251 guest  20  0 184632  2504  1116 S  0.0  0.0 0:00.00 sshd
26252 guest  20  0 126252  2100  1636 S  0.0  0.0 0:00.00 bash
26294 guest  20  0 172940  2524  1648 R  0.0  0.0 0:00.14 top
Identify the process ID (PID) of the process running Jupyter Notebook. In the output above, the PID is 20693. To kill the process, use
login-1-106 ~ >: kill -9 20693
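The lookup step can also be scripted. The following is a minimal sketch, not part of the original instructions: the function names find_pids and stop_process are hypothetical, and the parser assumes the 12-column layout shown in the top output above, where COMMAND is truncated to "jupyter-noteboo". The sketch sends SIGTERM, which lets the process shut down cleanly, rather than the SIGKILL sent by kill -9.

```python
import os
import signal
from typing import List


def find_pids(top_output: str, command_prefix: str) -> List[int]:
    """Return PIDs of rows whose COMMAND column starts with `command_prefix`."""
    pids = []
    for line in top_output.splitlines():
        fields = line.split()
        # Data rows have 12 columns and begin with a numeric PID; the header does not.
        if len(fields) >= 12 and fields[0].isdigit() and fields[11].startswith(command_prefix):
            pids.append(int(fields[0]))
    return pids


def stop_process(pid: int) -> None:
    """Ask the process to exit (SIGTERM). Not called automatically in this sketch."""
    os.kill(pid, signal.SIGTERM)


# Rows copied from the example output above.
sample_top = """PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20655 guest 20 0 113288 1624 1368 S 0.0 0.0 0:00.00 jupyter.sh
20693 guest 20 0 482688 89112 13024 S 0.0 0.0 1:23.88 jupyter-noteboo
21752 guest 20 0 862064 56588 9084 S 0.0 0.0 0:33.90 python
"""

if __name__ == "__main__":
    print(find_pids(sample_top, "jupyter-noteboo"))
```

If the process does not exit after SIGTERM, the kill -9 command shown above remains the fallback.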