
This site is deprecated and will be decommissioned shortly. For current information regarding HPC visit our new site: hpc.njit.edu

MinicondaUserMaintainedEnvs

==Install Jupyter Notebook==

Download and install Miniconda as described in the Installation section below.

Create a new environment and install Jupyter Notebook.

<pre>
login-1-105 ~ >: conda create --name jupyter python=3.7
</pre>
 
 
Activate the new 'jupyter' environment

<pre>
login-1-106 ~ >: conda activate jupyter
(jupyter) login-1-107 ~ >:
</pre>
 
 
Next, install Jupyter Notebook

<pre>
(jupyter) login-1-107 ~ >: conda install jupyter notebook
</pre>
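The environment can also be created with Jupyter included in a single step; a minimal sketch using the same package names as above:

<pre>
login-1-105 ~ >: conda create --name jupyter python=3.7 jupyter notebook
</pre>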
 
 
Create the following script (jupyter.sh)

<pre>
#!/bin/bash -l

conda activate jupyter

# Pick a random port for the notebook server
port=$(shuf -i 6000-9999 -n 1)

# Print connection instructions for the user
cat <<EOF

Jupyter server is running on: $(hostname)
Job starts at: $(date)

Step 1: Create SSH tunnel

Open a new terminal window and run:
(If you are off campus you will need VPN running)

ssh -L $port:localhost:$port $USER@phi.njit.edu

Step 2: Connect to Jupyter

Keep the terminal from the previous step open. Now open a browser, find the line with

Or copy and paste one of these URLs:

The URL will be something like:

http://localhost:${port}/?token=XXXXXXXX

EOF

# Start the notebook server without opening a browser, serving the current directory
jupyter notebook --no-browser --port $port --notebook-dir=$(pwd)
</pre>
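As a purely hypothetical example, if the script happened to pick port 8765, the tunnel command printed in Step 1 would look like this (replace the port and UCID with the values shown in your own output):

<pre>
ssh -L 8765:localhost:8765 ucid@phi.njit.edu
</pre>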
 
 
Next, create a script (krenew.sh) that runs the notebook under <code>krenew</code>. <code>krenew</code> renews your Kerberos ticket and AFS token automatically so that the Jupyter Notebook can keep running in the background. For details see https://wiki.hpc.arcs.njit.edu/index.php/UsingKrenew
 
<pre>
#!/bin/bash
# Run jupyter.sh under krenew: -b puts it in the background, -K 60 checks and
# renews the Kerberos ticket every 60 minutes, and -t also refreshes the AFS token.
# All notebook output is appended to output.log.
krenew -t -b -K 60 -- bash -c "$PWD/jupyter.sh >> $PWD/output.log 2>&1"
</pre>
 
 
To make the file <code>krenew.sh</code> executable, use

<pre>
chmod +x krenew.sh
</pre>
 
 
Then execute the <code>krenew.sh</code> script

<pre>
./krenew.sh
</pre>
 
 
This will generate an output file, <code>output.log</code>. Open the log file and copy the URL, which will have the following format
 
<pre>http://localhost:${port}/?token=XXXXXXXX</pre>
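If the log file is long, one quick way to pick out the URL line is with <code>grep</code> (the exact surrounding text can vary between Jupyter versions):

<pre>
grep 'token=' output.log
</pre>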
 
 
To stop the Jupyter Notebook, first use the following command to see your currently running processes.

<pre>
login-1-106 ~ >: top -u guest
</pre>
 
Replace <code>guest</code> with your NJIT UCID. Once you execute the command, you will see output something like the following
 
<pre>
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
20653 guest    20  0  33132  1440  1072 S  0.0  0.0  0:00.04 krenew
20654 guest    20  0  113284  1216  1040 S  0.0  0.0  0:00.00 bash
20655 guest    20  0  113288  1624  1368 S  0.0  0.0  0:00.00 jupyter.sh
20693 guest    20  0  482688  89112  13024 S  0.0  0.0  1:23.88 jupyter-noteboo
21752 guest    20  0  862064  56588  9084 S  0.0  0.0  0:33.90 python
21772 guest    20  0  126384  2164  1684 S  0.0  0.0  0:00.00 bash
26251 guest    20  0  184632  2504  1116 S  0.0  0.0  0:00.00 sshd
26252 guest    20  0  126252  2100  1636 S  0.0  0.0  0:00.00 bash
26294 guest    20  0  172940  2524  1648 R  0.0  0.0  0:00.14 top
</pre>
 
 
Identify the process ID (PID) of the Jupyter Notebook process. In the above output, the PID is 20693. To kill the process, use

<pre>
login-1-106 ~ >: kill -9 20693
</pre>
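As an alternative sketch, if you prefer not to look up the PID by hand, <code>pkill</code> can stop the notebook server and the krenew wrapper by name for your user (the process names are those shown in the top output above):

<pre>
pkill -u $USER -f jupyter-notebook
pkill -u $USER krenew
</pre>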
 


==Installation==

Miniconda is an easy-to-install, minimal Python distribution. Users can use Miniconda to create virtual Python environments in which to manage Python packages. The instructions that follow are for Linux.

Download Miniconda

<pre>
login-1-95 ~ >: wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
--2020-07-29 16:24:59--  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.131.3, 104.16.130.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.131.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 93052469 (89M) [application/x-sh]
Saving to: ‘Miniconda3-latest-Linux-x86_64.sh’

100%[========================================================================================>] 93,052,469  31.6MB/s   in 2.8s

2020-07-29 16:25:03 (31.6 MB/s) - ‘Miniconda3-latest-Linux-x86_64.sh’ saved [93052469/93052469]
</pre>
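Optionally, verify the download before running it; this sketch uses the standard <code>sha256sum</code> utility, and the result should match the hash published on the Miniconda download page:

<pre>
sha256sum Miniconda3-latest-Linux-x86_64.sh
</pre>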

Run the installation

<pre>
login-1-96 ~ >: chmod +x Miniconda3-latest-Linux-x86_64.sh
login-1-96 ~ >: ./Miniconda3-latest-Linux-x86_64.sh
</pre>

Accept the license and the default location. After Python and some packages are installed, you will be prompted to run <code>conda init</code>. Enter 'yes' at the prompt.

When the installation is complete, the following appears:

<pre>
> For changes to take effect, close and re-open your current shell. <

If you'd prefer that conda's base environment not be activated on startup,
   set the auto_activate_base parameter to false:

conda config --set auto_activate_base false

Thank you for installing Miniconda3!
</pre>

Since you will likely be maintaining your own virtual environments, it is recommended not to activate the base environment on startup.

<pre>
login-1-101 ~ >: conda config --set auto_activate_base false
</pre>

Log off and log in again.
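After logging back in, a quick sanity check with standard conda subcommands confirms that conda is on your PATH and that base auto-activation is off:

<pre>
conda --version
conda config --show auto_activate_base
</pre>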

==Create and Activate a Conda Virtual Environment==

The following example will create a new conda environment based on python 3.7 and install tensorflow in the environment.

<pre>
login-1-105 ~ >: conda create --name tf python=3.7
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/g/guest24/miniconda3/envs/tf

  added / updated specs:
    - python=3.7


The following packages will be downloaded:

 <output snipped>

Proceed ([y]/n)?y

 <output snipped>
#
# To activate this environment, use
#
#     $ conda activate tf
#
# To deactivate an active environment, use
#
#     $ conda deactivate
</pre>

Activate the new 'tf' environment

<pre>
login-1-106 ~ >: conda activate tf
(tf) login-1-107 ~ >:
</pre>
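To list the environments that exist, or to leave the environment when you are done, the usual conda commands apply:

<pre>
conda env list
conda deactivate
</pre>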

Install tensorflow-gpu

<pre>
(tf) login-1-107 ~ >: conda install -c anaconda tensorflow-gpu
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/g/guest24/miniconda3/envs/tf

  added / updated specs:
    - tensorflow-gpu

<output snipped>

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates                                 pkgs/main --> anaconda
  certifi                                         pkgs/main --> anaconda
  openssl                                         pkgs/main --> anaconda


Proceed ([y]/n)?y

<output snipped>

mkl_fft-1.1.0        | 143 KB    | ####################################################################################### | 100%
urllib3-1.25.9       | 98 KB     | ####################################################################################### | 100%
cudatoolkit-10.1.243 | 513.2 MB  | ####################################################################################### | 100%
protobuf-3.12.3      | 711 KB    | ####################################################################################### | 100%
blinker-1.4          | 21 KB     | ####################################################################################### | 100%
requests-2.24.0      | 54 KB     | ####################################################################################### | 100%
werkzeug-1.0.1       | 243 KB    | ####################################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
</pre>

Check to see if tensorflow can be loaded

<pre>
(tf) login-1-108 ~ >: python
Python 3.7.7 (default, May  7 2020, 21:25:33)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>>
</pre>
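To also confirm which TensorFlow version conda resolved (the exact version will vary), a quick one-liner works:

<pre>
(tf) login-1-108 ~ >: python -c "import tensorflow as tf; print(tf.__version__)"
</pre>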

A simple TensorFlow test program to make sure the virtual environment can access a GPU. The program is called "tf.gpu.test.py"

<pre>
import tensorflow as tf

if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")
</pre>

Slurm script to submit the job

<pre>
#!/bin/bash -l
#SBATCH --job-name=tf_test
#SBATCH --output=%x.%j.out # %x.%j expands to JobName.JobID
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --partition=datasci
#SBATCH --gres=gpu:1
#SBATCH --mem=4G

# Purge any module loaded by default
module purge > /dev/null 2>&1
conda activate tf
srun python tf.gpu.test.py
</pre>
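Assuming the Slurm script above is saved as, for example, <code>tf_test.sh</code> (the filename here is illustrative), submit it and check its status with the standard Slurm commands:

<pre>
sbatch tf_test.sh
squeue -u $USER
</pre>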

Result:

<pre>
Starting /home/g/guest24/.bash_profile ... standard AFS bash profile

Home directory : /home/g/guest24 is not in AFS -- skipping quota check

On host node430 :
         17:14:13 up 1 day,  1:17,  0 users,  load average: 0.01, 0.07, 0.06

      Your Kerberos ticket and AFS token status 
klist: No credentials cache found (filename: /tmp/krb5cc_22967_HvCVvuvMMX)
Kerberos :
AFS      :

Loading default modules ...
Create file : "/home/g/guest24/.modules" to customize.

No modules loaded
2020-07-29 17:14:19.047276: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-07-29 17:14:19.059941: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2200070000 Hz
2020-07-29 17:14:19.060093: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55ea8ebfdb90 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-29 17:14:19.060136: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-29 17:14:19.061484: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1

<output snipped>

2020-07-29 17:14:19.817386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-29 17:14:19.817392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
2020-07-29 17:14:19.817397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
2020-07-29 17:14:19.819082: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 15064 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:02:00.0, compute capability: 6.0)
Default GPU Device: /device:GPU:0
</pre>

The GPU was recognized.