
This site is deprecated and will be decommissioned shortly. For current information regarding HPC visit our new site: hpc.njit.edu

Difference between pages "AfsandOptSoftware" and "HPCPlanningFY21-22"

From NJIT-ARCS HPC Wiki
 
=Software in /afs=
<div class="noautonum">__TOC__</div>
  
Software in /afs is available on all AFS Linux clients, including afsconnect[N].njit.edu, afsaccess[N].njit.edu, osl[N].njit.edu, and various other machines. This software includes all of the commercial packages, such as Abaqus, Ansys, Mathematica, and Matlab. In addition, compilers, scientific libraries, and various utilities are installed in AFS. To set your environment for software in AFS, use modules, as in the example below. See [[SoftwareModulesAvailable|Modules Available]] for more information.
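A minimal sketch of typical usage, assuming the standard environment-modules commands available on the AFS clients (the module name shown is hypothetical; run <code>module avail</code> to see what is actually installed on a given machine):

<pre>
# List the modulefiles available on this client
module avail

# Load a hypothetical Matlab modulefile; the actual name/version will differ
module load matlab

# Show what is currently loaded, and unload when finished
module list
module unload matlab
</pre>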
  
=Software in /opt=
  
Software in /opt is used only by lochness.njit.edu and stheno.njit.edu. This software consists of compilers and scientific libraries packaged and maintained by the OpenHPC consortium. Lochness.njit.edu and stheno.njit.edu use a different module system, Lmod, to manage user environments. For more information on using modules on lochness.njit.edu and stheno.njit.edu, see [[ModulesOnLochness | Modules on Lochness]].
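As an illustrative sketch of Lmod usage, under the assumption that the same basic commands apply along with Lmod's search command <code>module spider</code> (the package names shown are hypothetical; actual names and versions on Lochness and Stheno will differ):

<pre>
# Search the Lmod hierarchy for a package and its available versions
module spider gcc

# Load a compiler and a dependent MPI library, then verify
module load gcc
module load openmpi
module list
</pre>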
  
Note: software in /afs is available on lochness.njit.edu and stheno.njit.edu; however, only a small portion of the modulefiles has been brought over. If a modulefile for software in /afs is needed, send an email to [mailto:arcs@njit.edu arcs@njit.edu].

HPCPlanningFY21-22 (latest revision as of 16:33, 5 October 2020)

Kong Public-Access SuperMicro 6016T Nodes (public nodes) End of Life

  • The legacy operating system on Kong will reach vendor end of life (EOL) on November 30, 2020.
  • Successor OS versions are incompatible with the public nodes' interconnects.
  • The 270 Kong public-access SuperMicro 6016T nodes (Kong-7, https://web.njit.edu/topics/hpc/specs/) are more than 10 years old, cannot be upgraded to run current operating system versions, and do not support many of the critical applications needed by researchers.
  • The public nodes cannot be integrated into OpenHPC, the new cluster framework already being deployed.

Schedules for Removing Kong Public Nodes from Service

The Kong public nodes are used for research and courses.

Schedule for Researchers

  • No new research accounts will be added to Kong after 31 July 2020
  • The public nodes will be removed from service on 04 January 2021

Notes:

  • The 2 public-access GPU nodes, which also contain CPUs (Kong-6, https://web.njit.edu/topics/hpc/specs/), will be migrated to OpenHPC. These will be assigned to a researcher in Physics who has demonstrated a special need for them.
  • All researcher-purchased and Data Science nodes will continue to be available.

Schedule for Courses

  • Courses will not have access to the 270 public nodes after 04 January 2021

Public-Access Node Provisioning

Phase 1: FY2021

  • Phase 1 provides a basic computational, node-interconnect, and storage infrastructure with resources adequate for many researchers to largely continue their work at its present level, and one that can be built upon as additional funding becomes available.
  • Phase 1 should be in place by 04 January 2021, or sooner if possible (Phase 1 cost estimate: https://wiki.hpc.arcs.njit.edu/external/MWYQ27065.pdf).
  • It is assumed that Phase 1 equipment will be housed in the GITC 5302 data center, pending the possible construction of an HPC data center elsewhere on campus.

Phase 2: FY2022

While Phase 1 will provide resources for many researchers to continue at their current level, it will not provide adequate resources for researchers to expand the scope of their work or to attract new researchers.

The purpose of Phase 2 is to expand on the foundation implemented in Phase 1, providing resources commensurate with the goals of an R1 university.

The required funding for Phase 2 is expected to be at about the same level as that for Phase 1.

New Researcher Purchases

Researchers can continue to purchase equipment that will be dedicated to their use (and shareable via SLURM), as part of the OpenHPC Lochness cluster.

However, there is no more rack capacity in the GITC 5302 data center to accommodate additional equipment. The racks in which the public nodes are now housed cannot be used for modern nodes due to a form-factor mismatch.

Any racks needed for new purchases, as well as any additional electric power and HVAC, would have to be provided in order to accommodate such purchases.

Cloud Provisioning

Rescale is a cloud computing platform that combines scientific software with high performance computing. It uses commercial cloud vendors such as Amazon Web Services (AWS), Azure, and Google Cloud Platform to provide compute cycles as well as storage, and its services also include application setup and billing. Rescale thus provides researchers with a pay-as-you-go method of using commercial cloud services.

The cost of supporting the Rescale infrastructure is a one-time fee of $12K for setup, and $99/month thereafter for VPN access. Funding for the computation and storage costs incurred by the researcher at a commercial cloud provider will be determined by the researcher and administration.