

HPCPlanningFY21-22

From NJIT-ARCS HPC Wiki

Kong Public-Access SuperMicro 6016T Nodes (public nodes) End of Life

  • The legacy operating system on Kong will be EOL'd by the vendor on November 30, 2020.
  • Successor OS versions are incompatible with the node interconnects of the public nodes.
  • The 270 Kong public-access SuperMicro 6016T nodes (Kong-7) are more than 10 years old, cannot be upgraded to be supported by current operating system versions, and do not support many of the critical applications needed by researchers.
  • The public nodes cannot be integrated into OpenHPC, the new cluster framework already being deployed.

Schedules for Removing Kong Public Nodes from Service

The Kong public nodes are used for research and courses.

Schedule for Researchers

  • No new research accounts will be added to Kong after 31 July 2020
  • The public nodes will be removed from service on 04 January 2021

Notes:

  • The 2 public-access GPU nodes (Kong-6), which also contain CPUs, will be migrated to OpenHPC. These will be assigned to a researcher in Physics who has demonstrated a special need for them
  • All researcher-purchased and Data Science nodes will continue to be available

Schedule for Courses

  • Courses will not have access to the 270 public nodes after 04 January 2021

Public-Access Node Provisioning

Phase 1: FY2021

  • Phase 1 provides a basic computational, node interconnect, and storage infrastructure that will supply resources adequate for many researchers to largely continue their work at its present level, and that can be built upon as additional funding becomes available
  • Phase 1 should be in place by 04 January 2021, or sooner if possible. See the Phase 1 cost estimate.
  • It is assumed that Phase 1 equipment will be housed in the GITC 5302 data center, pending the possible construction of an HPC data center elsewhere on campus.

Phase 2: FY2022

While Phase 1 will provide resources for many researchers to continue at their current level, it will not provide adequate resources for researchers to expand the scope of their work or to attract new researchers.

The purpose of Phase 2 is to expand on the foundation implemented in Phase 1, providing resources commensurate with the goals of an R1 university.

The required funding for Phase 2 is expected to be at about the same level as that for Phase 1.

New Researcher Purchases

Researchers can continue to purchase equipment that will be dedicated to their use (and shareable via SLURM), as part of the OpenHPC Lochness cluster.

However, there is no remaining rack capacity in the GITC 5302 data center to accommodate additional equipment. The racks in which the public nodes are now housed cannot be used for modern nodes, due to a form-factor mismatch.

Any racks needed for new purchases, along with any additional electric power and HVAC, would have to be provided in order to accommodate such purchases.

Cloud Provisioning

Rescale is a cloud computing platform that combines scientific software with high-performance computing. Rescale takes advantage of commercial cloud vendors such as Amazon Web Services (AWS), Azure, and Google Cloud Platform to provide compute cycles as well as storage. The Rescale service also includes application setup and billing, and thus provides a pay-as-you-go method for researchers to use these commercial cloud services.

The cost of supporting the Rescale infrastructure is a one-time fee of $12K for setup, and $99/month thereafter for VPN access. Funding for the computation and storage costs incurred by the researcher at a commercial cloud provider will be determined by the researcher and administration.
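
As a rough illustration of the fixed Rescale infrastructure cost described above, the Python sketch below totals the one-time $12K setup fee plus the $99/month VPN fee (both figures from this page); the 12- and 24-month horizons are illustrative assumptions, and the separately funded compute and storage charges at the cloud provider are excluded.

  # Sketch of the fixed Rescale infrastructure cost (illustrative assumptions noted above).
  SETUP_FEE = 12_000        # one-time setup fee, USD
  VPN_FEE_PER_MONTH = 99    # recurring VPN-access fee, USD/month

  def fixed_cost(months: int) -> int:
      """Total fixed cost after `months` of VPN service (excludes cloud usage charges)."""
      return SETUP_FEE + VPN_FEE_PER_MONTH * months

  for months in (12, 24):
      print(f"{months} months: ${fixed_cost(months):,}")
  # 12 months: $13,188
  # 24 months: $14,376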