Latest revision as of 16:33, 5 October 2020
Kong Public-Access SuperMicro 6016T Nodes (public nodes) End of Life
- The legacy operating system on Kong will be EOL'd by the vendor on November 30, 2020.
- Successor OS versions are incompatible with the public nodes' interconnects.
- The 270 Kong public-access SuperMicro 6016T nodes (Kong-7, https://web.njit.edu/topics/hpc/specs/) are more than 10 years old, cannot be upgraded to run current operating system versions, and do not support many of the critical applications needed by researchers.
- The public nodes cannot be integrated into OpenHPC, the new cluster framework already being deployed.
Schedules for Removing Kong Public Nodes from Service
The Kong public nodes are used for research and courses.
Schedule for Researchers
- No new research accounts will be added to Kong after 31 July 2020
- The public nodes will be removed from service on 04 January 2021
Notes:
- The 2 public-access GPU nodes, which also contain CPUs (Kong-6, https://web.njit.edu/topics/hpc/specs/), will be migrated to OpenHPC. These will be assigned to a researcher in Physics who has demonstrated a special need for them
- All researcher-purchased and Data Science nodes will continue to be available
Schedule for Courses
- Courses will not have access to the 270 public nodes after 04 January 2021
Public-Access Node Provisioning
Phase 1: FY2021
- Phase 1 provides a basic computational, node-interconnect, and storage infrastructure with resources adequate for many researchers to largely continue their work at its present level, and that can be built upon as additional funding becomes available
- Phase 1 should be in place by 04 January 2021, or sooner if possible. Phase 1 cost estimate: https://wiki.hpc.arcs.njit.edu/external/MWYQ27065.pdf
- It is assumed that Phase 1 equipment will be housed in the GITC 5302 data center, pending the possible construction of an HPC data center elsewhere on campus.
Phase 2: FY2022
While Phase 1 will provide resources for many researchers to continue at their current level, it will not provide adequate resources for researchers to expand the scope of their work or to attract new researchers.
Phase 2 expands on the foundation implemented in Phase 1 to provide resources commensurate with the goals of an R1 university.
The required funding for Phase 2 is expected to be at about the same level as that for Phase 1.
New Researcher Purchases
Researchers can continue to purchase equipment that will be dedicated to their use (and shareable via SLURM), as part of the OpenHPC Lochness cluster.
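As an illustration of how "dedicated to the owner, but shareable" is commonly expressed in SLURM, the fragment below sketches a hypothetical slurm.conf setup: the owner's partition has a higher priority tier, so owner jobs preempt other users' jobs on the purchased node. All node, partition, and account names here are invented for illustration; the actual Lochness configuration may differ.

```
# Hypothetical slurm.conf fragment -- names are invented; the actual
# Lochness cluster configuration may differ.

# Let jobs in a higher-PriorityTier partition preempt (requeue) jobs
# from lower-tier partitions on the same nodes.
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

# The researcher-purchased node.
NodeName=lab001 CPUs=48 RealMemory=192000 State=UNKNOWN

# Owner partition: restricted to the purchasing group's account,
# higher priority tier, no time limit.
PartitionName=lab Nodes=lab001 AllowAccounts=labgroup PriorityTier=10 MaxTime=INFINITE State=UP

# Shared partition: any account may run on the same node, but those
# jobs are requeued whenever owner jobs need the resources.
PartitionName=shared Nodes=lab001 AllowAccounts=ALL PriorityTier=1 MaxTime=3-00:00:00 State=UP
```

With a layout like this, the owner effectively keeps dedicated access while idle cycles remain available to the rest of the cluster.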
However, there is no remaining rack capacity in the GITC 5302 data center to accommodate additional equipment. The racks in which the public nodes are now housed cannot be used for modern nodes due to a form-factor mismatch.
Any racks needed for new purchases, along with any additional electric power and HVAC they require, would have to be provided before such purchases could be accommodated.
Cloud Provisioning
Rescale (https://www.rescale.com/) is a cloud computing platform that combines scientific software with high-performance computing. It uses commercial cloud vendors such as AWS, Azure, and Google Cloud to provide compute cycles as well as storage, and its services also include application setup and billing. Rescale thus gives researchers a pay-as-you-go method of using commercial cloud services.
The cost of supporting the Rescale infrastructure is a one-time setup fee of $12K, plus $99/month thereafter for VPN access. Funding for the computation and storage costs a researcher incurs at a commercial cloud provider will be determined by the researcher and the administration.
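For budgeting purposes, the fixed Rescale infrastructure cost is easy to project from the figures above (a minimal sketch; cloud compute and storage charges are excluded, since they depend on each researcher's usage):

```python
# Fixed Rescale infrastructure cost: one-time setup fee plus monthly
# VPN fee. Per-researcher cloud compute/storage charges are excluded.
SETUP_FEE = 12_000   # one-time, USD
VPN_MONTHLY = 99     # USD per month

def rescale_fixed_cost(months: int) -> int:
    """Total fixed cost in USD after `months` of service."""
    return SETUP_FEE + VPN_MONTHLY * months

print(rescale_fixed_cost(12))  # first year: 12000 + 12*99 = 13188
```

So the first-year fixed commitment is $13,188, with each subsequent year adding $1,188.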