This site is deprecated and will be decommissioned shortly. For current information regarding HPC visit our new site: hpc.njit.edu

Revision as of 18:41, 15 January 2021

Replacement of Kong.njit.edu

The Kong HPC public-access cluster is being replaced with new hardware.

The new public nodes will be added to the Lochness.njit.edu cluster, which is currently in operation. Existing Kong users will be migrated to Lochness over a period of weeks, prioritized by their historical usage of Kong. The expansion is anticipated to be online by 15 February 2021, with all users migrated by 15 March 2021.

The new nodes replace the public-access Kong CPU nodes and will be available for any NJIT researcher to use free of charge on a first-come, first-served basis. Apart from the name and the far newer hardware, Lochness differs only in the scheduler (SLURM) used to submit jobs to the cluster. Academic and Research Computing Systems (ARCS) staff will notify Kong users in small groups as they are scheduled to be migrated and will assist them in rewriting their submit scripts for use with SLURM. See the SGE to SLURM migration guide.
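
For orientation, the rewrite typically amounts to replacing SGE directives and commands with their SLURM equivalents. The following is a minimal sketch only; the job name, parallel environment, and resource values are hypothetical, and the actual mapping used on Lochness is documented in the SGE to SLURM migration guide.

    # SGE submit script as used on Kong (illustrative values)
    #!/bin/bash
    #$ -N myjob                # job name
    #$ -cwd                    # run from the submission directory
    #$ -pe mpi 32              # hypothetical parallel environment, 32 slots
    #$ -l h_rt=04:00:00        # 4-hour wall-clock limit
    ./my_program

    # Equivalent SLURM submit script for Lochness (illustrative values)
    #!/bin/bash
    #SBATCH --job-name=myjob   # job name
    #SBATCH --ntasks=32        # 32 tasks
    #SBATCH --time=04:00:00    # 4-hour wall-clock limit
    srun ./my_program

    # Submission changes from "qsub script.sge" to "sbatch script.slurm";
    # "qstat" becomes "squeue", and "qdel" becomes "scancel".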

ARCS will also assist instructors, by request, with converting course materials to the Lochness / SLURM environment.

The new nodes are significantly more powerful than the 11+ year-old nodes they are replacing. The Lochness expansion comprises 48 nodes (37 public and 11 reserved), each with:

  • Two Intel Xeon 6226R 2.9 GHz 16-core CPUs (1,536 cores total)
  • 348 GB RAM (16,704 GB total)
  • 960 GB SSD for local scratch storage
  • HDR100 InfiniBand and 10 GigE network interfaces

All of the nodes are joined via a low-latency, high-speed Mellanox InfiniBand (IB) fabric.

Following the complete migration of Kong users to Lochness, the reserved Lochness partitions (known as "queues" on Kong) will also be joined to the new IB network fabric. This will enable other researchers to utilize idle cycles on reserved nodes.
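
Once migrated, users can see which partitions are available to them with standard SLURM commands; for example (no Lochness-specific partition names are assumed here):

    sinfo                 # list partitions and the state of their nodes
    sinfo -s              # summarized view, one line per partition
    squeue -u $USER       # show your own pending and running jobs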