
This site is deprecated and will be decommissioned shortly. For current information regarding HPC visit our new site: hpc.njit.edu



Replacement of Kong.njit.edu

The Kong HPC public-access cluster is in the process of being replaced with new hardware.

The new public nodes will be added to the Lochness.njit.edu cluster, which is currently in operation. Existing Kong users will be migrated to Lochness over a period of weeks, prioritized by their historical usage of Kong. The expansion is expected to be online on 26 February 2021 (revised ETA as of 2/20), with all users migrated by 15 March 2021.

The new nodes replace the public-access Kong CPU nodes and will be available for any NJIT researcher to use free of charge on a first-come, first-served basis. Apart from the name and the far newer hardware, Lochness differs only in the scheduler (SLURM) used to submit jobs to the cluster. Academic and Research Computing Systems (ARCS) staff will notify Kong users in small groups that they are to be migrated and will assist them in rewriting their submit scripts for use with SLURM. See SGE to SLURM.
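As a rough illustration of what rewriting a submit script involves, the following Python sketch translates a handful of common SGE `#$` directives into SLURM `#SBATCH` lines. The mapping table covers only widely used options and is an assumption, not ARCS's conversion procedure; the function names `convert_line` and `convert_script`, the sample job name, and the queue name `public` are hypothetical. Parallel environments, queue names, and resource requests are site-specific, so the sketch flags anything it does not recognize for manual conversion.

```python
import re

# Assumed mapping of common SGE flags to SLURM long options.
# Queue/partition names and resource names are site-specific.
SIMPLE_FLAGS = {
    "-N": "--job-name",
    "-o": "--output",
    "-e": "--error",
    "-q": "--partition",
    "-M": "--mail-user",
}


def convert_line(line: str) -> str:
    """Translate one SGE '#$' directive to '#SBATCH'; pass other lines through."""
    stripped = line.strip()
    if not stripped.startswith("#$"):
        return line
    parts = stripped[2:].split()
    if not parts:
        return line
    flag, args = parts[0], parts[1:]
    if flag in SIMPLE_FLAGS and args:
        return f"#SBATCH {SIMPLE_FLAGS[flag]}={args[0]}"
    if flag == "-cwd":
        # sbatch already starts the job in the submission directory.
        return "# (-cwd dropped: SLURM runs in the submit directory by default)"
    if flag == "-l" and args:
        # Only the wall-clock limit is handled here; other resources vary by site.
        match = re.fullmatch(r"h_rt=(\S+)", args[0])
        if match:
            return f"#SBATCH --time={match.group(1)}"
    return f"# TODO convert manually: {stripped}"


def convert_script(text: str) -> str:
    """Convert every line of an SGE submit script."""
    return "\n".join(convert_line(line) for line in text.splitlines())


if __name__ == "__main__":
    # Hypothetical SGE script used only to exercise the sketch.
    sge_script = """#!/bin/bash
#$ -N myjob
#$ -cwd
#$ -q public
#$ -l h_rt=01:00:00
#$ -pe smp 4
./run_simulation"""
    print(convert_script(sge_script))
```

The output of such a script is a starting point only; directives marked `TODO` (here, the parallel environment request) need to be rewritten by hand for the Lochness configuration.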

ARCS will also assist instructors, by request, with converting course materials to the Lochness / SLURM environment.

The new nodes are significantly more powerful than the nodes they are replacing, which are more than 11 years old. The Lochness expansion comprises 48 nodes (37 public and 11 reserved), each with:

  • Two Intel Xeon 6226R 2.9GHz 16-core CPUs (total 1,536 cores)
  • 348GB RAM (total 16,704GB)
  • 960GB SSD for local scratch storage
  • HDR100 InfiniBand and 10GigE network interfaces
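The totals quoted in the list follow directly from the per-node figures. A short Python check, using only the numbers given above (the variable names are illustrative):

```python
# Per-node figures as stated in the list above.
NODES = 48
CPUS_PER_NODE = 2
CORES_PER_CPU = 16
RAM_GB_PER_NODE = 348

total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
total_ram_gb = NODES * RAM_GB_PER_NODE

print(f"Total cores: {total_cores:,}")     # Total cores: 1,536
print(f"Total RAM: {total_ram_gb:,}GB")    # Total RAM: 16,704GB
```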

The nodes are all joined via low-latency, high-speed Mellanox InfiniBand (IB).

Following the complete migration of Kong users to Lochness, the reserved Lochness partitions (known as "queues" on Kong) will also be joined to the new IB network fabric. This will enable utilization of idle cycles on reserved nodes by other researchers.