
This site is deprecated and will be decommissioned shortly. For current information regarding HPC visit our new site: hpc.njit.edu

Cyberinfrastructure

From NJIT-ARCS HPC Wiki
Revision as of 16:32, 5 October 2020 by Hpcwiki1 dept.admin


I. Cyberinfrastructure (CI) Goals

Developments in information technologies have a large impact on the activities at technological universities such as NJIT. In order for its faculty and students to be productive and competitive, it is incumbent on the university to provide the necessary framework.

  • Provide resources, including compute power, accommodation of big data, software infrastructure, and high-capacity internal and external networking; provide support in running applications efficiently
  • Provide the capability of supporting leading edge research
  • Provide support in establishing collaborative research efforts with other institutions
  • Provide researchers a more desirable option than self-provisioning HPC equipment
  • Maintain the HPC infrastructure at a level consistent with the needs of researchers and the maintenance of a competitive stance relative to NJIT's peers
  • Provide a base level of infrastructure so that junior faculty can establish a record of scholarship, leading to external funding
  • Provide a robust and integrated identity and access management framework
  • Provide infrastructure suitable for educational purposes

II. Infrastructure

Compute Capacity

Nature and description

NJIT actively supports the needs of researchers requiring access to HPC and big data resources in order to efficiently perform computations for their research. NJIT-IST provides HPC resources for researchers in the form of Linux clusters, containing both CPUs and GPUs, shared-memory computers, and HTCondor, all of which can be used for both serial and parallel computations. Researchers can purchase nodes in the clusters for their dedicated use. For big data applications, IST supports a Hadoop cluster. There are also several faculty-managed servers located in IST datacenters.

Current status

	HPC Clusters (2)

		CPU nodes		: 393
		CPU cores 		: 3,240
		CPU RAM, GB		: 26,640
		CPU max Gflops	: 28,102
	
		GPUs			: 10
		GPU cores		: 26,072
		GPU RAM, GB		: 50 
		GPU max Gflops  	: 37,780

	Hadoop Cluster (1)
		
		CPU nodes		: 2
		CPU cores		: 32
		CPU RAM, GB		: 256

	HTCondor

		Hosts			: 66
		Cores			: 528
		RAM, GB			: 1,056	

Three-year plan

Based on historical usage data, a 10% annual increase in CPU capability is a reasonable assumption for meeting research needs. Under this assumption, the HPC cluster CPU capabilities after three years would be:

	HPC Clusters (2)

		CPU nodes		: 522
		CPU cores 		: 4,309
		CPU RAM, GB		: 38,091
		CPU max Gflops	: 37,375
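The three-year figures above can be approximated by compounding the assumed 10% rate over three years, i.e. multiplying each current value by 1.1³ ≈ 1.331. A minimal Python sketch of that arithmetic, applied to the node, core, and peak Gflops counts from the current-status table; the function name `project` is illustrative only and not part of any NJIT tool:

```python
def project(current: float, annual_rate: float, years: int) -> float:
    """Grow `current` at a fixed compound `annual_rate` for `years` years."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return current * (1.0 + annual_rate) ** years


# Current HPC cluster CPU figures, copied from the "Current status" table.
current_cpu = {
    "CPU nodes": 393,
    "CPU cores": 3_240,
    "CPU max Gflops": 28_102,
}

if __name__ == "__main__":
    for name, value in current_cpu.items():
        # 10% per year, compounded over the three-year planning horizon
        print(f"{name:<16}: {project(value, 0.10, 3):,.0f}")
```

This prints roughly 523 nodes, 4,312 cores, and 37,404 Gflops, within about 0.2% of the planned values. The same function with a 15% rate reproduces the scratch-space projections in the storage plan: `project(328_672, 0.15, 3)` is about 499,869 GB.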

At this time, there is insufficient historical usage data to estimate the need for increased GPU or HTCondor capacity.

Storage Capacity

Nature and description

Many HPC applications require very large amounts of disk storage. The need for storage is growing at an increasingly rapid rate, especially in the areas of genomics, bioinformatics, and big data.

Disk storage comprises three modes:

  • Network
    • NFS
    • AFS
  • Local scratch disk on compute nodes, used for transient intermediate calculations
  • Shared scratch network disk, used for transient intermediate calculations

NFS- and AFS-mounted space, which holds applications as well as input and output files, is housed in the enterprise storage system managed by IST.

Both NFS and AFS space are backed up by the enterprise backup system.

Current status

Capacities of the NFS, AFS, and scratch space as of June 2015:

	NFS-mounted, GB			: 5,165
	AFS-mounted, GB			: 28,860
	Scratch, local, GB			: 328,672
	Scratch, shared, GB		: 7,750

Three-year plan

Based on historical usage data and the explosive need for storage in some types of applications, a 50% annual increase in AFS and NFS storage is a reasonable assumption for meeting research needs. A 15% annual increase in scratch space is probably adequate.

With these assumptions, the HPC and big data storage capabilities after three years would be:

        NFS-mounted, GB         	: 19,400
        AFS-mounted, GB         	: 112,000
        Scratch, local, GB      	: 500,000
        Scratch, shared, GB     	: 11,800

IST is currently in discussions with its storage vendor on various proposals for adding general storage capacity. This is expected to substantially increase overall capacity in the next year, with a concomitant increase in the storage available for HPC and big data.

Some HPC applications benefit greatly from a "parallel" file system. This type of file system lets applications operate on stored data in parallel across multiple disks, rather than sequentially on one disk. Such a file system, with a capacity of about 1 TB, is planned to become part of the HPC and big data infrastructure in the next year.
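The idea of spreading one file's data across several disks can be illustrated with round-robin striping, a basic layout that parallel file systems such as Lustre use in more elaborate forms. The Python sketch below is a simplified model, not how any particular product (or the file system IST plans to deploy) works; the 1 MiB stripe size and four-disk layout are hypothetical parameters:

```python
from __future__ import annotations


def locate(offset: int, stripe_size: int, num_disks: int) -> tuple[int, int]:
    """Map a byte offset within a file to (disk index, byte offset on that disk),
    assuming simple round-robin striping with a fixed stripe size."""
    if offset < 0 or stripe_size <= 0 or num_disks <= 0:
        raise ValueError("offset must be >= 0; stripe_size and num_disks must be > 0")
    stripe = offset // stripe_size      # which stripe unit holds this byte
    disk = stripe % num_disks           # stripe units are dealt to disks in turn
    row = stripe // num_disks           # full rounds of stripe units before this one
    return disk, row * stripe_size + offset % stripe_size


def disks_for_read(start: int, length: int, stripe_size: int, num_disks: int) -> set[int]:
    """Return the disks touched by reading `length` bytes from `start`;
    each of these disks can service its share of the read concurrently."""
    if length <= 0:
        return set()
    first = start // stripe_size
    last = (start + length - 1) // stripe_size
    return {s % num_disks for s in range(first, last + 1)}


if __name__ == "__main__":
    MiB = 1 << 20
    # Hypothetical layout: 1 MiB stripes over 4 disks.
    print(locate(5 * MiB + 10, MiB, 4))                 # (1, 1048586)
    print(sorted(disks_for_read(0, 4 * MiB, MiB, 4)))   # [0, 1, 2, 3]
    print(sorted(disks_for_read(0, MiB, MiB, 4)))       # [0]
```

A 4 MiB read touches all four disks, so each can deliver its 1 MiB share at the same time, whereas a single-disk layout would serve the whole read sequentially.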

Internet Capacity

Nature and description

NJIT maintains multiple Internet connections to meet an ever-increasing demand for access and data. NJIT is a full Internet 2 member with a dedicated research connection. NJIT is also a full member of NJEDge.Net, New Jersey's Research and Education Network, with a dedicated connection to other state institutions as well as fast Internet resources.

Current status

Capacities as of June 2015:

  • Internet 1 : 1,850 Mbps (megabits per second)
  • Internet 2 : 40 Mbps

Three-year plan

NJIT plans to replace its secondary leased Internet connection with a dark fiber connection carrying multiple "lambdas" (wavelengths) to its co-location facility. This will give NJIT additional growth potential, flexibility, and redundancy. As part of the recent Technology Refresh, we have also acquired, and are in the process of installing, additional routers and next-generation firewalls. These devices will allow NJIT to connect securely to the Internet over multiple 10 Gbps (gigabits per second) connections.

The anticipated Internet capacity in three years is :

  • Internet 1 : 3,600 Mbps (megabits per second)
  • Internet 2 : 100 Mbps
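To make these bandwidth figures concrete, the time to move a dataset is its size in bits divided by the link rate. A small Python sketch, assuming a hypothetical 1 TB dataset and a link used at its full nominal rate with no competing traffic (real transfers will be slower); the `efficiency` parameter is an illustrative knob, not a measured value:

```python
def transfer_seconds(size_bytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Seconds to move `size_bytes` over a link rated at `link_mbps` megabits/s.

    `efficiency` is an assumed fraction of the line rate actually achieved
    (protocol overhead, contention); 1.0 gives the theoretical best case.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    bits = size_bytes * 8                      # bytes -> bits
    return bits / (link_mbps * 1e6 * efficiency)


if __name__ == "__main__":
    one_tb = 1e12  # 1 TB (decimal), a hypothetical dataset size
    for label, mbps in [("Internet 1, 2015", 1_850),
                        ("Internet 1, planned", 3_600),
                        ("one 10 Gbps link", 10_000)]:
        hours = transfer_seconds(one_tb, mbps) / 3600
        print(f"{label:<20}: {hours:.2f} h")
```

At nominal rates this gives about 1.20 h at 1,850 Mbps, 0.62 h at 3,600 Mbps, and 0.22 h over a single 10 Gbps link.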

Intranet Capacity

Nature and description

The internal network (intranet) at NJIT was recently upgraded via the Technology Refresh program. Fixed workstations are connected with wired connectivity. Mobile devices are supported with a robust wireless network.

Current status

Most NJIT buildings provide end-user workstations with wired 1,000 Mbps (Gigabit) Ethernet. These buildings are connected to NJIT's internal core network with one or more 10 Gbps connections.

Three buildings have legacy 100 Mbps Ethernet to the desktop.

The wireless network is a robust IEEE 802.11a/n network supported by 1,500 wireless access points, 15% of which also support IEEE 802.11ac.

Three-year plan

The remaining three buildings will be upgraded to wired gigabit Ethernet to the desktop. We estimate that 100 additional wireless access points will be needed to support higher device density. Event spaces and other high-density areas, such as residential buildings, will be upgraded to IEEE 802.11ac as needed.

Software

Nature and description

High performance computing and big data applications often require specialized applications, in addition to the software that is needed to provision clusters and schedule jobs on cluster compute nodes.

Applications are a mix of open source and commercial software, with the bulk of applications being open source. Usually, the open source applications are built from source, and installed in the AFS distributed file system, from where they are accessible from all Linux AFS clients, including all HPC computational resources.

Current status

A wide spectrum of engineering and scientific software is available for use on NJIT's computing resources, including:

  • compilers for many languages
  • computational chemistry analysis
  • computational fluid dynamics analysis
  • GPU libraries
  • fast Fourier transform (FFT) libraries
  • linear algebra libraries
  • linear and non-linear finite element static and dynamic analysis
  • linear and non-linear algebraic equation solvers
  • molecular dynamics analysis
  • MPI libraries
  • optimization
  • ordinary and partial differential equation solvers
  • statistical languages and packages
  • quantum mechanics calculations

Three-year plan

NJIT will continue to broaden the range of applications available to researchers using HPC and big data resources, update applications to current releases, and keep applications compatible with operating system upgrades. Where needed and financially feasible, NJIT will increase the number of licenses for commercial software packages and purchase new packages.

Improved HPC provisioning and scheduling software will be incorporated as it becomes available. Similarly, new developments in the deployment and management of the Apache Hadoop big data environment will be adopted as appropriate.

As new software technologies, such as the Apache Spark cluster computing framework, become available, they will be incorporated into the NJIT HPC and big data framework.

III. Implementation of IPv6

Nature and description

The implementation plan for IPv6 has been in development for several years. The recently completed network Technology Refresh has provided IPv6-capable hardware throughout the campus network, from Internet and core routing to the wired and wireless distribution edge.

The primary driving force for a full implementation of IPv6 is the Internet's need for additional IP addresses, given the vast increase in the number of devices connected to it. IPv6 is the successor addressing scheme to IPv4 and provides enterprises with a far larger pool of IP addresses.

Current status

The NJIT network's physical infrastructure is capable of supporting IPv6. IPv6 can be tested within local LANs; however, it is not routed outside the LAN. Currently, just over 25% of NJIT's assigned Class B IPv4 address space is free. An IP address conservation plan and the use of NAT addressing have put NJIT in a good position to keep using its IPv4 address space efficiently for the next 3 years without needing the additional addresses that IPv6 provides.
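The difference in scale between the two address families can be checked with Python's standard `ipaddress` module. The sketch below compares a Class B (/16) block, the roughly 25% of it that is free, and a single IPv6 /64 subnet; both prefixes are private or documentation stand-ins chosen for illustration, not NJIT's real allocations:

```python
import ipaddress

# A Class B allocation corresponds to a /16 prefix. 172.16.0.0/16 is a private
# (RFC 1918) range used here only as a stand-in, not NJIT's actual allocation.
class_b = ipaddress.ip_network("172.16.0.0/16")
free_fraction = 0.25  # "just over 25%" free, per the current status

# 2001:db8::/32 is reserved for documentation (RFC 3849); one /64 subnet from it:
ipv6_subnet = ipaddress.ip_network("2001:db8:0:1::/64")
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses

if __name__ == "__main__":
    print(class_b.num_addresses)                        # 65536
    print(int(class_b.num_addresses * free_fraction))   # 16384
    print(ipv4_total)                                   # 4294967296
    print(ipv6_subnet.num_addresses == 2**64)           # True
    print(ipv6_subnet.num_addresses // ipv4_total)      # 4294967296
```

A single standard IPv6 subnet thus holds about four billion times as many addresses as the entire IPv4 address space, which removes the scarcity that NAT and address conservation plans work around.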

Three-year plan

NJIT anticipates that within the coming 3 to 4 years, public-facing enterprise and research servers will require IPv6 addressing. IPv6 addressing is important because of NJIT's extensive use of host virtualization and the planned provisioning of hosting services for researchers. The estimated IPv6 completion date, with outside consulting support, is yet to be determined.

The future steps include:

  1. Procure consulting support services
  2. Acquire IPv6 address assignment
  3. Activate IPv6 on core and Internet Routers
  4. Implement IPv6 in the enterprise data center
  5. Provide IPv6 configuration services (DHCP)

IV. Identity and Access Management

Nature and description

Identity and access management (IAM) is a discipline which encompasses all of the tasks required to create and manage user identities and credentials within or across system and enterprise boundaries, while ensuring that the right services are available to the right people at the right times.

At NJIT, IAM has become increasingly important as information services are made available to diverse university constituents across heterogeneous environments that are both on-premises and in the cloud.

Current status

Identity and access management is currently implemented by a suite of homegrown processes, which were retrofitted in 2009 when the university transitioned to the Banner ERP. NJIT has contracted with a commercial vendor and is in the process of implementing the vendor's solution.

NJIT currently supports Shibboleth (for SAML) and CAS for enterprise and federated single sign-on.

Three-year plan

Replace legacy IAM systems and services and provide a single identity management solution across the university.

Features to be implemented by the new solution:

  • Automated account provisioning and deprovisioning
  • Implement a password policy across participating environments
  • Support role-based account and resource (service/application) provisioning
  • Support new affiliations, i.e., prospects, parents, guests, vendors
  • Support real-time identity events, moving away from feed-based maintenance cycles toward a more real-time, event-driven model
  • Provide self-service account and password management
  • Delegated administration of accounts and passwords
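Role-based provisioning driven by identity events can be sketched as a set difference: when an event reports that a user's roles changed, the services implied by the new roles but not the old ones are provisioned, and the reverse difference is deprovisioned. The Python below is a hypothetical model with made-up role and service names (`ROLE_SERVICES`, `RoleChangeEvent`, `handle`); it does not represent the API of the commercial IAM product NJIT has contracted for:

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical role-to-service mapping; real roles and services would come
# from the IAM product's configuration, not from code like this.
ROLE_SERVICES: dict[str, set[str]] = {
    "student": {"email", "lms", "hpc"},
    "faculty": {"email", "lms", "hpc", "grading"},
    "guest": {"wifi"},
}


@dataclass(frozen=True)
class RoleChangeEvent:
    """A real-time identity event: a user's set of roles has changed."""
    user: str
    old_roles: frozenset[str]
    new_roles: frozenset[str]


def entitlements(roles: frozenset[str]) -> set[str]:
    """Union of the services granted by each role (unknown roles grant nothing)."""
    services: set[str] = set()
    for role in roles:
        services |= ROLE_SERVICES.get(role, set())
    return services


def handle(event: RoleChangeEvent) -> tuple[set[str], set[str]]:
    """Return (services to provision, services to deprovision) for one event."""
    before = entitlements(event.old_roles)
    after = entitlements(event.new_roles)
    return after - before, before - after


if __name__ == "__main__":
    # A student who graduates and keeps only guest access:
    grad = RoleChangeEvent("jdoe", frozenset({"student"}), frozenset({"guest"}))
    print(handle(grad))
```

Because each event is handled as it arrives, access changes take effect immediately instead of waiting for the next batch feed.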

Improvements to enterprise and federated single sign-on:

  • Support InCommon Research and Scholarship category
  • Implement multifactor authentication
  • Attain InCommon Bronze (NIST Level 1) Level of Assurance
  • Attain InCommon Silver (NIST Level 2) Level of Assurance

V. Integration with Statewide Infrastructure / Collaboration

  • Section 1
  • Section 2
  • ....

VI. Education and Training

Nature and description

HPC and big data are fields in which both the software and hardware landscapes change rapidly. Academic and Research Computing Systems staff strive to keep current with these technologies and to provide ways of educating users in these areas.

Current status

To help researchers use HPC and big data resources, and to keep users apprised of current technologies and practices, the following services are offered:

  • 2 to 3 workshops on use of HPC and big data resources
  • HPC and big data wiki
  • annual meeting between researchers using HPC and big data resources and Academic and Research Computing Systems staff

Three-year plan

It is anticipated that the current methods will be sufficient for the next three years, with possibly an increase in the frequency and range of topics of workshops.

VII. Support Staff

Nature and description

NJIT provides support staff for researchers who wish to use HPC and big data resources. The staff provides the specialized knowledge and guidance that researchers often need to use these resources effectively.

Current status

Two staff members of Academic and Research Computing Systems (ARCS) devote almost all of their time to HPC and big data support, and a third devotes part of their time to it. Support includes:

  • Installation of compilers, applications, libraries, and utilities requested by users
  • Installing specialized hardware for individual researchers
  • Configuring and/or arranging the implementation of specialized networking
  • Customized scripts to aid users in their use of HPC resources
  • Assistance in selection and purchase of hardware
  • Assistance in debugging and optimizing code
  • Assistance in getting applications to run
  • Assistance in working with and managing big data
  • Providing access to MySQL, Oracle, Git, and other services in the NJIT cloud
  • Assistance in using cloud HPC resources

Three-year plan

NJIT recognizes the necessity of providing staff support for researchers using HPC and big data resources. In addition to ARCS staff support, staff in the Technology Support Center will be trained to handle first level response on use of HPC and big data resources.