CollabFileSystem


Proofs of Concept

  • SchedMD (SLURM), Google Cloud, and now Azure
    • Federation across Rutgers campuses and the cloud
    • Will federate all Office of Advanced Research Computing (OARC)-managed compute resources across all campuses; each campus will have its own set of policies
    • Currently testing deployment of a virtual cluster in Google Cloud within this federated environment; Azure is next
    • Goal is to build an elastic environment that can burst into the cloud when needed (see the configuration sketch after this list)
  • Internet2 and Cisco
    • Information-Centric Networking (ICN) node installed
    • The vision of the project is to develop, promote, and evaluate a new approach to a communications architecture
    • A multipoint distribution network for various kinds of data
    • Enables applications to address and request data by name rather than location
    • The network can locate and retrieve the data dynamically from any potential source, leading to better mobility
    • Looking for researchers interested in testing this environment
  • Regional
    • Federated sharing of HPC resources across multiple institutions, including public cloud resources, by extending an intra-institution sharing mechanism developed by Rutgers and Google based on the SLURM scheduler
    • Rutgers, Internet2, NYSERNet, OSHEAN, KINBER, Massachusetts Green High-Performance Computing Center (MGHPCC), and NJEdge
    • Rutgers (already in), Google cloud (already in), MGHPCC (signed up), Syracuse, Penn State, URI or Brown, Pitt, NYU, Columbia, RPI, University of New Hampshire, NJIT, Princeton, Franklin and Marshall, and University of Maine
    • Becomes a model for a national research platform
  • NetApp
    • Testing distributed Object Storage between Newark and Hill
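
As a rough illustration of the bursting goal in the SLURM item above, the excerpt below sketches SLURM's cloud-node (power saving) mechanism in slurm.conf. All hostnames, script paths, node counts, and timeouts are hypothetical placeholders, not the actual federation configuration.

    # slurm.conf excerpt (hypothetical values only)
    SlurmctldParameters=cloud_dns
    ResumeProgram=/opt/slurm/bin/cloud_resume.sh    # provisions cloud instances on demand
    SuspendProgram=/opt/slurm/bin/cloud_suspend.sh  # tears idle instances back down
    ResumeTimeout=600
    SuspendTime=300                                 # seconds idle before a cloud node is released

    # Cloud nodes exist only while jobs need them; scheduling a job onto a
    # powered-down State=CLOUD node triggers ResumeProgram
    NodeName=gcp-burst-[001-010] CPUs=16 RealMemory=64000 State=CLOUD
    PartitionName=cloud Nodes=gcp-burst-[001-010] MaxTime=24:00:00 State=UP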

Storage

  • OARC's HPC main storage facilities comprise four Spectrum Scale (formerly GPFS) appliances distributed across the university with an aggregate of 6PB usable capacity
  • Two Lustre appliances with an aggregate of 320TB are also in service
  • 1.33PB of general-purpose ZFS storage on the Central campus for backup, archive, and other needs
  • NAS appliances on other campuses total approximately 270TB and serve similar needs

Services currently running or planned to run

  • Spectrum Scale (formerly GPFS) AFM (Active File Management), which can provide high-speed local caching of data stores from distributed GPFS appliances; see the fileset sketch after this list
  • FIONA (Flash I/O Network Appliance) nodes (very fast data transfer, shared application and visualization support)
  • JupyterHub (multi-user notebook server) and Jupyter Notebook (rich application development and sharing)
  • XRootD high-performance storage cluster, with attributes of performance, fault tolerance, scalability, and interoperability; like any storage cluster, it requires client software (XrootdFS). See the client usage sketch after this list.
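
As a rough sketch of how an AFM cache fileset is typically created (not the actual OARC commands; the file system, fileset, and target names below are hypothetical), a read-only cache fronting a remote data store could look like this:

    # Create an independent fileset that caches a remote export in read-only mode
    mmcrfileset gpfs1 projcache \
        -p afmMode=ro,afmTarget=nfs://home-storage/export/projects \
        --inode-space new

    # Link it into the local namespace; reads are then served from the local cache
    mmlinkfileset gpfs1 projcache -J /gpfs1/projcache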
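
For XRootD, a minimal client interaction looks like the sketch below; the redirector hostname and paths are placeholders. XrootdFS can additionally present the cluster as a POSIX (FUSE) mount, with mount options depending on the local deployment.

    # Copy a file out of the XRootD cluster with the standard xrdcp client
    xrdcp root://xrootd.example.edu//store/user/jdoe/results.root ./results.root

    # Copy local data into the cluster
    xrdcp ./input.dat root://xrootd.example.edu//store/user/jdoe/input.dat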