
From NJIT-ARCS HPC Wiki

Latest revision as of 16:32, 5 October 2020


Case Study Of On- and Off-Premise HPC Costs

Purpose

Compare on- and off-premise costs of high performance computing (HPC) resources for a particular case.

Use Case

Dr. Gennady Gor, Dept. of Chemical Engineering, purchased approximately $85K of HPC hardware in FYE 2017 - with a similar amount to be spent for the same purpose in FYE 2018 - which will become part of the Kong.njit.edu HPC cluster. Of the $85K, about $67K is for computational nodes (discrete physical entities comprising CPU, RAM, disk, I/O channels, etc.), and $18K is for the high-speed node interconnect (InfiniBand) needed to efficiently run the applications being used. For the purposes of this comparison, only the cost of the computational nodes was considered, since the costs of off-premise InfiniBand-like resources are not currently generally obtainable.

Method of Cost Comparison

  • The cost of purchase of the physical hardware was compared to the cost of purchasing resources from the following leading off-premise providers:
    • Amazon Web Services (AWS)
    • Azure from Microsoft
    • Google Cloud Platform (GCP)
    • Penguin Computing
    • Rackspace
  • For each vendor, the processor (CPU chip) offered by the off-premise provider that most closely matched the processor actually purchased was chosen. In all cases the match was exact or nearly so.
  • The cost comparison metric used was: dollars/core/GB RAM/year. ("core" is the basic processing unit of a CPU in a node; "GB RAM" is gigabytes of RAM accessible to each core in a node. This metric will be referred to as the unit cost.)
  • Costs were compared for a three-year period.
  • Costs were compared, where possible, for the following time utilizations: 100%, 75%, 50%, 25%. For an off-premise vendor the percent utilization is that percentage of the contracted resources actually used in a given billing period.
  • Off-premise costs were determined through a combination of vendor on-line cost calculators, phone conversations, email, and price quotes.
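The unit-cost metric above can be sketched as a small calculation. The node count, core count, and RAM figures below are hypothetical, for illustration only - they are not the actual G. Gor configuration:

```python
def unit_cost(total_dollars, cores, gb_ram_per_core, years):
    """Dollars per core per GB of RAM per year -- the 'unit cost' metric."""
    return total_dollars / (cores * gb_ram_per_core * years)

# Hypothetical configuration: 10 nodes x 28 cores, 4 GB of RAM per core,
# $67K total purchase cost, amortized over 3 years.
cost = unit_cost(67_000, 10 * 28, 4, 3)
print(f"${cost:.2f} per core per GB RAM per year")
```

Because the metric divides out cores, RAM, and time, it allows configurations of different sizes and contract lengths to be compared directly.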

Results of Cost Comparison

The following table shows the normalized unit cost and the cost of computational resources equivalent to the G. Gor purchase, over a 3-year period.

Vendor                  Normalized unit cost,      Cost for G. Gor equivalent
                        100% utilization, 3 years  computational resources, 3 years
Microway (on-premise)   1.00                       $67K
Amazon Web Services     3.38                       $226K
Azure                   16.39                      $1,098K
Google Cloud Platform   4.32                       $289K
Penguin Computing       3.02                       $202K
Rackspace               6.36                       $426K
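Since the off-premise resources were matched to the same configuration and period, each vendor's normalized unit cost reduces to the ratio of its total cost to the on-premise total. A minimal sketch recovering the dollar column from the normalized factors:

```python
on_prem_k = 67  # Microway 3-year cost, in $K
factors = {
    "Amazon Web Services": 3.38,
    "Azure": 16.39,
    "Google Cloud Platform": 4.32,
    "Penguin Computing": 3.02,
    "Rackspace": 6.36,
}
# Normalized factor x on-premise cost reproduces the dollar column.
costs_k = {name: round(f * on_prem_k) for name, f in factors.items()}
for name, k in costs_k.items():
    print(f"{name}: ${k}K")
```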

The following table shows the normalized unit cost and G. Gor equivalent computational resources cost over a 5-year period, at 100%, 75%, 50%, and 25% utilization.

Only the AWS pricing structure allowed this comparison, via its monthly on-demand pricing.

The AWS pricing structure used for this second table is different from that used in the first table. The first table uses contract pricing (the minimum contract period is 3 years), whereas the second table uses on-demand pricing, which is much higher. On-demand pricing is the only AWS mechanism available for calculating utilizations other than 100%.

Normalized unit cost (G. Gor equivalent resources cost) at each utilization:

Vendor                  100% utilization   75% utilization   50% utilization   25% utilization
Microway (on-premise)   1.00 ($67K)        1.00 ($67K)       1.00 ($67K)       1.00 ($67K)
Amazon Web Services     14.58 ($976K)      10.99 ($736K)     7.40 ($496K)      3.75 ($251K)
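Because on-demand billing charges only for the time actually used, the AWS costs should scale roughly linearly with utilization. A quick sketch against the table's figures; the small deviations from a strict linear estimate presumably reflect rounding or fixed charges in the quoted prices:

```python
full_k = 976  # AWS on-demand cost at 100% utilization, in $K
reported_k = {1.00: 976, 0.75: 736, 0.50: 496, 0.25: 251}  # from the table
for util, reported in reported_k.items():
    estimate = full_k * util
    print(f"{util:.0%}: linear estimate ${estimate:.0f}K, table ${reported}K")
```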

Additional Considerations

Software Environment

The off-premise model for HPC clusters is that each researcher has their own, independent cluster.

Each such cluster instance would require a significant initial and ongoing time commitment from ARCS staff: working with the researcher to bring the cluster to a production state running the desired application(s), and making adjustments as problems and new scenarios arise.

By contrast, on-premise HPC provides a shared software environment, requiring far less customization than the administration of independent clusters, since the needed applications, libraries, utilities, etc., are most likely already available; and, if not, once made available they are accessible to all researchers.

Sharing of Hardware Resources

  • On-premise HPC allows sharing of under-utilized computational resources amongst researchers - i.e., researchers owning dedicated resources can easily allow other researchers to use those resources.
  • A shared storage environment enables easy sharing of data amongst researchers, as compared to the difficulty in data sharing amongst separate off-premise HPC instances.
  • On-premise HPC can also exploit excess capacity in the NJIT virtual infrastructure, which can share the software environment of the HPC clusters, as well as enterprise storage used by the HPC clusters. This option is not practically available via off-premise HPC.

High-speed Node Interconnects

The G. Gor purchase includes high-speed network connectivity for the compute nodes, a necessity for many parallel HPC computations but generally not used in enterprise computing. Of the off-premise vendors compared, only Penguin Computing offers high-speed networking. This means that for some researchers, vendors such as AWS are not an option.

Hardware Lifetime

The cost comparisons herein are based on a 3- to 5-year span. Historically at NJIT, HPC nodes are kept in useful operation for between 5 and 10 years, with no recurring costs after the initial purchase other than operational costs. This should be taken into account when evaluating off-premise HPC, with its recurring usage costs.
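A longer hardware lifetime amortizes the one-time purchase over more years. A minimal sketch of the effective annual cost of the node purchase (operational costs excluded), using the lifetimes mentioned above:

```python
purchase_k = 67  # one-time cost of the computational nodes, in $K
for lifetime_years in (3, 5, 7, 10):
    annual = purchase_k / lifetime_years
    print(f"{lifetime_years}-year lifetime: ${annual:.1f}K/year")
```

At a 10-year lifetime the effective annual cost is well under half of the 3-year figure, whereas off-premise charges recur at the same rate for as long as the resources are used.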