New HPC-targeted cloud virtual machines
Azure HC-series Virtual Machines are now generally available in the West US 2 and East US regions. HC-series virtual machines (VMs) are optimized for the largest-scale, most computationally intensive HPC applications. For this class of workload, HC-series VMs are the most performant, scalable, and price-performant VMs ever launched on Azure or elsewhere on the public cloud.
Built on first-generation Intel® Xeon® Scalable processors, codenamed Skylake, the HC-series delivers up to 3.5 teraFLOPS of double-precision compute with AVX-512 instructions, 190 GB/s of memory bandwidth, rich support for Intel® Parallel Studio XE HPC software, and SR-IOV-based 100 Gb/s InfiniBand. In a single VM scale set, a customer can apply up to 13,200 physical CPU cores and more than 100 TB of memory to a single distributed-memory workload.
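As a back-of-the-envelope illustration of where these figures come from (our own sketch, not an official Azure calculation; the sustained AVX-512 frequency is an assumption on our part):

```python
# Peak double-precision FLOPS for a single HC-series VM, back of the envelope.
# The 2.5 GHz sustained AVX-512 frequency is an assumption; heavy AVX-512
# code typically runs below the 2.7 GHz base clock on Skylake-SP parts.
cores = 44
avx512_ghz = 2.5                  # assumed all-cores AVX-512 frequency
flops_per_cycle = 2 * 8 * 2       # 2 FMA units x 8 DP lanes x 2 ops per FMA

print(f"peak: {cores * avx512_ghz * flops_per_cycle / 1000:.1f} TFLOPS")  # ~3.5

# The scale-set numbers follow the same arithmetic: 300 VMs x 44 cores
# and 300 VMs x 352 GB of RAM.
print(f"cores: {300 * 44}, memory: {300 * 352 / 1000:.1f} TB")  # 13200, ~105.6 TB
```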
HC extends Azure's commitment to delivering supercomputer-class scale and performance for tightly coupled workloads to the public cloud, and to doing so at price points every customer can afford. Today we are happy to say that Azure has once again reached a new milestone in cloud HPC scalability.
Cutting-edge HPC technology
HC-series VMs feature Intel® Xeon® Platinum 8168 processors, which offer the fastest AVX, AVX2, and AVX-512 clock frequencies in the first-generation Intel® Xeon® Scalable family. This enables customers to realize a greater performance uplift when running AVX-optimized applications.
HC-series VMs expose 44 non-hyperthreaded CPU cores and 352 GB of RAM, with a base clock of 2.7 GHz, an all-cores Turbo frequency of 3.4 GHz, and a single-core Turbo frequency of 3.7 GHz. HC VMs also feature a 700 GB local NVMe SSD and support up to four Managed Disks, including the new Azure P60/P70/P80 Premium Disks.
A flagship feature of HC-series VMs is 100 Gb/s InfiniBand from Mellanox. HC-series VMs expose the Mellanox ConnectX-5 dedicated back-end NIC via SR-IOV, meaning customers can use the same OFED driver stack they are accustomed to on bare metal. HC-series VMs deliver MPI latencies as low as 1.7 microseconds, with consistency, bandwidth, and message rates in line with bare-metal InfiniBand deployments. For context, this is 8x to 16x lower network latency than is found elsewhere on the public cloud.
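As a rough illustration of how such latencies are measured (a minimal sketch assuming mpi4py is installed; the published 1.7-microsecond figure comes from native micro-benchmarks such as the OSU suite, and Python overhead will inflate the absolute number):

```python
# Minimal MPI ping-pong latency sketch. Launch with one rank per VM, e.g.:
#   mpirun -np 2 python pingpong.py
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = bytearray(8)        # tiny message, so the test is latency-bound
iters = 10000

comm.Barrier()
t0 = time.perf_counter()
for _ in range(iters):
    if rank == 0:
        comm.Send(buf, dest=1)
        comm.Recv(buf, source=1)
    else:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
elapsed = time.perf_counter() - t0

if rank == 0:
    # One-way latency is half the average round-trip time.
    print(f"one-way latency: {elapsed / iters / 2 * 1e6:.2f} us")
```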
Molecular dynamics beyond 20,000 cores
The Azure HPC team benchmarked many widely used HPC applications to reflect the diverse needs of our customers. One common class of applications simulates the physical and chemical properties of molecules, a field known as molecular dynamics. To see how far HC-series VMs could scale, we benchmarked them with CP2K. We chose CP2K for several reasons. For one, it is widely used in both academia and industry; in fact, CP2K is one of 13 applications used by PRACE in the Unified European Applications Benchmark Suite to drive acceptance testing of supercomputers deployed in Europe. For another, CP2K benefits from AVX-512, making it a good demonstration of what is possible when the latest hardware and software capabilities come together. Anyone can install and run CP2K as we have tested by following the procedure in our documentation.
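The Ranks/Node and Threads/Rank columns in the figures below reflect CP2K's hybrid MPI/OpenMP decomposition. As an illustration of how one table row translates into a job launch (the flag names are typical MPICH-style options and our own assumption; adapt them to your MPI library and scheduler):

```python
# Illustrative hybrid MPI/OpenMP launch line for CP2K. The `-ppn` flag and
# the `cp2k.psmp -i` invocation are assumptions based on common MPI
# launchers and the CP2K MPI+OpenMP binary; adjust for your environment.
def cp2k_launch(nodes: int, ranks_per_node: int, threads_per_rank: int,
                input_file: str = "H2O-DFT-LS.inp") -> str:
    total_ranks = nodes * ranks_per_node
    return (
        f"OMP_NUM_THREADS={threads_per_rank} "
        f"mpirun -np {total_ranks} -ppn {ranks_per_node} "
        f"cp2k.psmp -i {input_file}"
    )

# Example: the 16-node row of Figure 1 (4 ranks/node x 11 threads/rank = 44 cores).
print(cp2k_launch(nodes=16, ranks_per_node=4, threads_per_rank=11))
```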
Our results from this scaling exercise are as follows:
Figure 1: H2O-DFT-LS case
Nodes | Ranks/Node | Threads/Rank | Cases/Day | Time to Solution (s)
8 | 8 | 5 | 101 | 852.715
16 | 4 | 11 | 210 | 410.224
32 | 8 | 5 | 390 | 221.202
64 | 8 | 5 | 714 | 121.192
108 | 4 | 11 | 1028 | 84.723
128 | 8 | 5 | 1289 | 67.876
192 | 12 | 3 | 1515 | 57.827
256 | 4 | 11 | 3756 | 23.789
288 | 2 | 22 | 3927 | 22.009
392 | 2 | 22 | 4114 | 21.818
For the H2O-DFT-LS benchmark (Figure 1), a single-point energy calculation of 2048 water molecules using linear-scaling DFT, HC-series VMs successfully scaled to 392 VMs and 17,248 cores. Most impressively, at the largest level of scale and compared to our baseline of 8 VMs, HC VMs provided a 40.7x improvement in cases-per-day throughput for a 49x increase in VM resources, roughly 83 percent scaling efficiency. Here, 288 VMs offer the optimal balance of price and performance for large-scale runs.
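The speedup and efficiency figures follow directly from the throughput column; a quick sketch of the arithmetic using the Figure 1 data above (the same calculation applies to Figure 2 below):

```python
# Speedup and parallel efficiency from the Figure 1 data, relative to
# the 8-VM baseline of 101 cases/day.
baseline_vms, baseline_rate = 8, 101
for vms, rate in [(128, 1289), (256, 3756), (288, 3927), (392, 4114)]:
    speedup = rate / baseline_rate
    efficiency = speedup / (vms / baseline_vms)
    print(f"{vms:3d} VMs: {speedup:5.1f}x speedup, {efficiency:6.1%} efficiency")
# At 392 VMs: 40.7x speedup on a 49x resource increase, ~83% efficiency.
```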
Figure 2: LiH-HFX case
Nodes | Ranks/Node | Threads/Rank | Cases/Day | Time to Solution (s)
24 | 6 | 7 | 55 | 1556.201
36 | 4 | 11 | 86 | 1002.111
44 | 11 | 4 | 219 | 394.847
64 | 8 | 5 | 294 | 293.091
108 | 4 | 11 | 482 | 179.469
112 | 7 | 6 | 482 | 179.344
128 | 8 | 5 | 530 | 163.095
176 | 11 | 4 | 685 | 126.899
256 | 4 | 11 | 960 | 90.14
324 | 4 | 11 | 1016 | 85.871
512 | 2 | 22 | 1440 | 60.176
For the LiH-HFX benchmark (Figure 2), a single-point energy calculation of a 216-atom lithium hydride crystal with 432 electrons, HC-series VMs successfully scaled to 512 VMs and 22,528 cores. Once again, the return on compute investment is significant: at top-end scale and compared to our baseline of 24 VMs, HC VMs provided a 26.2x improvement in cases-per-day throughput for a 21.3x increase in VM resources, a superlinear result likely helped by the larger aggregate cache and memory bandwidth available at scale.
Delighting HPC customers on Azure
The unique capabilities and price-performance of HC-series VMs are a big win for scientists and engineers who depend on high-performance computing to drive their research and productivity to new heights. Organizations spanning aerospace, automotive, defense, financial services, heavy equipment, manufacturing, oil and gas, public-sector academia, and government research can now use HC-series VMs to increase HPC application performance and deliver faster time to insight.
Available now
Azure HC-series Virtual Machines are currently available in West US 2 and East US, with additional regions rolling out soon.
• Find out more about high performance computing (HPC) in Azure.
• Learn about Azure Virtual Machines.