As supercomputers close in on a milestone, cloud services race to catch up

Companies can access levels of computational power once limited to supercomputers, thanks to high-performance computing in public clouds.

By Aaron Ricadela | November 2020



At the Department of Energy’s Oak Ridge National Laboratory in Tennessee, scientists from Massachusetts biotech firm BERG are using the lab’s Summit supercomputer, the world’s second fastest, to analyze drug compounds that could prevent COVID-19 infections. In a separate experiment, Oak Ridge researchers analyzed patients’ lung fluid cells with the machine to understand how the body’s system for lowering blood pressure might accelerate the disease’s progression.

Argonne National Laboratory near Chicago, also part of the Energy Department’s supercomputing network, this year trained an artificial intelligence (AI) model to quickly predict how molecules will bind with coronavirus proteins, aiding the search for drugs that can inhibit the process. The system can store an entire neural network on a Cerebras Systems chip that’s the size of a dinner plate, speeding up analysis by allowing for extremely short pathways among circuits on the chip. The AI-based predictions provide starting points for the chemists’ further research.

“Science today is driven by simulation, and simulation is run on high-performance computers. It allows scientists to do things they couldn’t do in any other way. Today, AI is having the same kind of impact,” says Jack Dongarra, a distinguished researcher at Oak Ridge who is one of the curators of the biannual TOP500 supercomputers list, which was updated November 16 to chronicle the world’s fastest machines.

More companies and researchers are accessing levels of computational power once limited to supercomputers, thanks to high-performance computing (HPC) offerings in public clouds. While the TOP500 systems offer a rarefied level of capability, businesses can now run increasingly power-hungry AI workloads and simulations on HPC in the cloud.

Automakers, agricultural companies, energy firms, and others are turning to public cloud services from technology companies including Oracle to simulate car crashes, predict global weather and climate, and train the neural networks that machine learning systems rely on to make their predictions. Shortening design cycles and pinpointing extreme weather events is key at a time when carmakers are under pressure from the transition to electric vehicles, and the force of hurricanes and wildfires has intensified. The work requires extensive data shuttling among processors, a shift from the early days of cloud HPC, when applications ran as independent, highly parallel jobs.

“Definitely, it's the tightly coupled workloads that I'm seeing move into the cloud. That's been the majority of the conversations we're having,” said Oracle Vice President of Product Management and Strategy Karan Batta during a panel discussion at the virtual SC20 supercomputing conference this week.

Performance barriers broken

According to data released this week by Hyperion Research, the global market for high-performance computing hardware, software, and services is forecast to expand by 6.2 percent a year from last year through 2024, recovering from a 2020 coronavirus-driven slump. The biggest growth is coming from cloud-based high-performance computing, which grew nearly 59 percent last year to US$3.9 billion.


“Science today is driven by simulation, and simulation is run on high-performance computers.”

Jack Dongarra, Distinguished Researcher, Oak Ridge National Laboratory

Hyperion predicts sales of HPC cloud services will hit US$8.8 billion by 2024, when public clouds will account for nearly a quarter of the US$37.7 billion market for high-performance computing. “I see everything rising up and to the right,” says Dongarra, of Oak Ridge National Lab.
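
As a rough, back-of-the-envelope check of those figures (an illustrative sketch, not Hyperion’s methodology, and assuming “last year” refers to 2019), the cited numbers imply cloud HPC growing far faster than the overall market:

```python
# Illustrative check of the Hyperion figures cited above.
cloud_2019 = 3.9    # US$ billions, cloud HPC revenue last year
cloud_2024 = 8.8    # US$ billions, forecast cloud HPC revenue in 2024
total_2024 = 37.7   # US$ billions, forecast total HPC market in 2024

share_2024 = cloud_2024 / total_2024                     # ~23%, i.e. "nearly a quarter"
implied_cagr = (cloud_2024 / cloud_2019) ** (1 / 5) - 1  # ~18% a year for cloud HPC

print(f"Cloud share of the HPC market in 2024: {share_2024:.0%}")
print(f"Implied annual growth of cloud HPC, 2019-2024: {implied_cagr:.0%}")
```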

Meanwhile, the world’s fastest supercomputers at national research centers in the US, Europe, and China are pushing toward a performance milestone: an exaflop, or one quintillion (10^18) floating-point operations per second. Tennessee’s Oak Ridge lab is poised to be first in the US to break the barrier next year with a system called Frontier, with an expected performance of 1.5 exaflops. China is also vying to reach exascale first, with three contenders.

The new ranking of the world’s fastest machines showed Japan’s Fugaku holding its #1 place with 442 petaflops of performance (about three times as fast as Oak Ridge’s second-ranked Summit). New systems in Germany and Saudi Arabia also cracked the top 10.
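
To put those numbers side by side (a quick illustrative calculation, not a benchmark result), an exaflop is roughly 2.3 times Fugaku’s current mark, and Frontier’s planned 1.5 exaflops would be about 3.4 times Fugaku:

```python
# Illustrative comparison of the performance figures mentioned above.
PETAFLOP = 1e15   # floating-point operations per second
EXAFLOP = 1e18

fugaku = 442 * PETAFLOP           # current TOP500 leader
frontier_planned = 1.5 * EXAFLOP  # Oak Ridge's expected Frontier performance

print(f"Fugaku as a fraction of an exaflop: {fugaku / EXAFLOP:.2f}")       # ~0.44
print(f"Frontier (planned) vs. Fugaku: {frontier_planned / fugaku:.1f}x")  # ~3.4x
```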

Europe is pushing to build an exascale machine by 2023. In September, the European Commission proposed €8 billion (US$9.4 billion) in funding to build supercomputers through 2033, a substantial budget hike. Next year, the EU’s shared LUMI supercomputer, with a planned performance of more than 500 petaflops, is scheduled to come online at Finland’s IT Center for Science. It will apply AI and data analysis techniques to climate modeling, medicine, self-driving cars, and other fields.

The power of public clouds

AI-based predictions based on digital simulations provide starting points for chemists.

Public clouds let engineers and scientists apply advanced chip architectures and other technologies to their problems without a major capital investment in hardware. The services also let researchers avoid getting stuck waiting behind colleagues in a queue for limited on-premises HPC resources. “That’s one of the big attractions of cloud computing—it’s like a grocery store to get what you need for your job,” says Steve Conway, a senior advisor at Hyperion Research.

Cloud providers are adding high-speed networking to support today’s AI-intensive workloads. The tightly coupled computing work of AI requires frequent data transfer, compared with the highly parallel jobs in financial services and life sciences that were first to move to online environments.
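
A minimal sketch of the difference, using MPI-style collective communication (mpi4py and NumPy here are illustrative choices, not tools named in the article): in a tightly coupled job, every worker must exchange data at each step, so network speed directly gates performance, while a loosely coupled job only gathers results at the end.

```python
# Illustrative sketch: tightly coupled step vs. loosely coupled task.
# Run with, e.g.: mpirun -n 4 python sketch.py  (requires mpi4py and NumPy)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Tightly coupled pattern: at every step, each rank produces a local update
# (a stand-in for a gradient) that all ranks must sum before continuing.
local_update = np.random.rand(1_000_000)
global_update = np.empty_like(local_update)
comm.Allreduce(local_update, global_update, op=MPI.SUM)  # network-bound collective

# Loosely coupled pattern: each rank works independently and reports a single
# number at the end, as many early cloud HPC jobs did.
independent_result = float(local_update.mean())
all_results = comm.gather(independent_result, root=0)

if rank == 0:
    print(f"ranks: {comm.Get_size()}, mean of per-rank means: {np.mean(all_results):.4f}")
```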

To be sure, the cloud isn’t going to eclipse the need for supercomputers at the national research agency level any time soon. Some organizations still want their data to stay onsite for security or other reasons.

Researchers running massive computations for a single project for short periods likely wouldn’t opt for cloud, says Steve Wallach, a longtime supercomputer designer who is currently an advisor to the Los Alamos National Laboratory and the Barcelona Supercomputing Center. “Some high-performance computing apps may consume a supercomputer for a month or so,” he says. “A lot of HPC was designed to support one mission at one organization. It wasn’t for 1,000 time-sharing users.”

Nevertheless, the growth in cloud HPC has expanded the market rather than taking work from on-premises systems, according to Hyperion’s Conway. “These two environments are more alike than they were even three or four years ago,” he says.


Photography: janiecbros/Getty Images; dowell/Getty Images

Aaron Ricadela

Aaron Ricadela is a senior director at Oracle. He was previously a business journalist at Bloomberg News, BusinessWeek, and InformationWeek. You can follow him on Twitter @ricadela1.