National Science Foundation establishes a partnership to advance throughput computing

MADISON — Recognizing the University of Wisconsin-Madison’s leadership role in research computing, the National Science Foundation announced this month that the Madison campus will be home to a five-year, $22.5 million initiative to advance high-throughput computing.

The Partnership to Advance Throughput Computing (PATh) is driven by the growing need for throughput computing across the entire spectrum of research institutions and disciplines. The partnership will advance computing technologies and extend adoption of these technologies by researchers and educators.

The partnership will focus on distributed high throughput computing (dHTC) technologies and methodology. These tools leverage automation and build on distributed computing principles. They enable researchers with large ensembles of computational tasks to harness the computing capacity of thousands of networked computers. By distributing the tasks across this massive network, ensembles that might take decades to complete with conventional computing can produce results within days or hours.

Miron Livny, UW-Madison computer science professor and Morgridge Institute for Research chief technology officer, will lead the PATh project, which brings together the UW-Madison Center for High Throughput Computing (CHTC) and the national Open Science Grid consortium. Researchers and staff from the University of Southern California, Indiana University, the University of Chicago, the University of Nebraska and the University of California-San Diego will join forces with the Madison team, which includes researchers and staff from the Department of Computer Sciences and the Morgridge Institute for Research.

The PATh award will fund more than 40 individuals across participating institutions, most of whom have been working together for years and in some cases for decades. “Research computing is vital to almost every corner of basic research, but the hardware is not as important as the people,” says Morgridge CEO Brad Schwartz. “The high-throughput computing center led by Miron Livny has helped make data science accessible and adaptable to thousands of scientific projects big and small.”

Livny pioneered high throughput computing more than 30 years ago, and today it supports hundreds of projects on the Madison campus through the CHTC. This early work contributed to the creation in 2005 of the Open Science Grid (OSG), which brings the power of high throughput computing to national and international research institutions and science projects. Two of these projects contributed to Nobel Prize-winning discoveries in physics: the discovery of the Higgs boson (2013) and the detection of gravitational waves (2017).

The HTCondor Software Suite, developed and maintained by the CHTC, will power the fabric of services for the national science and engineering community. This fabric of capacity services will be an integral part of the national ecosystem of coordinated cyberinfrastructure services promoted by the NSF. “We see PATh as a valuable component of this evolving ecosystem of services,” says Manish Parashar, director of the NSF Office of Advanced Cyberinfrastructure.

“This partnership will enable us to take the HTCondor software over the next five years to a new level of scale and usability,” says Todd Tannenbaum, who will continue to lead the CHTC software development effort as one of the four co-investigators of PATh. Together with fellow co-PI and Morgridge Investigator Brian Bockelman, he will lead the innovation and implementation of new high-throughput technologies in the HTCondor software. Bockelman and Tannenbaum have collaborated closely for over a decade, demonstrating the shared commitment of CHTC and the Morgridge Institute to advance and support research computing.

Bockelman says one of the great benefits of high throughput computing (HTC) is turning raw computing capacity into functional capacity that can be tailored to the needs of individual researchers. HTC differs from high performance computing (what people typically call supercomputing) in that it focuses on maximizing the science done by a computational resource rather than on peak algorithmic performance, often through the use of distributed, heterogeneous computers.

While it doesn’t have the flashiness of a single, large supercomputer, HTC’s strength lies in its adaptability across disparate sources of computing capacity and across domains of science, he says. Bockelman adds, “With HTC we are constantly thinking, how can we maximize the use and scale? How can we make this more approachable to a wide range of scientists?”

One secret to the CHTC’s impact on the campus has little to do with technology, and everything to do with training and personalized consulting for researchers. Lauren Michael, a co-investigator of PATh, has been a research computing facilitator for CHTC since 2013, and her goal is to get scientists — most of whom have only used a single laptop or desktop — comfortable with adapting HTC to the computing needs of their research.

The CHTC team works with hundreds of scientists annually on a seemingly limitless range of challenges. Current projects include developing methods to measure brain connectivity during surgery, predicting fuel cell behavior at nuclear facilities, and imaging the internal structure of the Great Pyramid of Giza.

PATh will continue to advance these facilitation techniques and will bring them to additional campuses. “PATh exposes our methodologies to campuses across the nation. This includes campuses that have been under-resourced and under-represented in the national computing ecosystem and are therefore primed to benefit from advanced throughput computing capabilities,” she says.

Livny notes that the PATh project is funded as part of NSF’s computational ecosystem. PATh federates resources at the campus level, including more than 30 campus clusters funded in the last two years by NSF’s Campus Cyberinfrastructure program, which helps campuses build core capabilities in research computing. In turn, the campuses are required to make some portion of their computing capacity available to external users. Most campuses do so via the OSG, which aggregates computing capacity from sites across the world. PATh helps to support the core technologies and services offered by the OSG.

In addition to the technical contributions of Bockelman and his group, the Morgridge Institute will provide a professional and physical home for the project management team. “Effective project management is critical to the success of the PATh project given its scope, broad goals and different moving parts,” says Livny. “NSF gave us a mandate and the means to move in a new direction that includes community-building and workforce development so that more and more researchers and campuses will benefit from distributed high throughput computing.”