The PATh Facility
The PATh Facility is a purpose-built, national-scale distributed High Throughput Computing (dHTC) resource delivering computational capacity to NSF researchers.
The PATh Facility is part of a pilot project funded by the NSF Office of Advanced Cyberinfrastructure. NSF-funded researchers can apply for credits to use on the PATh Facility. The facility’s purpose is to demonstrate the value of dedicated, distributed resources to the NSF Science and Engineering community.
The facility empowers researchers through the use of high throughput computing techniques and is spread across six physical sites. Unique aspects of the facility include:
- Emphasis on workloads organized into ensembles of many tasks. The PATh facility makes its greatest impact when users have a large number (tens of thousands through millions) of individual batch jobs.
- Because the facility is national and distributed, it does not offer multi-node MPI jobs (single-node MPI works well). The PATh Facility is a “scale out” resource, not a “scale up” one.
- The job scheduler handles the movement of input data to the worker node and of output data back to the Access Point. There is no shared filesystem between the Access Point and the worker nodes. Managing data movement this way gives the facility the flexibility to run jobs at a wider variety of sites.
Common examples of HTC workloads include parameter sweeps, Monte Carlo simulations, and the processing of thousands of data sets where each data set can be handled by an independent job.
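As a minimal sketch of how a parameter sweep decomposes into an ensemble, the Python below enumerates every combination of a small parameter grid and writes one row per independent task. The parameter names (`temperature`, `pressure`, `seed`), the helper functions, and the file name `tasks.csv` are hypothetical and chosen only for illustration; in a real workflow, a list like this would typically be handed to the job scheduler so that each row becomes one batch job.

```python
import csv
import itertools


def build_sweep(grid):
    """Return one dict per task, covering every combination in the grid."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def write_task_list(tasks, path):
    """Write the task list as CSV, one row per independent job."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(tasks[0]))
        writer.writeheader()
        writer.writerows(tasks)


# Hypothetical parameter grid; replace with the parameters of your own study.
GRID = {
    "temperature": [280, 300, 320],
    "pressure": [1.0, 2.0],
    "seed": list(range(5)),
}

if __name__ == "__main__":
    tasks = build_sweep(GRID)
    write_task_list(tasks, "tasks.csv")
    print(f"{len(tasks)} independent tasks written to tasks.csv")
```

With the grid above, the sweep contains 3 × 2 × 5 = 30 tasks. Because no task depends on another, the same pattern scales to the tens of thousands of jobs the facility is designed for.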
Accessing the PATh Facility
Users of the facility receive credit accounts directly from the NSF, which they can use to run high throughput computing workflows. The NSF is making credits available through a number of mechanisms, including:
- Solicitations: The 2021 CSSI solicitation included a mechanism to request credits as part of the project proposal.
- Pilot for the Allocation of High-Throughput Computing Resources (HTC): The NSF provides a mechanism where existing PIs can apply for credits as a supplement to an existing award or as part of a new award. See the Dear Colleague Letter 22-051 for more information on the pilot.
The PATh team is here to help! As part of the consulting services offered to any researcher, our team can help you decompose your workload into ensembles of jobs and generate resource estimates for the various tasks in each ensemble. Please reach out to [email protected] to start a consultation.
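A first-pass resource estimate for an ensemble is mostly arithmetic: multiply the number of tasks in each class by the per-task requirements, then sum. The Python below is an illustrative sketch only. The `TaskClass` structure, the example workload numbers, and the safety margin are assumptions made for this example; they are not the PATh team's estimation method or the facility's credit accounting.

```python
from dataclasses import dataclass


@dataclass
class TaskClass:
    """One class of identical tasks within an ensemble (hypothetical model)."""
    name: str
    count: int        # number of independent jobs of this kind
    cores: int        # CPU cores requested per job
    hours: float      # expected wall-clock hours per job
    gpus: int = 0     # GPUs requested per job

    @property
    def core_hours(self) -> float:
        return self.count * self.cores * self.hours

    @property
    def gpu_hours(self) -> float:
        return self.count * self.gpus * self.hours


def estimate(task_classes, margin=0.25):
    """Sum core-hours and GPU-hours, padded by a fractional safety margin."""
    core = sum(t.core_hours for t in task_classes)
    gpu = sum(t.gpu_hours for t in task_classes)
    return {
        "core_hours": core * (1 + margin),
        "gpu_hours": gpu * (1 + margin),
    }


# Hypothetical workload: numbers are placeholders, not measurements.
WORKLOAD = [
    TaskClass("simulate", count=10_000, cores=1, hours=2.0),
    TaskClass("analyze", count=100, cores=4, hours=0.5),
    TaskClass("train", count=10, cores=8, hours=3.0, gpus=1),
]

if __name__ == "__main__":
    totals = estimate(WORKLOAD)
    print(f"core-hours: {totals['core_hours']:.0f}")
    print(f"GPU-hours:  {totals['gpu_hours']:.1f}")
```

Per-task runtimes are best measured from a handful of test jobs before scaling up, since a small error in the per-task figure is multiplied by the size of the ensemble.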
The PATh Facility construction is complete! The first resources came online in spring 2022, and all site construction was completed in fall 2022. The facility consists of about 30,000 cores and three dozen A100 GPUs, and it includes the following sites and resources:
- Lincoln, Nebraska: University of Nebraska-Lincoln’s Holland Computing Center hosts 32 machines with 64 AMD EPYC cores (AMD 7513), 1.6 TB of NVMe, and 256 GB RAM each. One machine has 4 A100 GPUs, 1.6 TB of NVMe, and 512 GB RAM.
- Syracuse, New York: Syracuse University’s Research Computing group hosts 32 machines with 64 AMD EPYC cores (AMD 7513), 1.6 TB of NVMe, and 256 GB RAM each. One machine will have 4 A100 GPUs, 1.6 TB of NVMe, and 512 GB RAM.
- Miami, Florida: Florida International University’s AMPATH network hosts PATh equipment at its Miami interchange point. This consists of 4 machines with 64 AMD EPYC cores (AMD 7513), 1.6 TB of NVMe, and 256 GB RAM each. One machine has 4 A100 GPUs, 1.6 TB of NVMe, and 512 GB RAM.
- San Diego, California: An additional 2 racks were added to the Expanse resource at the San Diego Supercomputer Center (SDSC), usable via the PATh credit accounts. Each rack holds 16 A100 GPUs that were not part of the original Expanse design.
- Madison, Wisconsin: University of Wisconsin-Madison’s Center for High Throughput Computing served as a staging ground for the resources destined for Lincoln, Syracuse, and Miami. Four machines are kept at Madison, primarily for debugging and testing purposes.
- Austin, Texas: PATh has received a large allocation on the recently upgraded Stampede2 resource at the Texas Advanced Computing Center (TACC); this allocation is reachable via PATh computing credits. Stampede2’s new resources include 224 dual-socket nodes with Intel Xeon Platinum 8380 CPUs (40 cores each).