The PATh Facility
The PATh Facility is a purpose-built, national-scale distributed High Throughput Computing (dHTC) resource meant to deliver computational capacity to NSF researchers.
The PATh Facility is part of a pilot project funded by the NSF Office of Advanced Cyberinfrastructure. NSF-funded researchers can apply to request credits for the PATh facility. The facility aims to demonstrate the value of dedicated, distributed resources to the NSF Science and Engineering community.
The facility aims to empower researchers through the use of high throughput computing techniques and, as it is completed during 2022, will be spread across six physical sites. Unique aspects of the facility include:
- Emphasis on workloads organized into ensembles of many tasks. The PATh facility makes its greatest impact when users have a large number (tens of thousands through millions) of individual batch jobs.
- Due to its national, distributed nature, multi-node MPI job capabilities are not offered (single-node MPI works well). The PATh facility provides a “scale out” resource, not “scale up”.
- Similarly, the job scheduler handles the movement of input data to the worker node and output back to the Access Point. This provides the facility with the ability to manage data movement and the flexibility to move jobs to a wider variety of sites. There is no shared filesystem between the access point and worker node.
Common examples of HTC workloads include parameter sweeps, Monte-Carlo simulations, or processing of thousands of data sets where each data set can be processed by an independent job.
Accessing the PATh Facility
Users of the facility will receive credit accounts directly from the NSF which they can utilize to run high throughput computing workflows. The NSF is making credits available through a number of mechanisms, including:
- Solicitations: The 2021 CSSI solicitation included a mechanism to request credits as part of the project proposal.
- Pilot for the Allocation of High-Throughput Computing Resources (HTC): The NSF provides a mechanism where existing PIs can apply for credits as a supplement to an existing award or as part of a new award. See the Dear Colleague Letter 22-051 for more information on the pilot.
The PATh team is here to help! As part of the consulting services offered to any researcher, our team can help you decompose your workload to ensembles of jobs and generate resource estimates for the various tasks in the ensemble. Please reach out to [email protected] to initiate your consult.
The PATh Facility is under construction; the first resources came online in spring 2022 and it will continue to expand through the year; it will eventually consist of about 30,000 cores and 3 dozen A100 GPUs. The facility includes the following sites and resources:
- Lincoln, Nebraska: University of Nebraska Lincoln’s Holland Computing Center will host 32 machines with 64 AMD EPYC cores (AMD 7513), 1.6TB of NVMe, and 256 GB RAM each. One machine will have 4 A100 GPUs, 1.6TB of NVMe, and 512 GB RAM.
- Syracuse, New York: Syracuse University’s Research Computing group will host 32 machines with 64 AMD EPYC cores (AMD 7513), 1.6TB of NVMe, and 256 GB RAM each. One machine will have 4 A100 GPUs, 1.6TB of NVMe, and 512 GB RAM.
- Miami, Florida: Florida International University’s AMPATH network will host PATh equipment in their Miami interchange point. This will consist of 4 machines with 64 AMD EPYC cores (AMD 7513), 1.6TB of NVMe, and 256 GB RAM each. One machine will have 4 A100 GPUs, 1.6TB of NVMe, and 512 GB RAM
- San Diego, California: An additional 2 racks will be added to the Expanse resource at San Diego Supercomputing Center (SDSC), usable via the PATh credit accounts. Each rack will hold 16 A100 devices not part of the original Expanse design. Availability is expected in early summer 2022.
- Madison, Wisconsin: University of Wisconsin-Madison’s Center for High Throughput Computing will serve as a staging ground for the resources destined for Lincoln, Syracuse, and Miami prior to shipment to their final destinations. Four machines will be kept at Madison, primarily for debugging and testing purposes.
- Austin, Texas: PATh has received a large allocation in the recently-upgraded Stampede2 resource at the Texas Advanced Computing Center (TACC); this allocation will be reachable via PATh computing credits. Stampede2’s new resources include 224 dual-socket Intel Xeon Platinum 8380 CPUs (40 cores each).
The hosts destined for Lincoln, Syracuse, and Miami are currently being tested at Madison; availability is expected in late spring 2022.