Dear CICI PI: A Letter from the OSDF
The NSF PATh Project would like to help you enhance the experiences of the authorized consumers of your data by bringing your dataset to a national-scale data fabric. The Open Science Data Federation (OSDF) that we operate facilitates remote access to your data via a unified namespace, while a network of caches run across the nation and the globe manages the impact of this access on the storage hosting your data. We look forward to having your consumers join the growing community of researchers that benefit from the more than 100 transfers per second delivered by the OSDF.
Bringing your data to the OSDF is easy. We can help you copy the data to the NSF-funded OSStore storage operated by PATh or deploy a Pelican origin that serves as a gateway to the storage that hosts your data. The storage behind the origin (the “Object Store”) can be local on your campus (a filesystem) or in the cloud (an AWS S3 bucket).
Once in the OSDF, your data can be seamlessly processed using capacity provided by the Open Science Pool (OSPool) that we operate or through NSF-funded services like the National Research Platform (NRP). The OSPool provides US researchers with compute capacity and automation. A large fraction of the more than 220M jobs served by the OSPool in the past year consumed objects provided by the OSDF. The high throughput computing capacity offered by the OSPool (over 0.5M jobs/day) is open to any US researcher.
Connecting to the OSDF
We offer two options to connect to the OSDF:
We host it
The OSDF team runs the “origin” service connecting the repository to the OSDF; you provide the S3 credentials or HTTP access to the immutable objects.
If your dataset is on a filesystem, the NRP project hosts a Pelican origin in your DMZ.
You host it
You install and run the Pelican origin at your institution wherever the dataset is mounted.
This provides you control over the exported data and configuration of the authorization.
If neither of the above works, a copy of your dataset can be temporarily hosted on storage contributed by CC* awards through the PATh-operated “OSStore” program.
You manage the access control policy of your data. The Pelican software interfaces with existing systems over standard protocols (such as OAuth2) or you can leverage the capabilities of CILogon to enable federated SSO and group access.
We will help you combine your dataset with higher-level services, such as the National Data Platform (NDP), to facilitate data discovery.
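As a rough illustration of what the unified namespace means for a data consumer, the sketch below builds an `osdf://` URL from a namespace prefix and an object path, then prints a download command. It assumes the Pelican command-line client and its `pelican object get` subcommand are available to the consumer; the namespace `/example-project/public`, the object name, and the helper functions are hypothetical and only stand in for whatever prefix your origin would export.

```python
import shlex
from urllib.parse import quote


def osdf_url(namespace: str, object_path: str) -> str:
    """Join a namespace prefix and an object path into an osdf:// URL.

    Both arguments are hypothetical examples; the real prefix is whatever
    is registered for your origin in the federation.
    """
    parts = [p for p in (namespace.strip("/"), object_path.strip("/")) if p]
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return "osdf:///" + quote("/".join(parts))


def download_command(url: str, destination: str) -> str:
    """Return a shell-safe command line for fetching one object.

    Assumes the Pelican client is installed; this function only formats
    the command and does not run it.
    """
    return shlex.join(["pelican", "object", "get", url, destination])


if __name__ == "__main__":
    url = osdf_url("/example-project/public", "dataset-v1/part-0001.parquet")
    print(url)
    print(download_command(url, "./part-0001.parquet"))
```

The same URL would be used regardless of whether the object is served from a campus filesystem, an S3 bucket, or OSStore, which is the point of routing access through the federation rather than through storage-specific endpoints.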
CICI Presentation Series
September 25, 2025
Yushun Dong
at CICI Presentation Series
Data
AI
Cybersecurity
PI Yushun Dong discusses the MLaaS project: Secure Machine Learning as a Service for Collaborative Scientific Research. The project aims to establish a secure framework to protect the three stakeholders in collaborative scientific computing environments: the model owner, the data provider, and the model user.
September 25, 2025
Zhe Zhao
at CICI Presentation Series
Data
AI
Cybersecurity
PI Zhe Zhao discusses the project FoodGuard: Enabling a Safe and Directive Multi-modal Foundation Model Ecosystem for Food Science Research. The project aims to provide guardrails for safe usage in scientific collaborations, and to integrate with critical food science applications.
September 25, 2025
Qian Wang
at CICI Presentation Series
Data
Cybersecurity
PI Qian Wang discusses the QURE project: Usable and Attack-Resistant Security Framework for Quantum Emulators. The project aims to ensure emulation platforms are trustworthy, reliable, and secure, in order to facilitate scientific research.
September 25, 2025
Justin Cappos
at CICI Presentation Series
Cybersecurity
PI Justin Cappos discusses the GRISL project: Protecting and Hardening Scientific Use of Software Libraries with GRISL. The project aims to harden unsafe scientific libraries without changing scientists’ code, and to use lightweight userspace isolation to prevent crashes and data corruption, in order to improve the reliability of scientific workflows.
September 25, 2025
Michela Taufer
at CICI Presentation Series
Data
Workflow
PI Michela Taufer discusses the SAFARI project: Scientific Analytics, Forensics, and Reproducibility for Workflows in Cyberinfrastructures. The project aims to integrate forensic data analytics into the Pegasus Workflow Management System to enhance the trustworthiness, reusability, and reproducibility of scientific workflows.
September 25, 2025
September 11, 2025
Mu Zhang
at CICI Presentation Series
Cybersecurity
PI Mu Zhang discusses the HPCSafeChain project: Software Supply Chain Security in High-Performance Computing: Understanding, Evaluation and Transition. Scientific computing relies on open-source software vulnerable to software supply chain (SSC) attacks. This project aims to apply SSC security research to high-performance computing, an area that is currently under-explored.
September 11, 2025
Yanan Guo
at CICI Presentation Series
Cybersecurity
AI
PI Yanan Guo discusses the SecGPU4AI project: Securing GPU Computing for AI-Driven Scientific Workflows. The objective of the project is to bolster the security of scientific AI research against GPU memory safety vulnerabilities.
September 11, 2025
Wei Zhang
at CICI Presentation Series
Cybersecurity
Data
PI Wei Zhang discusses the SafeSci-TEE project: Advancing Security in TEE-Enabled Scientific Research Workflows: A Holistic Approach. Scientific data is increasingly the target of attackers. This project aims to develop comprehensive security for scientific workflows on untrusted HPC infrastructure, with the ultimate goal of safeguarding data and promoting trust for multi-institutional collaborations.
September 11, 2025
Wajih Ul Hassan
at CICI Presentation Series
Cybersecurity
AI
Data
PI Wajih Ul Hassan discusses the MLDL project: Multi-Layer Data Provenance and Federated Learning for Securing Scientific AI Pipelines. The project aims to build an end-to-end provenance infrastructure that tracks the full dataset lifecycle. This would enable transparency and accountability, and allow for trustworthy, reproducible AI-driven science.
September 11, 2025
Eunsuk Kang
at CICI Presentation Series
Cybersecurity
Co-PI Eunsuk Kang discusses the CloudSec project: Collaborative Policy Alignment for Secure Scientific Computing Infrastructures. The project is positioned as a collaborative security policy analysis for research cyberinfrastructure. It includes a novel interface designed to elicit policy requirements and explanations from project stakeholders.
September 11, 2025
September 4, 2025
Yuan Tian
at CICI Presentation Series
Data
PI Yuan Tian discusses the Dprov project: A Data Provenance Framework for Medical Machine Learning. In this talk, Tian reviews the needs for data integrity, provenance, and authenticity. Integrity involves detecting public ML models trained on corrupt data. Provenance involves establishing standardized, efficient, reproducible dataset-model tracking. Authenticity involves removing patient or corrupt data from models effectively. Tian further discusses dataset requirements.
September 4, 2025
Teodora Baluta
at CICI Presentation Series
AI
Data
PI Teodora Baluta discusses the SCRYPTS-AI project, whose focus is to design solutions for collaboration across disciplines and develop new insights and approaches for distributed, sensitive and private or noisy data sets using AI.
September 4, 2025
Ka Pui (Ricky) Mok
at CICI Presentation Series
AI
PI Ka Pui (Ricky) Mok discusses the CANIS project: Curated AI-ready Network telescope datasets for Internet Security. Mok shares that CANIS is a suite of modules to improve the UCSD-NT infrastructure for acquisition, processing, and analytics of cybersecurity research workflows.
September 4, 2025
Francesco Restuccia
at CICI Presentation Series
Data
PI Francesco Restuccia discusses the REPAIRT project: Securing XApps in Open RANs with Reliable and Principled AI Red-Teaming. The approach involves algorithms for proactive and dynamic defense from adversarial AI.
September 4, 2025
Ying Wang
at CICI Presentation Series
Cybersecurity
Networking
PI Ying Wang discusses the WRAP project: Programmable Wireless Infrastructure with Formal Assurance for Cross-Campus Research. Wang talks about how formal assurance translates researcher goals into verifiable policies and how runtime anomaly detection ensures continuous, secure operation.
September 4, 2025
August 28, 2025
Kemal Akkaya
at CICI Presentation Series
AI
August 28, 2025
Phuong Cao
at CICI Presentation Series
AI
August 28, 2025
KC Claffy and Steven Wallace
at CICI Presentation Series
Cybersecurity
Networking
August 28, 2025
John Heidemann
at CICI Presentation Series
Networking
Data
August 28, 2025
Grace Kouadjo
at CICI Presentation Series
Networking
August 28, 2025
August 21, 2025
Dan Massey
at CICI Presentation Series
August 21, 2025
Brian Bockelman
at CICI Presentation Series
Pelican
OSDF
Data
Brian Bockelman, PI of the Pelican Platform, describes how the OSDF can partner with CICI PIs (and other producers of research datasets) to share their data and build a community of users, using the OSDF as a platform that can control data access, track data access and distribute data to individuals and computing systems.
August 21, 2025
Ilkay Altintas
at CICI Presentation Series
Data
NDP
OSDF
Ilkay Altintas, PI of the National Data Platform, shares how the NDP serves as a central organizing point for different datasets by providing cataloguing and search functionality.
August 21, 2025
The NSF-funded OSDF is a service operated by the PATh project (#2030508) and powered by the Pelican platform (#2331480).