Dear CICI PI: A Letter from the OSDF
The NSF PATh Project would like to help you enhance the experience of the authorized consumers of your data by bringing your dataset to a national-scale data fabric. The Open Science Data Federation (OSDF) that we operate provides remote access to your data through a unified namespace, and it manages the impact of that access on the storage hosting your data through a network of caches run across the nation and the globe. We look forward to having your consumers join the growing community of researchers who benefit from the more than 100 transfers per second delivered by the OSDF.
Bringing your data to the OSDF is easy. We can help you copy the data to the NSF-funded OSStore storage operated by PATh or deploy a Pelican origin that serves as a gateway to the storage that hosts your data. This “Object Store” can be local on your campus (a filesystem) or in the cloud (an AWS S3 bucket).
Once in the OSDF, your data can be processed using capacity provided by the Open Science Pool (OSPool) that we operate or through NSF-funded services like the National Research Platform (NRP). The OSPool provides US researchers with compute capacity and automation. A large fraction of the more than 220M jobs served by the OSPool in the past year consumed objects provided by the OSDF. The high throughput computing capacity offered by the OSPool (over 0.5M jobs/day) is open to any US researcher.
Connecting to the OSDF
We offer two options to connect to the OSDF:
We host it
The OSDF team runs the “origin” service connecting the repository to the OSDF; you provide the S3 credentials or HTTP access to the immutable objects.
If your dataset is on a filesystem, the NRP project hosts a Pelican origin in your DMZ.
You host it
You install and run the Pelican origin at your institution, wherever the dataset is mounted.
This provides you control over the exported data and configuration of the authorization.
If neither of the above works, a copy of your dataset can be temporarily hosted on storage contributed by CC* awards through the PATh-operated “OSStore” program.
You manage the access control policy of your data. The Pelican software interfaces with existing systems over standard protocols (such as OAuth2) or you can leverage the capabilities of CILogon to enable federated SSO and group access.
We will help you combine your dataset with higher-level services, such as the National Data Platform (NDP), to facilitate data discovery.
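To make the idea of a unified namespace concrete, here is a minimal sketch of how a consumer-facing object name could be composed once a dataset is exported under a namespace prefix. The prefix `/my-project/public` and the object names are hypothetical, and the helper `osdf_url` is illustrative only; it is not part of the Pelican software. The sketch assumes the `osdf://` URL form used by Pelican clients.

```python
def osdf_url(namespace_prefix: str, object_path: str) -> str:
    """Compose an osdf:// URL from a namespace prefix and an object path.

    Illustrative helper (hypothetical, not a Pelican API): it only joins
    path segments and rejects relative segments such as '.' or '..'.
    """
    # Split both pieces on '/', dropping empty segments from stray slashes.
    parts = [p for p in (namespace_prefix + "/" + object_path).split("/") if p]
    if any(p in (".", "..") for p in parts):
        raise ValueError("relative path segments are not allowed")
    return "osdf:///" + "/".join(parts)


if __name__ == "__main__":
    # Hypothetical namespace prefix and object name.
    print(osdf_url("/my-project/public", "images/scan-001.h5"))
```

Under these assumptions, every consumer refers to the same object by the same name, regardless of which origin hosts it or which cache delivers it.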
CICI Presentation Series
September 4, 2025
Yuan Tian
at CICI Presentation Series
Data
PI Yuan Tian discusses the Dprov project: A Data Provenance Framework for Medical Machine Learning. In this talk Tian reviews the needs for data integrity, provenance, and authenticity. Integrity involves detecting public ML models trained on corrupt data. Provenance involves establishing standardized, efficient, reproducible dataset-model tracking. Authenticity involves removing patient or corrupt data from models effectively. Tian further discusses dataset requirements.
September 4, 2025
Teodora Baluta
at CICI Presentation Series
AI
Data
PI Teodora Baluta discusses the SCRYPTS-AI project, whose focus is to design solutions for collaboration across disciplines and develop new insights and approaches for distributed, sensitive and private or noisy data sets using AI.
September 4, 2025
Ka Pui (Ricky) Mok
at CICI Presentation Series
AI
PI Ka Pui (Ricky) Mok discusses the CANIS project: Curated AI-ready Network telescope datasets for Internet Security. Mok shares that CANIS is a suite of modules to improve the UCSD-NT infrastructure for acquisition, processing, and analytics in cybersecurity research workflows.
September 4, 2025
Francesco Restuccia
at CICI Presentation Series
Data
PI Francesco Restuccia discusses the REPAIRT project: Securing XApps in Open RANs with Reliable and Principled AI Red-Teaming. The approach involves algorithms for proactive and dynamic defense from adversarial AI.
September 4, 2025
Ying Wang
at CICI Presentation Series
Cybersecurity
Networking
PI Ying Wang discusses the WRAP project: Programmable Wireless Infrastructure with Formal Assurance for Cross-Campus Research. Wang talks about how formal assurance translates researcher goals into verifiable policies and how runtime anomaly detection ensures continuous, secure operation.
September 4, 2025
August 28, 2025
Kemal Akkaya
at CICI Presentation Series
AI
August 28, 2025
Phuong Cao
at CICI Presentation Series
AI
August 28, 2025
KC Claffy and Steven Wallace
at CICI Presentation Series
Cybersecurity
Networking
August 28, 2025
John Heidemann
at CICI Presentation Series
Networking
Data
August 28, 2025
Grace Kouadjo
at CICI Presentation Series
Networking
August 28, 2025
August 21, 2025
Dan Massey
at CICI Presentation Series
August 21, 2025
Brian Bockelman
at CICI Presentation Series
Pelican
OSDF
Data
Brian Bockelman, PI of the Pelican Platform, describes how the OSDF can partner with CICI PIs (and other producers of research datasets) to share their data and build a community of users, using the OSDF as a platform that can control data access, track data access and distribute data to individuals and computing systems.
August 21, 2025
Ilkay Altintas
at CICI Presentation Series
Data
NDP
OSDF
Ilkay Altintas, PI of the National Data Platform, shares how the NDP serves as a central organizing point for different datasets by providing cataloguing and search functionality.
August 21, 2025
The NSF-funded OSDF is a service operated by the PATh project (#2030508) and powered by the Pelican platform (#2331480).