The ESIP Data Readiness cluster has been working toward defining a community standard for AI-ready data. We surveyed AI and machine learning researchers about their data needs and learned from similar efforts. Meanwhile, the NASA Earth Science Data System ML Training Data Interoperability Working Group has been working toward a community guideline for ensuring the interoperability of training data for Earth science. Both efforts will benefit from broad community contribution in this interactive session, which aims to further understand the current gaps in AI-ready data.
This session will open with a first draft of the AI-ready data standard, based on a recent community survey, that defines the factors most important in determining AI-readiness for open datasets. We will then feature three use cases representing different types of Earth science datasets. Each use case will showcase how data users prepare the datasets to ensure AI-readiness for model training and/or data reuse.
The session will conclude with a use-case-driven discussion guided by the draft AI-ready data standard and training data interoperability guideline. The final goal of the session is to identify the next steps for both groups to improve the community-driven standard and guideline, which allow data providers/producers to improve AI-readiness and training data interoperability via data stewardship and service/tool development. Such improvements are driven by the needs of data users and will maximize the value of Earth science data for solving pressing societal challenges.
Agenda:
13:30–13:35: Welcome & Setting the stage [Douglas]
13:36–13:50: Preliminary results of AI-ready data survey [Tyler Christensen]
13:51–14:15: Use cases
- LCMAP and Landsat ARD data - [Steve Labahn & Jesslyn Brown, USGS]
- Climate model data downscaling - [Seth McGinnis, NCAR]
- Forecast data postprocessing - [Stephen Haddad, UK Met Office]
14:16–14:20: Breakout room setup
14:21–14:55: Breakout room discussion
14:56–15:00: Next steps & wrap-up
Recommended ways to prepare for this session: Review materials for the ESIP Data Readiness Cluster here: https://wiki.esipfed.org/Data_Readiness.