For over 20 years, ESIP meetings have brought together the most innovative thinkers and leaders around Earth science data, thus forming a community dedicated to making Earth science data more discoverable, accessible and useful to researchers, practitioners, policymakers, and the public. The theme of this year’s meeting is "Data for All People: From Generation to Use and Understanding."

Wednesday, January 19 • 1:30pm - 4:00pm
Unlocking ARCO: Analysis-Ready Cloud-Optimized Data transformation in practice

Experience is the best teacher. Data providers need a space to share their experiences generating analysis-ready, cloud-optimized (ARCO) datasets, and this ESIP Winter Meeting session is a chance to collect best practices and examples.

The Cloud Computing Cluster is excited to organize this session, which will produce “real outputs” in the form of best practices, guidance, and use-case-driven workflows with examples. These will be formalized in GitHub repositories and shared through channels such as LinkedIn, Twitter, and Slack.

The Cloud Computing Cluster is composed of members from data and service providers such as USGS, NASA, and NOAA; cloud providers such as AWS and Microsoft; and academic institutions such as UCAR/NCAR and the University of Washington. The January meeting provides an opportunity for these organizations to share and converge on best practices for analysis-ready, cloud-optimized (ARCO) data.

View Recording
View Notes

Agenda:

See this slide deck for more details.
  • 1:30-1:45: Gather, share agenda, share some work in progress “guidance” documents where we are consolidating what we know and what we don’t know (“Lessons from the field”)
  • 1:45-2:05: Lightning talks - Listen to the experiences from a few in our community.
    • Action for attendees: Listen and use the chat box to ask questions.
  • 2:05-2:15: Individual (silent) brainstorming
    • Action for attendees: Make a copy of slide 9 and answer the questions
  • 2:15-2:35: Small group brainstorming + voting
    • Action for attendees: Share answers to slide 9 questions
    • Action for attendees: Vote on questions in slide
  • 2:35-3:05: Fishbowl
    • Action for attendees: If your question received the most votes, ask it to the group
    • Action for attendees: Listen to questions and deliver answers or insights.
  • 3:05-3:15: Break
    • Action for attendees: Recover for tutorials!
  • 3:15-3:45: Tutorials on kerchunk and pangeo-forge
    • Action for attendees: Listen to tutorials
  • 3:45-4:00: Wrap up and next steps
    • Action for attendees: Sign up for email and Slack if not already on those channels

Lightning Talks

Four 5-minute lightning talks:
  1. Dieu My Thanh Nguyen on Zarr chunking strategies research
  2. Anderson Banihirwe on producing a Zarr data store for complex climate model data (Community Earth System Model Large Ensemble (CESM LENS) data sets on AWS)
  3. Lucas Sterzinger on “Fake it until you make it: Reading GOES NetCDF4 data on AWS S3 as Zarr for rapid data access”
  4. Landung (Don) Setiawan on building data portals for OOI and other NASA-funded projects requiring large-scale data conversion and utilization of both COG and Zarr.

Tutorials

pangeo-forge with Charles Stern

Pangeo Forge is an open source tool for data Extraction, Transformation, and Loading (ETL). The goal of Pangeo Forge is to make it easy to extract data from traditional data repositories and deposit it in cloud object storage in an analysis-ready, cloud-optimized (ARCO) format.
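
The extract, transform, and load pattern described above can be sketched in plain Python. This is only a conceptual illustration and does not use the Pangeo Forge API: the in-memory `SOURCE_FILES` dictionary, the `temperature/<n>` chunk keys, and the function names are all hypothetical stand-ins for a traditional repository, a chunked cloud store, and recipe steps.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical stand-in for a traditional repository: one small "file" per day.
SOURCE_FILES: Dict[str, List[float]] = {
    "day1.nc": [1.0, 2.0, 3.0],
    "day2.nc": [4.0, 5.0, 6.0],
    "day3.nc": [7.0, 8.0, 9.0],
}


def extract(names: Iterable[str]) -> Iterator[List[float]]:
    """Extract: fetch each source file's values, one file at a time."""
    for name in names:
        yield SOURCE_FILES[name]


def transform(pieces: Iterable[List[float]], chunk_size: int) -> List[List[float]]:
    """Transform: concatenate along time, then re-split into uniform chunks."""
    flat = [value for piece in pieces for value in piece]
    return [flat[i:i + chunk_size] for i in range(0, len(flat), chunk_size)]


def load(chunks: List[List[float]], store: Dict[str, List[float]]) -> None:
    """Load: write each chunk under its own key, as an object store would."""
    for index, chunk in enumerate(chunks):
        store[f"temperature/{index}"] = chunk


# A dict plays the role of cloud object storage in this sketch.
store: Dict[str, List[float]] = {}
load(transform(extract(sorted(SOURCE_FILES)), chunk_size=4), store)
```

The point of the re-chunking step is that the layout in the target store is chosen for analysis (here, four time steps per object) rather than inherited from how the source files happened to be split.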

Kerchunk with Lucas Sterzinger

Cloud-friendly access to archival data. Kerchunk is a library that provides a unified way to represent a variety of chunked, compressed data formats (e.g. NetCDF, HDF5, GRIB), allowing efficient access to the data from traditional file systems or cloud object storage.
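
The idea of representing chunks inside an existing file can be illustrated with a small standard-library sketch. It does not call Kerchunk itself: the fake archive file, the `var/<n>` keys, and the helper names are hypothetical, and the `[path, offset, length]` entries only mimic the general shape of a chunk reference as an assumption for illustration.

```python
import os
import tempfile
from typing import Dict, List, Union

Reference = List[Union[str, int]]  # [path, byte offset, byte length]


def write_archive(path: str) -> Dict[str, Reference]:
    """Write a fake archival file (header plus two chunks) and index its chunks."""
    header = b"HEADER--"
    chunks = [b"chunk-zero", b"chunk-one!"]
    refs: Dict[str, Reference] = {}
    with open(path, "wb") as f:
        f.write(header)
        for index, chunk in enumerate(chunks):
            # Record where this chunk starts and how long it is.
            refs[f"var/{index}"] = [path, f.tell(), len(chunk)]
            f.write(chunk)
    return refs


def read_chunk(refs: Dict[str, Reference], key: str) -> bytes:
    """Read one chunk by byte range, without parsing the rest of the file."""
    path, offset, length = refs[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


archive_path = os.path.join(tempfile.mkdtemp(), "archive.bin")
references = write_archive(archive_path)
second_chunk = read_chunk(references, "var/1")
```

Because the index stores only locations, the original file is left untouched; a reader that supports byte-range requests can fetch individual chunks from local disk or object storage in the same way.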

Organizers

Aimee Barciauskas

Data engineer, Development Seed

Rob Casey

Deputy Director of Cyberinfrastructure, IRIS Data Services
Rob currently serves as Deputy Director of Cyberinfrastructure at the Incorporated Research Institutions for Seismology (IRIS) Data Management Center (DMC) in Seattle, WA. His responsibilities include management of software development and data services activities as well as leading...

Sudhir Shrestha

Technical Director Web and Dissemination Services, NOAA NWS Office of Water Prediction

Rich Signell

Research Oceanographer, USGS
Open Source Science, Pangeo, Python, JupyterHub, Cloud Computing, AWS, HPC on the Cloud, Dask, Xarray

Wednesday January 19, 2022 1:30pm - 4:00pm EST
TBA
  Breakout
  • Keywords: Cloud Computing, Data Stewardship
  • Collaboration Area Tags: Cloud Computing, Community Data
  • Target Audience: Anyone who is interested in generating cloud-optimized data archives and/or has experience doing so. The Cloud Computing Cluster is composed of data providers and technologists whose affiliations range from federal agencies to academic institutions and who share the mission of making Earth observation archives more accessible to a greater number of people. We believe creating more and better cloud-optimized data archives will make that possible.