An intake plugin for parsing an Earth System Model (ESM) catalog and loading netCDF files and/or Zarr stores into Xarray datasets.
Computer simulations of the Earth's climate and weather generate huge amounts of data. These data are often persisted on HPC systems or in the cloud across multiple data assets in a variety of formats (netCDF, Zarr, etc.). Finding, investigating, and loading these data assets into compute-ready data containers costs time and effort. Before loading and analyzing a specific data set, the data user needs to know which data sets are available and which attributes describe each one.
Finding, investigating, and loading these assets into data array containers such as xarray can be a daunting task because of the large number of files a user may be interested in. Intake-esm aims to address these issues by providing the functionality needed for searching, discovering, and accessing/loading data.
intake-esm is a data cataloging utility built on top of intake, pandas, and xarray, and it's pretty awesome!
Opening an ESM catalog definition file: An Earth System Model (ESM) catalog file is a JSON file that conforms
to the ESM Collection Specification. When provided a link/path to an ESM catalog file, intake-esm establishes
a link to a database (CSV file) that contains data asset locations and associated metadata
(e.g., which experiment and model they come from). The catalog JSON file can be stored on a local filesystem
or hosted on a remote server.
In [1]: import intake
In [2]: import intake_esm
In [3]: cat_url = intake_esm.tutorial.get_url("google_cmip6")
In [4]: cat = intake.open_esm_datastore(cat_url)
In [5]: cat
Out[5]: <GOOGLE-CMIP6 catalog with 4 dataset(s) from 261 asset(s)>
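As a rough illustration of how the catalog JSON and its CSV table relate, the sketch below parses a small in-memory definition and uses it to pick out the asset-location column. It is not intake-esm code: the field names (`catalog_file`, `assets`, `column_name`, `format`) are assumed to follow the ESM Collection Specification, and the catalog id, file name, and bucket paths are hypothetical.

```python
import csv
import io
import json

# Hypothetical catalog definition. Field names are assumptions modeled on the
# ESM Collection Specification; the values are made up for illustration.
catalog_json = """
{
  "id": "demo-catalog",
  "catalog_file": "demo.csv",
  "assets": {"column_name": "path", "format": "zarr"}
}
"""

# Hypothetical CSV table: one row per data asset, plus its metadata columns.
# In a real catalog this table would be read from the "catalog_file" location.
catalog_csv = """experiment_id,variable_id,path
historical,o2,gs://example-bucket/historical/o2
ssp585,o2,gs://example-bucket/ssp585/o2
"""

definition = json.loads(catalog_json)
rows = list(csv.DictReader(io.StringIO(catalog_csv)))

# The definition names the column that holds each asset's location.
path_column = definition["assets"]["column_name"]
asset_paths = [row[path_column] for row in rows]
print(asset_paths)
```

The remaining columns (`experiment_id` and `variable_id` here) are the metadata that queries run against.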
Search and Discovery: intake-esm provides functionality to execute queries against the catalog:
In [5]: cat_subset = cat.search(
...: experiment_id=["historical", "ssp585"],
...: table_id="Oyr",
...: variable_id="o2",
...: grid_label="gn",
...: )
In [6]: cat_subset
Out[6]: <GOOGLE-CMIP6 catalog with 4 dataset(s) from 261 asset(s)>
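Conceptually, a query like the one above filters the rows of the catalog table. The sketch below mimics that idea with plain dictionaries. It is not intake-esm's implementation. It assumes that a list of values matches any of its entries and that different columns must all match. The `search` helper and the sample rows are hypothetical.

```python
def search(rows, **query):
    """Return the rows whose columns satisfy every query term.

    A term may be a single value or a list of accepted values.
    """
    def matches(row):
        for column, wanted in query.items():
            accepted = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row.get(column) not in accepted:
                return False
        return True

    return [row for row in rows if matches(row)]


# Hypothetical catalog rows (metadata columns only).
rows = [
    {"experiment_id": "historical", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"experiment_id": "ssp585", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"experiment_id": "piControl", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"experiment_id": "historical", "table_id": "Amon", "variable_id": "tas", "grid_label": "gn"},
]

subset = search(
    rows,
    experiment_id=["historical", "ssp585"],
    table_id="Oyr",
    variable_id="o2",
    grid_label="gn",
)
print(len(subset))
```

Only the first two rows survive: the third has a non-matching experiment, and the fourth has a different table and variable.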
Access: when the user is satisfied with the results of their query, they can load data assets (netCDF and/or Zarr stores) into xarray datasets:
In [7]: dset_dict = cat_subset.to_dataset_dict()
--> The keys in the returned dictionary of datasets are constructed as follows:
'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
|████████████████████████████████| 100.00% [2/2 00:18<00:00]
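The key pattern shown above can be read as "join the values of these columns with a dot", with all assets that share a key ending up in the same dataset. The following is a minimal sketch of that grouping, not the library's own code. The `group_assets` helper, the `path` column name, and all sample values are hypothetical.

```python
from collections import defaultdict

# Column order taken from the key pattern printed by to_dataset_dict().
KEY_COLUMNS = [
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
    "table_id",
    "grid_label",
]


def group_assets(rows, key_columns=KEY_COLUMNS, sep="."):
    """Map each dotted key to the list of asset paths that share it."""
    groups = defaultdict(list)
    for row in rows:
        key = sep.join(row[column] for column in key_columns)
        groups[key].append(row["path"])
    return dict(groups)


# Hypothetical rows: two variables from one experiment, one from another.
base = {
    "activity_id": "CMIP",
    "institution_id": "EXAMPLE-INST",
    "source_id": "EXAMPLE-MODEL",
    "table_id": "Oyr",
    "grid_label": "gn",
}
rows = [
    {**base, "experiment_id": "historical", "path": "store/historical/o2"},
    {**base, "experiment_id": "historical", "path": "store/historical/no3"},
    {**base, "experiment_id": "ssp585", "path": "store/ssp585/o2"},
]

groups = group_assets(rows)
for key, paths in groups.items():
    print(key, len(paths))
```

This also shows why the number of datasets in a catalog can be much smaller than the number of assets: several assets collapse into one key.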
See documentation for more information.
Intake-esm can be installed from PyPI with pip:
python -m pip install intake-esm
It is also available from conda-forge for conda installations:
conda install -c conda-forge intake-esm