Dapla Toolbelt
Python module for use within JupyterLab notebooks, aimed specifically at Statistics Norway's data platform, Dapla.
It supports authenticated access to Google services such as Google Cloud Storage (GCS) and to custom
Dapla services such as Maskinporten Guardian. Authentication is based on the TokenExchangeAuthenticator
for JupyterHub.
Features
These operations are supported:
- List contents of a bucket
- Open a file in GCS
- Copy a file from GCS to the local filesystem
- Load a file (CSV, JSON or XML) from GCS into a pandas DataFrame
- Save the contents of a pandas DataFrame to a file (CSV, JSON or XML) in GCS
Paths to resources are given without the "gs://" prefix.
All resources accessed with this tool are assumed to be located in GCS,
and the first segment of the path is the GCS bucket name.
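The path convention above can be illustrated with a small standalone sketch. The helpers `split_gcs_path` and `to_gcs_uri` are hypothetical names introduced here for illustration only. They are not part of dapla-toolbelt and do not show how the library resolves paths internally.

```python
from typing import Tuple


def split_gcs_path(path: str) -> Tuple[str, str]:
    """Split a path into (bucket, object_path), with or without a gs:// prefix.

    Hypothetical helper for illustration; not part of dapla-toolbelt.
    """
    if path.startswith("gs://"):
        path = path[len("gs://"):]
    trimmed = path.strip("/")
    # The first path segment is the bucket name; the rest is the object path.
    bucket, _, object_path = trimmed.partition("/")
    if not bucket:
        raise ValueError(f"No bucket name found in {path!r}")
    return bucket, object_path


def to_gcs_uri(path: str) -> str:
    """Return the full gs:// URI that a prefix-free path refers to."""
    bucket, object_path = split_gcs_path(path)
    return f"gs://{bucket}/{object_path}" if object_path else f"gs://{bucket}"


print(split_gcs_path("bucket-name/folder/raw_data.json"))
print(to_gcs_uri("bucket-name/folder"))
```

Under this reading, "bucket-name/folder" and "gs://bucket-name/folder" name the same location, and "bucket-name" is the bucket.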
Requirements
- Python >3.8 (3.10 is preferred)
- Poetry, install via
curl -sSL https://install.python-poetry.org | python3 -
Installation
You can install Dapla Toolbelt via pip from PyPI:
pip install dapla-toolbelt
Usage
from dapla import FileClient
from dapla import GuardianClient
import pandas as pd

# Fetch data from an external API through Maskinporten Guardian
response = GuardianClient.call_api("https://data.udir.no/api/kag", "88ace991-7871-4ccc-aaec-8fb6d78ed04e", "udir:datatilssb")
data_json = response.json()

# Build a pandas DataFrame from the JSON response and preview the first rows
raw_data_df = pd.DataFrame(data_json)
raw_data_df.head()

# List the contents of a folder in a bucket
FileClient.ls("bucket-name/folder")

# Save the DataFrame to GCS in each supported format
path_base = "bucket-name/folder/raw_data"
FileClient.save_pandas_to_json(raw_data_df, f"{path_base}.json")
FileClient.save_pandas_to_csv(raw_data_df, f"{path_base}.csv")
FileClient.save_pandas_to_xml(raw_data_df, f"{path_base}.xml")

# Show the contents of the saved JSON file
FileClient.cat(f"{path_base}.json")

# Load each file back into a DataFrame and preview it
df = FileClient.load_json_to_pandas(f"{path_base}.json")
df.head()
df = FileClient.load_csv_to_pandas(f"{path_base}.csv")
df.head()
df = FileClient.load_xml_to_pandas(f"{path_base}.xml")
df.head()
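When a notebook handles several of these formats, the loader can be chosen from the file extension. The following is a minimal sketch under stated assumptions: the method names are the ones used in the usage example above, while the `LOADERS` table and the `loader_name_for` function are hypothetical and not part of dapla-toolbelt.

```python
from pathlib import PurePosixPath

# Maps a file extension to the FileClient method name used in the usage example.
# The lookup table itself is a hypothetical convenience, not a library feature.
LOADERS = {
    ".json": "load_json_to_pandas",
    ".csv": "load_csv_to_pandas",
    ".xml": "load_xml_to_pandas",
}


def loader_name_for(path: str) -> str:
    """Return the name of the FileClient loader matching the path's extension."""
    suffix = PurePosixPath(path).suffix.lower()
    try:
        return LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type {suffix!r} for {path!r}") from None


print(loader_name_for("bucket-name/folder/raw_data.csv"))
```

In an environment where dapla is installed and authenticated, `getattr(FileClient, loader_name_for(path))(path)` would then call the matching loader.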
Contributing
Contributions are very welcome.
To learn more, see the Contributor Guide.
License
Distributed under the terms of the MIT license,
Dapla Toolbelt is free and open source software.
Issues
If you encounter any problems,
please file an issue along with a detailed description.
Credits
This project was generated from Statistics Norway's SSB PyPI Template.