S3Contents - Jupyter Notebooks in S3
A transparent, drop-in replacement for Jupyter's standard filesystem-backed storage system.
With this implementation of a Jupyter Contents Manager you can save all your notebooks, files, and directory structure directly to an S3/GCS bucket on AWS/GCP or to a self-hosted S3-compatible service such as MinIO.
Installation
pip install s3contents
Install with GCS dependencies:
pip install s3contents[gcs]
s3contents vs X
While there are other implementations of an S3 Jupyter Contents Manager, such as s3nb or s3drive, s3contents is the only one tested against new versions of Jupyter.
It also supports more authentication methods and Google Cloud Storage.
It aims to be a fully tested implementation and is based on PGContents.
Configuration
Create a jupyter_notebook_config.py file in one of the Jupyter config directories, for example: ~/.jupyter/jupyter_notebook_config.py.
Jupyter Notebook Classic: If you plan to use the Classic Jupyter Notebook interface, you need to change ServerApp to NotebookApp in all the examples on this page.
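As a sketch of that substitution, the basic AWS S3 setup might look like this for the Classic interface. Only the two settings shown are covered here; the bucket name is a placeholder, and get_config() is supplied by Jupyter when it loads the config file, so this fragment is not meant to run on its own.

```python
# jupyter_notebook_config.py (Classic Notebook variant, illustrative only)
from s3contents import S3ContentsManager

c = get_config()  # provided by Jupyter at config load time

# Same setting as in the ServerApp examples, with NotebookApp substituted.
c.NotebookApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.bucket = "<S3 bucket name>"
```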
AWS S3
from s3contents import S3ContentsManager
c = get_config()
c.ServerApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.bucket = "<S3 bucket name>"
c.ServerApp.root_dir = ""
Authentication
Additionally you can configure multiple authentication methods:
Access and secret keys:
c.S3ContentsManager.access_key_id = "<AWS Access Key ID / IAM Access Key ID>"
c.S3ContentsManager.secret_access_key = "<AWS Secret Access Key / IAM Secret Access Key>"
Session token:
c.S3ContentsManager.session_token = "<AWS Session Token / IAM Session Token>"
AWS EC2 role auth setup
It is also possible to use IAM role-based access to the S3 bucket from an Amazon EC2 instance or another AWS resource.
To do that, leave the authentication options (access_key_id, secret_access_key) at their default of None and ensure that the EC2 instance has an IAM role that grants sufficient permissions (read and write) on the bucket.
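A minimal sketch of such a role-based configuration is shown here. It is the basic AWS S3 configuration with the key options deliberately left unset; the bucket name is a placeholder, and get_config() is supplied by Jupyter when it loads the config file.

```python
# jupyter_notebook_config.py (IAM role variant, illustrative only)
from s3contents import S3ContentsManager

c = get_config()  # provided by Jupyter at config load time

c.ServerApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.bucket = "<S3 bucket name>"

# access_key_id and secret_access_key are intentionally not set, so they
# keep their default of None and the instance's IAM role is used instead.
```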
Optional settings
c.S3ContentsManager.prefix = "this/is/a/prefix/on/the/s3/bucket"
c.S3ContentsManager.sse = "AES256"
c.S3ContentsManager.signature_version = "s3v4"
c.S3ContentsManager.init_s3_hook = init_function
AWS key refresh
The optional init_s3_hook configuration can be used to enable AWS key rotation (described here and here) as follows:
from aiobotocore.credentials import AioRefreshableCredentials
from aiobotocore.session import get_session
from configparser import ConfigParser
from s3contents import S3ContentsManager
def refresh_external_credentials():
    config = ConfigParser()
    config.read('/home/jovyan/.aws/credentials')
    return {
        "access_key": config['default']['aws_access_key_id'],
        "secret_key": config['default']['aws_secret_access_key'],
        "token": config['default']['aws_session_token'],
        "expiry_time": config['default']['aws_expiration']
    }

async def async_refresh_credentials():
    return refresh_external_credentials()

def make_key_refresh_boto3(this_s3contents_instance):
    session_credentials = AioRefreshableCredentials.create_from_metadata(
        metadata=refresh_external_credentials(),
        refresh_using=async_refresh_credentials,
        method='custom-refreshing-key-file-reader'
    )
    refresh_session = get_session()
    refresh_session._credentials = session_credentials
    this_s3contents_instance.boto3_session = refresh_session

c.ServerApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.init_s3_hook = make_key_refresh_boto3
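The refresh function above reads four keys from the default profile of the credentials file. The sketch below shows what such a file might contain and the metadata dictionary that results. The sample values, the ISO 8601 format used for aws_expiration, and the helper name read_credentials are assumptions made for illustration; they are not part of s3contents.

```python
from configparser import ConfigParser

# Hypothetical file contents; every value here is a placeholder.
SAMPLE_CREDENTIALS = """
[default]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = examplesecret
aws_session_token = exampletoken
aws_expiration = 2030-01-01T00:00:00+00:00
"""


def read_credentials(text, profile="default"):
    """Parse credentials text into the dict shape the refresh hook returns."""
    config = ConfigParser()
    config.read_string(text)
    section = config[profile]
    return {
        "access_key": section["aws_access_key_id"],
        "secret_key": section["aws_secret_access_key"],
        "token": section["aws_session_token"],
        "expiry_time": section["aws_expiration"],
    }


metadata = read_credentials(SAMPLE_CREDENTIALS)
print(sorted(metadata))
```

Checking the file this way before starting Jupyter can help catch a missing key such as aws_expiration, which would otherwise only surface as a KeyError when the hook runs.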
MinIO playground example
You can test this using the play.minio.io:9000 playground.
Just be sure to create the bucket first:
from s3contents import S3ContentsManager
c = get_config()
c.ServerApp.contents_manager_class = S3ContentsManager
c.S3ContentsManager.access_key_id = "Q3AM3UQ867SPQQA43P2F"
c.S3ContentsManager.secret_access_key = "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG"
c.S3ContentsManager.endpoint_url = "https://play.minio.io:9000"
c.S3ContentsManager.bucket = "s3contents-demo"
c.S3ContentsManager.prefix = "notebooks/test"
Access local files
To access local files as well as remote files in S3, you can use hybridcontents.
Install it:
pip install hybridcontents
Use a configuration similar to this:
from s3contents import S3ContentsManager
from hybridcontents import HybridContentsManager
from notebook.services.contents.largefilemanager import LargeFileManager
c = get_config()
c.ServerApp.contents_manager_class = HybridContentsManager
c.HybridContentsManager.manager_classes = {
    "": S3ContentsManager,
    "local_directory": LargeFileManager,
}
c.HybridContentsManager.manager_kwargs = {
    "": {
        "access_key_id": "<AWS Access Key ID / IAM Access Key ID>",
        "secret_access_key": "<AWS Secret Access Key / IAM Secret Access Key>",
        "bucket": "<S3 bucket name>",
    },
    "local_directory": {
        "root_dir": "/Users/danielfrg/Downloads",
    },
}
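The keys of manager_classes act as top-level path prefixes: in the configuration above, "" is the root (S3) and "local_directory" is a folder served from local disk. The following is a simplified model of how a path could be routed under that reading. The function resolve_manager is hypothetical and written only for illustration; it is not hybridcontents' actual code.

```python
def resolve_manager(path, prefixes):
    """Return (prefix, remaining_path) for a Jupyter API path.

    Simplified model: the first path component selects a manager when it
    matches a configured prefix; otherwise the root ("") manager handles
    the whole path.
    """
    clean = path.strip("/")
    parts = clean.split("/")
    if parts[0] and parts[0] in prefixes:
        return parts[0], "/".join(parts[1:])
    if "" in prefixes:
        return "", clean
    raise KeyError(f"no manager configured for {path!r}")


prefixes = {"", "local_directory"}
print(resolve_manager("local_directory/report.csv", prefixes))
print(resolve_manager("analysis/model.ipynb", prefixes))
```

Under this model, anything below local_directory/ goes to the local manager with the prefix stripped, and every other path goes to S3 unchanged.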
GCP - Google Cloud Storage
Install the extra dependencies with:
pip install s3contents[gcs]
from s3contents.gcs import GCSContentsManager
c = get_config()
c.ServerApp.contents_manager_class = GCSContentsManager
c.GCSContentsManager.project = "<your-project>"
c.GCSContentsManager.token = "~/.config/gcloud/application_default_credentials.json"
c.GCSContentsManager.bucket = "<GCP bucket name>"
Note that the path ~/.config/gcloud/application_default_credentials.json assumes a POSIX system on which you have run gcloud init.
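If the same config file has to work on more than one operating system, the path can be computed instead of hard-coded. This is a minimal sketch: the helper name default_adc_path is hypothetical, and the Windows location under %APPDATA% is an assumption about where gcloud keeps its files, so verify it on your machine.

```python
import os


def default_adc_path():
    """Guess where gcloud stores application default credentials on this OS."""
    if os.name == "nt":
        # Assumption: on Windows gcloud keeps its config under %APPDATA%/gcloud.
        base = os.path.join(os.environ.get("APPDATA", ""), "gcloud")
    else:
        # POSIX location, matching the path used in the example above.
        base = os.path.join(os.path.expanduser("~"), ".config", "gcloud")
    return os.path.join(base, "application_default_credentials.json")


token_path = default_adc_path()
print(token_path, os.path.exists(token_path))
```

The resulting string could then be assigned to c.GCSContentsManager.token in place of the literal path.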
Other configuration
File Save Hooks
If you want to use pre/post file save hooks, here are some examples.
A pre_save_hook is written in the exact same way as normal, operating on the file in local storage before committing it to the object store.
def scrub_output_pre_save(model, **kwargs):
    """
    Scrub output before saving notebooks
    """
    if model["type"] != "notebook":
        return
    if model["content"]["nbformat"] != 4:
        return
    for cell in model["content"]["cells"]:
        if cell["cell_type"] != "code":
            continue
        cell["outputs"] = []
        cell["execution_count"] = None

c.S3ContentsManager.pre_save_hook = scrub_output_pre_save
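Because a pre-save hook only mutates the model dictionary, it can be tried outside Jupyter before being wired into the config. The sketch below repeats the hook so that it is self-contained and calls it on a hand-written model; the model contents and the path keyword are made up for illustration.

```python
def scrub_output_pre_save(model, **kwargs):
    """Scrub output before saving notebooks."""
    if model["type"] != "notebook":
        return
    if model["content"]["nbformat"] != 4:
        return
    for cell in model["content"]["cells"]:
        if cell["cell_type"] != "code":
            continue
        cell["outputs"] = []
        cell["execution_count"] = None


# A hand-written stand-in for the model Jupyter passes to the hook.
model = {
    "type": "notebook",
    "content": {
        "nbformat": 4,
        "cells": [
            {"cell_type": "markdown", "source": "# Title"},
            {
                "cell_type": "code",
                "source": "1 + 1",
                "outputs": [{"output_type": "execute_result"}],
                "execution_count": 3,
            },
        ],
    },
}

scrub_output_pre_save(model, path="demo.ipynb")
code_cell = model["content"]["cells"][1]
print(code_cell["outputs"], code_cell["execution_count"])
```

The markdown cell is left untouched, while the code cell loses its outputs and execution count.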
A post_save_hook instead operates on the file in object storage, so it is useful to use the file methods on the contents_manager for data manipulation.
In addition, the hook must use the following function signature (unique to s3contents):
import os

import nbformat

def make_html_post_save(model, s3_path, contents_manager, **kwargs):
    """
    Convert notebooks to HTML after saving via nbconvert
    """
    from nbconvert import HTMLExporter

    if model["type"] != "notebook":
        return
    # Read the saved notebook back from object storage
    content, _format = contents_manager.fs.read(s3_path, format="text")
    my_notebook = nbformat.reads(content, as_version=4)
    html_exporter = HTMLExporter()
    html_exporter.template_name = "classic"
    (body, resources) = html_exporter.from_notebook_node(my_notebook)
    # Write the HTML next to the notebook, swapping the extension
    base, ext = os.path.splitext(s3_path)
    contents_manager.fs.write(path=(base + ".html"), content=body, format=_format)

c.S3ContentsManager.post_save_hook = make_html_post_save
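A hook with this signature can be exercised without a bucket by passing in a stand-in for contents_manager. In the sketch below, FakeFS and FakeContentsManager are hypothetical test doubles that mimic only the fs.read and fs.write calls used above, and count_cells_post_save is an illustrative hook chosen so that the example needs nothing beyond the standard library.

```python
import json
import os


class FakeFS:
    """Hypothetical in-memory stand-in for contents_manager.fs."""

    def __init__(self):
        self.files = {}

    def read(self, path, format="text"):
        # Mirrors the (content, format) pair the hook above unpacks.
        return self.files[path], format

    def write(self, path, content, format="text"):
        self.files[path] = content


class FakeContentsManager:
    """Hypothetical object exposing only the fs attribute the hook uses."""

    def __init__(self):
        self.fs = FakeFS()


def count_cells_post_save(model, s3_path, contents_manager, **kwargs):
    """Write a small text summary next to each saved notebook."""
    if model["type"] != "notebook":
        return
    content, _format = contents_manager.fs.read(s3_path, format="text")
    cells = json.loads(content)["cells"]
    base, _ext = os.path.splitext(s3_path)
    contents_manager.fs.write(
        path=base + ".summary.txt", content=f"{len(cells)} cells", format=_format
    )


manager = FakeContentsManager()
manager.fs.files["notebooks/demo.ipynb"] = json.dumps({"cells": [{}, {}], "nbformat": 4})
count_cells_post_save({"type": "notebook"}, "notebooks/demo.ipynb", manager)
print(manager.fs.files["notebooks/demo.summary.txt"])
```

The same pattern of reading via contents_manager.fs, deriving a sibling path with os.path.splitext, and writing back applies to the HTML export hook.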