.. image:: https://readthedocs.org/projects/s3pathlib/badge/?version=latest
    :target: https://s3pathlib.readthedocs.io/en/latest/
    :alt: Documentation Status

.. image:: https://github.com/aws-samples/s3pathlib-project/workflows/CI/badge.svg
    :target: https://github.com/aws-samples/s3pathlib-project/actions?query=workflow:CI

.. image:: https://img.shields.io/badge/codecov-100%25-brightgreen
    :target: https://github.com/aws-samples/s3pathlib-project/actions?query=workflow:CI

.. image:: https://img.shields.io/pypi/v/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/pypi/l/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/pypi/pyversions/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/pypi/dm/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/badge/STAR_Me_on_GitHub!--None.svg?style=social
    :target: https://github.com/aws-samples/s3pathlib-project

.. image:: https://img.shields.io/badge/Link-Document-orange.svg
    :target: https://s3pathlib.readthedocs.io/en/latest/

.. image:: https://img.shields.io/badge/Link-API-blue.svg
    :target: https://s3pathlib.readthedocs.io/en/latest/py-modindex.html

.. image:: https://img.shields.io/badge/Link-Source_Code-blue.svg
    :target: https://s3pathlib.readthedocs.io/en/latest/py-modindex.html

.. image:: https://img.shields.io/badge/Link-Submit_Issue-blue.svg
    :target: https://github.com/aws-samples/s3pathlib-project/issues

.. image:: https://img.shields.io/badge/Link-Request_Feature-blue.svg
    :target: https://github.com/aws-samples/s3pathlib-project/issues

.. image:: https://img.shields.io/badge/Link-Download-blue.svg
    :target: https://pypi.org/pypi/s3pathlib#files
Welcome to ``s3pathlib`` Documentation
======================================
`s3pathlib <https://s3pathlib.readthedocs.io/en/latest/>`_ is a Python package that offers an object-oriented programming (OOP) interface for working with AWS S3 objects and directories. Its user-friendly API is designed to mirror the standard library `pathlib <https://docs.python.org/3/library/pathlib.html>`_. The package also supports `versioning <https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html>`_ in AWS S3.
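``s3pathlib`` wraps S3 versioning; its exact API is covered in the full documentation. For context, here is roughly what version-aware access looks like in raw boto3, which ``s3pathlib`` builds on (bucket and key names below are placeholders):

.. code-block:: python

    import boto3

    s3 = boto3.client("s3")

    # list all versions stored for one object (placeholder bucket/key)
    resp = s3.list_object_versions(Bucket="bucket", Prefix="folder/file.txt")
    versions = resp.get("Versions", [])
    for v in versions:
        print(v["VersionId"], v["LastModified"])

    # fetch one specific historical version by its id
    if versions:
        obj = s3.get_object(
            Bucket="bucket",
            Key="folder/file.txt",
            VersionId=versions[0]["VersionId"],
        )
        print(obj["Body"].read())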
.. note::

    You may not be viewing the full document. `THE FULL DOCUMENT IS HERE <https://s3pathlib.readthedocs.io/en/latest/>`_.
Quick Start
-----------
.. note::

    A `COMPREHENSIVE GUIDE <https://s3pathlib.readthedocs.io/en/latest/#comprehensive-guide>`_ covering features and best practices is available in the full documentation.
Import the library and declare an ``S3Path`` object:
.. code-block:: python

    # import
    >>> from s3pathlib import S3Path

    # construct from string parts; they are joined automatically
    >>> p = S3Path("bucket", "folder", "file.txt")

    # constructing from an S3 URI works too
    >>> p = S3Path("s3://bucket/folder/file.txt")

    # constructing from an S3 ARN works too
    >>> p = S3Path("arn:aws:s3:::bucket/folder/file.txt")

    >>> p.bucket
    'bucket'
    >>> p.key
    'folder/file.txt'
    >>> p.uri
    's3://bucket/folder/file.txt'
    >>> p.console_url  # click to preview it in the AWS console
    'https://s3.console.aws.amazon.com/s3/object/bucket?prefix=folder/file.txt'
    >>> p.arn
    'arn:aws:s3:::bucket/folder/file.txt'
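Because the API mirrors ``pathlib``, you can also derive new paths from existing ones. A minimal sketch, assuming pathlib-style accessors such as ``parent``, ``basename``, and ``joinpath`` (check the API reference for the exact names):

.. code-block:: python

    >>> p = S3Path("bucket", "folder", "file.txt")
    >>> p.parent  # the enclosing "directory"
    S3Path('s3://bucket/folder/')
    >>> p.basename  # the final path component
    'file.txt'
    >>> p.parent.joinpath("other.txt")  # build a sibling path
    S3Path('s3://bucket/folder/other.txt')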
Talk to AWS S3 and get some information:

.. code-block:: python

    # s3pathlib maintains a "context" object that holds the AWS authentication information
    # you just need to build your own boto session object and attach it
    >>> import boto3
    >>> from s3pathlib import context
    >>> context.attach_boto_session(
    ...     boto3.session.Session(
    ...         region_name="us-east-1",
    ...         profile_name="my_aws_profile",
    ...     )
    ... )

    >>> p = S3Path("bucket", "folder", "file.txt")
    >>> p.write_text("a lot of data ...")
    >>> p.etag
    '3e20b77868d1a39a587e280b99cec4a8'
    >>> p.size
    56789000
    >>> p.size_for_human
    '51.16 MB'

    # a folder works too; a trailing "/" identifies it as one
    >>> p = S3Path("bucket", "datalake/")
    >>> p.count_objects()
    7164  # number of objects under this prefix
    >>> p.calculate_total_size()
    (7164, 236483701963)  # 7164 objects, 220.24 GB
    >>> p.calculate_total_size(for_human=True)
    (7164, '220.24 GB')  # 7164 objects, 220.24 GB
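The attached session is held on a module-level context and reused by subsequent calls, so switching credentials mid-script is just a matter of attaching a new session. A minimal sketch, assuming ``attach_boto_session`` replaces the stored session (the profile name below is a hypothetical placeholder):

.. code-block:: python

    >>> import boto3
    >>> from s3pathlib import S3Path, context

    # "another_aws_profile" is a hypothetical profile name
    >>> context.attach_boto_session(
    ...     boto3.session.Session(profile_name="another_aws_profile")
    ... )
    >>> S3Path("bucket", "folder", "file.txt").exists()  # now uses the new credentials
    True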
Manipulate Folders in S3
------------------------
The native S3 write APIs (the operations that change the state of S3) only operate at the object level, and the `list_objects_v2 <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2>`_ API returns at most 1,000 objects per call, so manipulating objects recursively takes extra effort. ``s3pathlib`` CAN SAVE YOUR LIFE:
.. code-block:: python

    # create an S3 folder
    >>> p1 = S3Path("bucket", "github", "repos", "my-repo/")

    # upload all Python files from /my-repo to s3://bucket/github/repos/my-repo/
    >>> p1.upload_dir("/my-repo", pattern="**/*.py", overwrite=False)

    # copy an entire S3 folder to another S3 folder
    >>> p2 = S3Path("bucket", "github", "repos", "another-repo/")
    >>> p1.copy_to(p2, overwrite=True)

    # delete all objects in each folder, recursively, to clean up your test bucket
    >>> p1.delete()
    >>> p2.delete()
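For comparison, here is roughly what just the recursive delete looks like in raw boto3: you paginate through the prefix 1,000 keys at a time and batch the deletes yourself (bucket and prefix are placeholders):

.. code-block:: python

    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    # walk the prefix 1,000 keys at a time and delete in batches
    for page in paginator.paginate(Bucket="bucket", Prefix="github/repos/my-repo/"):
        contents = page.get("Contents", [])
        if contents:
            s3.delete_objects(
                Bucket="bucket",
                Delete={"Objects": [{"Key": obj["Key"]} for obj in contents]},
            )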
S3 Path Filter
--------------
Ever thought of filtering S3 objects by their attributes, such as dirname, basename, file extension, etag, size, or modified time? It should be simple in Python:
.. code-block:: python

    >>> s3bkt = S3Path("bucket")  # assume you have a lot of files in this bucket
    >>> iterproxy = s3bkt.iter_objects().filter(
    ...     S3Path.size >= 10_000_000, S3Path.ext == ".csv"  # add filters
    ... )

    >>> iterproxy.one()  # fetch one
    S3Path('s3://bucket/larger-than-10MB-1.csv')

    >>> iterproxy.many(3)  # fetch three
    [
        S3Path('s3://bucket/larger-than-10MB-1.csv'),
        S3Path('s3://bucket/larger-than-10MB-2.csv'),
        S3Path('s3://bucket/larger-than-10MB-3.csv'),
    ]

    >>> for p in iterproxy:  # iterate over the rest
    ...     print(p)
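If you prefer plain Python over the query-style filters, the same selection can be written as a generator expression over ``iter_objects()``; a minimal sketch using only the attributes shown above:

.. code-block:: python

    >>> big_csv_files = (
    ...     p
    ...     for p in S3Path("bucket").iter_objects()
    ...     if p.size >= 10_000_000 and p.ext == ".csv"
    ... )
    >>> for p in big_csv_files:
    ...     print(p.uri, p.size_for_human)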
File-Like Object for Simple IO
------------------------------
``S3Path`` is a file-like object. It supports ``open`` and the context manager syntax out of the box. Here are a few highlighted examples:
.. code-block:: python

    # stream a big file line by line
    >>> p = S3Path("bucket", "log.txt")
    >>> with p.open("r") as f:
    ...     for line in f:
    ...         print(line)  # do whatever you want with each line

    # JSON IO
    >>> import json
    >>> p = S3Path("bucket", "config.json")
    >>> with p.open("w") as f:
    ...     json.dump({"password": "mypass"}, f)

    # pandas IO
    >>> import pandas as pd
    >>> p = S3Path("bucket", "dataset.csv")
    >>> df = pd.DataFrame(...)
    >>> with p.open("w") as f:
    ...     df.to_csv(f)
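Because ``open`` returns a standard Python file object, it composes with other stream wrappers. A minimal sketch, assuming binary mode (``"rb"``) is supported the way it is for local files:

.. code-block:: python

    # stream a gzip-compressed log without downloading it first
    >>> import gzip
    >>> p = S3Path("bucket", "log.txt.gz")
    >>> with p.open("rb") as f:
    ...     with gzip.open(f, mode="rt") as g:
    ...         for line in g:
    ...             print(line)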
Now that you have a basic understanding of ``s3pathlib``, read the `full documentation <https://s3pathlib.readthedocs.io/en/latest/#comprehensive-guide>`_ to explore its capabilities in greater depth.
Getting Help
------------
Please use the ``python-s3pathlib`` tag on Stack Overflow to get help.

Alternatively, submit an "I want help" ticket on `GitHub Issues <https://github.com/aws-samples/s3pathlib-project/issues/new/choose>`_.
Contributing
------------
Please see the `Contribution Guidelines <https://github.com/aws-samples/s3pathlib-project/blob/main/CONTRIBUTING.rst>`_.
Copyright
---------
``s3pathlib`` is an open source project. See the `LICENSE <https://github.com/aws-samples/s3pathlib-project/blob/main/LICENSE>`_ file for more information.