target-s3-jsonl
Singer target that loads data into S3 in JSONL format, following the Singer spec.
target-s3-jsonl is a Singer target intended to work with any regular Singer tap. It takes the output of the tap and exports it as JSON Lines files into an AWS S3 bucket.
This package is built using the target-core library.
Install
First, make sure Python 3 is installed on your system, or follow these installation instructions for Mac or Ubuntu.
Defaults
Note: To avoid version conflicts, run taps and targets in separate virtual environments.
python -m venv ~/.virtualenvs/target-s3-jsonl
~/.virtualenvs/target-s3-jsonl/bin/pip install target-s3-jsonl
Head
python -m venv ~/.virtualenvs/target-s3-jsonl
~/.virtualenvs/target-s3-jsonl/bin/pip install --upgrade git+https://github.com/ome9ax/target-s3-jsonl.git@main
Alternative
python -m venv ~/.virtualenvs/target-s3-jsonl
source ~/.virtualenvs/target-s3-jsonl/bin/activate
pip install target-s3-jsonl
deactivate
Usage
Like any other target that follows the Singer specification:
some-singer-tap | target-s3-jsonl --config [config.json]
It reads incoming messages from STDIN and uses the properties in config.json to upload data into AWS S3.
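To try the pipeline without a real tap, a throwaway script can print Singer messages on STDOUT and be piped into the target. The sketch below emits one SCHEMA message, one RECORD per row and a final STATE message, following the Singer message layout; the script name toy_tap.py, the users stream, its schema and the bookmark shape are all hypothetical examples.

```python
import json
import sys
from typing import Any, Dict, Iterable, List


def singer_messages(stream: str, records: Iterable[Dict[str, Any]]) -> List[str]:
    """Build a SCHEMA message, one RECORD message per row and a final STATE message."""
    schema = {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {
            "type": "object",
            "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
        },
        "key_properties": ["id"],
    }
    lines = [json.dumps(schema)]
    last_id = None
    for record in records:
        lines.append(json.dumps({"type": "RECORD", "stream": stream, "record": record}))
        last_id = record["id"]
    # Hypothetical bookmark layout: each tap decides what its STATE value contains.
    state = {"bookmarks": {stream: {"last_id": last_id}}}
    lines.append(json.dumps({"type": "STATE", "value": state}))
    return lines


if __name__ == "__main__":
    # One JSON document per line, as the target expects on STDIN.
    for line in singer_messages("users", [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]):
        sys.stdout.write(line + "\n")
```

Usage, assuming the script is saved as toy_tap.py: python toy_tap.py | target-s3-jsonl --config config.json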
Configuration settings
Running the target connector requires a config.json file. An example with the minimal settings:
{
"s3_bucket": "my_bucket"
}
Profile based authentication
Profile based authentication is used by default, with the default profile. To use another profile, set the aws_profile parameter in config.json or set the AWS_PROFILE environment variable.
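The profile lookup order described above can be sketched as a small function. This is a hypothetical helper, not the package's own code: resolve_profile is an illustrative name, and it assumes the value in config.json wins over AWS_PROFILE, with None meaning "let boto3 use its default profile".

```python
import os
from typing import Mapping, Optional


def resolve_profile(
    config: Mapping[str, str], environ: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Hypothetical: aws_profile from config.json, else AWS_PROFILE, else None."""
    # None leaves the choice to boto3, which then uses the default profile.
    return config.get("aws_profile") or environ.get("AWS_PROFILE")
```

The result could then be passed as boto3.Session(profile_name=resolve_profile(config)); passing the environment mapping explicitly keeps the helper easy to test.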
Non-Profile based authentication
For non-profile based authentication, set the aws_access_key_id, aws_secret_access_key and, optionally, aws_session_token parameters in config.json. Alternatively, you can define them outside config.json by setting the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN environment variables.
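A minimal sketch of the config-then-environment fallback for explicit keys, assuming config.json values take precedence over the environment variables. resolve_credentials and CREDENTIAL_SOURCES are hypothetical names used only for illustration.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping: config.json key -> environment variable used as fallback.
CREDENTIAL_SOURCES = {
    "aws_access_key_id": "AWS_ACCESS_KEY_ID",
    "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    "aws_session_token": "AWS_SESSION_TOKEN",
}


def resolve_credentials(
    config: Mapping[str, str], environ: Mapping[str, str] = os.environ
) -> Dict[str, Optional[str]]:
    """Return session keyword arguments, preferring config.json over the environment."""
    creds = {key: config.get(key) or environ.get(env) for key, env in CREDENTIAL_SOURCES.items()}
    # The session token is optional; drop it when neither source provides one.
    if creds["aws_session_token"] is None:
        del creds["aws_session_token"]
    return creds
```

Since boto3.Session accepts aws_access_key_id, aws_secret_access_key and aws_session_token as keyword arguments, the result could be unpacked with boto3.Session(**resolve_credentials(config)).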
Full list of options in config.json:
Inherited from target-core
Property | Type | Mandatory? | Description |
---|---|---|---|
path_template | String | | (Default: None) Custom naming convention of the S3 key. Replaces the tokens date, stream, and timestamp with the appropriate values. Supports datetime and other Python advanced string formatting, e.g. {stream:_>8}_{timestamp:%Y%m%d_%H%M%S}.json or {stream}/{timestamp:%Y}/{timestamp:%m}/{timestamp:%d}/{timestamp:%Y%m%d_%H%M%S_%f}.json. Supports "folders" in S3 keys, e.g. folder/folder2/{stream}/export_date={date}/{timestamp}.json. Honors s3_key_prefix, if set, by prepending it to the filename, e.g. path_template = folder1/my_file.json and s3_key_prefix = prefix_ results in folder1/prefix_my_file.json. |
memory_buffer | Integer | | Size of the memory buffer used for non-partitioned files before the data is written to the temporary file. Defaults to 64 MB if unspecified. |
file_size | Integer | | File partitioning by size limit: the output is split into several file parts. The path_template must contain a part token for the part number, e.g. "path_template": "{stream}_{date_time:%Y%m%d_%H%M%S}_part_{part:0>3}.json". |
compression | String | | The type of compression to apply before uploading. Supported options are none (default), gzip, and lzma. With gzip, the file extension is automatically changed to .json.gz for all files; with lzma, it is changed to .json.xz. |
timezone_offset | Integer | | Offset in hours. Use an offset of 0 if you want the path_template to use the UTC time zone. Defaults to null. |
work_dir | String | | (Default: platform-dependent) Directory of temporary JSONL files with RECORD messages. |
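The path_template, file_size, timezone_offset, s3_key_prefix and compression options all affect the final S3 object key. The following is a minimal, hypothetical sketch of how they could combine, not target-core's actual code: render_key, compress and EXTENSIONS are illustrative names, and it assumes the template is expanded with Python's str.format, s3_key_prefix is prepended to the last path segment, and a trailing .json is swapped for the compressed extension.

```python
import gzip
import lzma
import posixpath
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical mapping from the compression option to the key's extension.
EXTENSIONS = {"none": ".json", "gzip": ".json.gz", "lzma": ".json.xz"}


def render_key(
    path_template: str,
    stream: str,
    when: datetime,
    timezone_offset: Optional[int] = None,
    s3_key_prefix: str = "",
    part: Optional[int] = None,
    compression: str = "none",
) -> str:
    """Hypothetical: expand the template, prefix the filename, adjust the extension."""
    if timezone_offset is not None:
        # Convert an aware datetime into a fixed-offset zone; 0 means UTC.
        when = when.astimezone(timezone(timedelta(hours=timezone_offset)))
    # Both timestamp and date_time are passed because both token names appear above.
    key = path_template.format(stream=stream, date=when.date(), timestamp=when, date_time=when, part=part)
    # Prepend the prefix to the filename only, keeping any "folders" intact.
    folder, filename = posixpath.split(key)
    key = posixpath.join(folder, s3_key_prefix + filename)
    if key.endswith(".json"):
        key = key[: -len(".json")] + EXTENSIONS[compression]
    return key


def compress(payload: bytes, compression: str = "none") -> bytes:
    """Compress the JSONL payload with the selected codec before upload."""
    if compression == "gzip":
        return gzip.compress(payload)
    if compression == "lzma":
        return lzma.compress(payload)
    return payload
```

For example, with when = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc), the template {stream:_>8}_{timestamp:%Y%m%d_%H%M%S}.json renders to ___users_20240102_030405.json for the users stream. Pass timezone-aware datetimes: astimezone treats naive values as local time.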
Specific to target-s3-jsonl
Property | Type | Mandatory? | Description |
---|---|---|---|
s3_bucket | String | Yes | S3 Bucket name |
aws_profile | String | | AWS profile name for profile based authentication. If not provided, AWS_PROFILE environment variable will be used. |
aws_endpoint_url | String | | AWS endpoint URL. |
aws_access_key_id | String | | S3 Access Key Id. If not provided, AWS_ACCESS_KEY_ID environment variable will be used. |
aws_secret_access_key | String | | S3 Secret Access Key. If not provided, AWS_SECRET_ACCESS_KEY environment variable will be used. |
aws_session_token | String | | AWS Session token. If not provided, AWS_SESSION_TOKEN environment variable will be used. |
encryption_type | String | | (Default: 'none') The type of encryption to use. Currently supported options are 'none' and 'KMS'. |
encryption_key | String | | A reference to the encryption key to use for data encryption. For KMS encryption, this should be the KMS encryption key ID (e.g. '1234abcd-1234-1234-1234-1234abcd1234'). This field is ignored if 'encryption_type' is none or blank. |
role_arn | String | | The ARN of the role to assume |
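The encryption_type and encryption_key settings map naturally onto S3 server-side encryption arguments. The sketch below is hypothetical: encryption_extra_args is an illustrative name, matching 'KMS' case-insensitively is an assumption, and it only builds the ExtraArgs dictionary rather than claiming this is how the target uploads.

```python
from typing import Dict, Optional


def encryption_extra_args(
    encryption_type: Optional[str], encryption_key: Optional[str] = None
) -> Dict[str, str]:
    """Hypothetical: translate encryption settings into S3 upload ExtraArgs."""
    if not encryption_type or encryption_type.lower() == "none":
        # encryption_key is ignored when encryption_type is none or blank.
        return {}
    if encryption_type.lower() == "kms":
        args = {"ServerSideEncryption": "aws:kms"}
        if encryption_key:
            # The KMS key ID, e.g. '1234abcd-1234-1234-1234-1234abcd1234'.
            args["SSEKMSKeyId"] = encryption_key
        return args
    raise ValueError(f"Unsupported encryption_type: {encryption_type!r}")
```

ServerSideEncryption and SSEKMSKeyId are accepted keys of the ExtraArgs parameter of boto3's S3 upload_file, so the result could be passed as client.upload_file(local_path, bucket, key, ExtraArgs=encryption_extra_args(...)).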
Test
Install the tools
pip install tox
Run pytest
tox -e py
Lint & Static typing validation
tox -e lint,static
Release
- Update the version number at the beginning of target-s3-jsonl/target_s3_json/__init__.py
- Merge the PR with the changes into main
- Release the new version on GitHub
License
Apache License Version 2.0