@tokenizer/s3
The tokenizer-s3 module enables seamless integration with Amazon Web Services (AWS) S3, allowing you to read and tokenize data from S3 objects in a streaming fashion. This module extends the functionality of the strtok3 tokenizer by providing support for chunked S3 data access.
Features
Streaming Support: Efficiently read and tokenize data from Amazon S3 objects using streaming, which is ideal for handling large files without loading them entirely into memory.
Integration with strtok3: Works seamlessly with the strtok3 tokenizer to process S3 data streams, making it easy to handle various tokenization tasks.
Flexible Access: Provides options to configure S3 access, allowing for customized tokenization workflows based on your specific needs.
Promise-Based API: Utilizes a promise-based API for easy integration into modern asynchronous workflows.
Installation
npm install @tokenizer/s3
If you appreciate my work and want to support the development of open-source projects like music-metadata, file-type, and listFix(), consider becoming a sponsor or making a small contribution.
Your support helps sustain ongoing development and improvements.
Become a sponsor to Borewit
API Documentation
makeChunkedTokenizerFromS3
Initialize a tokenizer, with the option for random access,
from an Amazon S3 client for use in extracting metadata from media files.
Function Signature
function makeChunkedTokenizerFromS3(s3: S3Client, objRequest: GetObjectRequest): Promise<IRandomAccessTokenizer>
Reads the S3 object in chunks, which allows random access reads.
Parameters
- s3 (S3Client): The S3 client used to make requests to Amazon S3.
  [!NOTE]
  To configure AWS client authentication, see Configuration and credential file settings.
- objRequest (GetObjectRequest): The S3 object request containing details about the S3 object to fetch.
  This includes properties like the bucket name and object key.
- options (IS3Options, optional):
Returns
- Promise<IRandomAccessTokenizer>: A Promise that resolves to an instance of IRandomAccessTokenizer.
  This tokenizer can be used to extract metadata from the specified media file in the S3 object.
  It supports random access reads.
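Because the returned tokenizer supports random access, a caller can inspect bytes at an arbitrary offset without consuming everything before it. The following is a minimal sketch rather than part of this module: `peekBytesAt` is a hypothetical helper, and it assumes the tokenizer exposes the strtok3 `peekBuffer(buffer, { position, length, mayBeLess })` method.

```javascript
// Hypothetical helper (not part of @tokenizer/s3): peek up to `length` bytes
// at an absolute `position` without advancing the tokenizer.
// Assumption: `tokenizer` follows the strtok3 tokenizer interface.
async function peekBytesAt(tokenizer, position, length) {
  const buffer = new Uint8Array(length);
  // mayBeLess: do not throw when fewer than `length` bytes are available
  const bytesRead = await tokenizer.peekBuffer(buffer, { position, length, mayBeLess: true });
  // Return only the part of the buffer that was actually filled
  return buffer.subarray(0, bytesRead);
}
```

With a tokenizer obtained from `makeChunkedTokenizerFromS3`, a call such as `await peekBytesAt(s3Tokenizer, 1024, 16)` would return up to 16 bytes starting at offset 1024.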
makeStreamingTokenizerFromS3
Initialize a tokenizer from an Amazon S3 client for use in extracting metadata from media files.
Function Signature
function makeStreamingTokenizerFromS3(s3: S3Client, objRequest: GetObjectRequest): Promise<ITokenizer>
Reads from S3 as a stream.
Parameters
- s3 (S3Client): The S3 client used to make requests to Amazon S3.
  [!NOTE]
  To configure AWS client authentication, see Configuration and credential file settings.
- objRequest (GetObjectRequest): The S3 object request containing details about the S3 object to fetch.
  This includes properties like the bucket name and object key.
Returns
- Promise<ITokenizer>: A Promise that resolves to an instance of ITokenizer.
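A streaming tokenizer is read front to back. As a minimal sketch, the hypothetical helper below (not part of this module) reads the leading bytes of an object and then releases the tokenizer. It assumes the tokenizer exposes the strtok3 `readBuffer(buffer, { length, mayBeLess })` and `close()` methods.

```javascript
// Hypothetical helper (not part of @tokenizer/s3): read the first `length`
// bytes sequentially, then close the tokenizer.
// Assumption: `tokenizer` follows the strtok3 tokenizer interface.
async function readLeadingBytes(tokenizer, length) {
  const buffer = new Uint8Array(length);
  try {
    // mayBeLess: accept a short read if the object is smaller than `length`
    const bytesRead = await tokenizer.readBuffer(buffer, { length, mayBeLess: true });
    return buffer.subarray(0, bytesRead);
  } finally {
    // Release the underlying stream whether or not the read succeeded
    await tokenizer.close();
  }
}
```

With a tokenizer obtained from `makeStreamingTokenizerFromS3`, `await readLeadingBytes(s3Tokenizer, 12)` would return up to the first 12 bytes, for example to check a file signature by hand.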
Compatibility
Module: version 0.3.0 migrated from CommonJS to pure ECMAScript Module (ESM).
The distributed JavaScript codebase is compliant with the ECMAScript 2020 (11th Edition) standard.
This module requires a Node.js ≥ 16 engine.
It can also be used in a browser environment when bundled with a module bundler.
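Because the package is ESM-only from version 0.3.0, CommonJS code typically cannot load it with `require()`. One common workaround is a dynamic `import()`, sketched below. `loadEsm` is a hypothetical helper name, not something this module provides.

```javascript
// Hypothetical helper for CommonJS code: dynamic import() is available in
// both CommonJS and ES modules and resolves to the module namespace object.
function loadEsm(specifier) {
  return import(specifier);
}

// Usage inside an async function (assumes @tokenizer/s3 is installed):
// const { makeChunkedTokenizerFromS3 } = await loadEsm('@tokenizer/s3');
```

In a project that already uses ES modules, a regular `import` statement as shown in the examples is simpler.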
Examples
Determine S3 file type
Determine the file type (based on its content) of a file stored in Amazon S3:
import { fileTypeFromTokenizer } from 'file-type';
import { fromEnv } from '@aws-sdk/credential-providers';
import { S3Client } from '@aws-sdk/client-s3';
import { makeChunkedTokenizerFromS3 } from '@tokenizer/s3';

(async () => {
  // Create an S3 client using credentials from environment variables
  const s3 = new S3Client({
    region: 'eu-west-2',
    credentials: fromEnv(),
  });

  // Create a chunked tokenizer for the S3 object
  const s3Tokenizer = await makeChunkedTokenizerFromS3(s3, {
    Bucket: 'affectlab',
    Key: '1min_35sec.mp4'
  });

  // Detect the file type from the object's content
  const fileType = await fileTypeFromTokenizer(s3Tokenizer);
  console.log(fileType);
})();
See also example at file-type.
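`fileTypeFromTokenizer` resolves to `undefined` when the type cannot be determined, so callers may want to handle that case explicitly. A small sketch follows. `describeFileType` is a hypothetical helper, and `ext` and `mime` are the properties of the file-type result object.

```javascript
// Hypothetical helper: turn a file-type result into a printable string.
// `fileType` is either undefined or an object with `ext` and `mime`.
function describeFileType(fileType) {
  if (!fileType) {
    return 'unknown file type';
  }
  return `${fileType.ext} (${fileType.mime})`;
}
```

In the example above, `console.log(describeFileType(fileType))` could replace the plain `console.log(fileType)`.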
Reading audio metadata from Amazon S3
Retrieve audio metadata from an S3 object using music-metadata:
import { makeChunkedTokenizerFromS3 } from '@tokenizer/s3';
import { S3Client } from '@aws-sdk/client-s3';
import { parseFromTokenizer } from 'music-metadata';

// Parse audio metadata from an S3 object via a chunked tokenizer
async function parseS3Object(s3, objRequest, options) {
  const s3Tokenizer = await makeChunkedTokenizerFromS3(s3, objRequest);
  return parseFromTokenizer(s3Tokenizer, options);
}

(async () => {
  // Region and credentials are resolved from the default AWS configuration
  const s3 = new S3Client({});

  const metadata = await parseS3Object(s3, {
    Bucket: 'standing0media',
    Key: '01 Where The Highway Takes Me.mp3'
  });

  console.log(metadata);
})();
A module implementation of this example can be found in @music-metadata/s3.
Dependency graph