
@tokenizer/s3

Amazon S3 tokenizer

Version: 1.0.1 (latest, npm)

Version published: 21 hours ago

Weekly downloads: 4.5K (down 12.39%)

Maintainers: 1

Created: 5 years ago

@tokenizer/s3

The @tokenizer/s3 module integrates with Amazon Web Services (AWS) S3, allowing you to read and tokenize data from S3 objects in a streaming fashion. It extends the strtok3 tokenizer with support for chunked S3 data access.

Features

Streaming Support: Efficiently read and tokenize data from Amazon S3 objects using streaming, which is ideal for handling large files without loading them entirely into memory.

Integration with strtok3: Works seamlessly with the strtok3 tokenizer to process S3 data streams, making it easy to handle various tokenization tasks.

Flexible Access: Provides options to configure S3 access, allowing for customized tokenization workflows based on your specific needs.

Promise-Based API: Utilizes a promise-based API for easy integration into modern asynchronous workflows.

Installation

npm install @tokenizer/s3

If you appreciate my work and want to support the development of open-source projects like music-metadata, file-type, and listFix(), consider becoming a sponsor or making a small contribution. Your support helps sustain ongoing development and improvements. Become a sponsor to Borewit

API Documentation

`makeChunkedTokenizerFromS3`

Initialize a tokenizer, with the option for random access, from an Amazon S3 client for use in extracting metadata from media files.

Function Signature

function makeChunkedTokenizerFromS3(s3: S3Client, objRequest: GetObjectRequest): Promise<IRandomAccessTokenizer>

Reads the S3 object in chunks rather than as one continuous stream, which is what makes random access possible.

Parameters

s3 (S3Client):

The S3 client used to make requests to Amazon S3.

Note: To configure AWS client authentication, see Configuration and credential file settings.

objRequest (GetObjectRequest):

The S3 object request containing details about the S3 object to fetch. This includes properties like the bucket name and object key.
options (IS3Options, optional):

Returns

Promise<IRandomAccessTokenizer>:

A Promise that resolves to an instance of IRandomAccessTokenizer. This tokenizer can be used to extract metadata from the specified media file in the S3 object. It supports random access reads.
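
To illustrate what chunked, random-access reading means at the S3 level, here is a minimal sketch of how a read at a given position could be expressed as a ranged `GetObject` request. The helpers `toRangeHeader` and `chunkRequest` are hypothetical names used for illustration only; they are not part of @tokenizer/s3, and the module's actual internals may differ. The only thing taken from S3 itself is the `Range` request field with its `bytes=start-end` syntax.

```javascript
// Hypothetical helper (not part of @tokenizer/s3):
// builds the `Range` value for a chunk of `length` bytes starting at `offset`.
function toRangeHeader(offset, length) {
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError('offset must be a non-negative integer');
  }
  if (!Number.isInteger(length) || length <= 0) {
    throw new RangeError('length must be a positive integer');
  }
  // HTTP byte ranges are inclusive on both ends, hence the "- 1".
  return `bytes=${offset}-${offset + length - 1}`;
}

// Hypothetical helper (not part of @tokenizer/s3):
// returns a copy of the object request, narrowed to a single chunk.
function chunkRequest(objRequest, offset, length) {
  return { ...objRequest, Range: toRangeHeader(offset, length) };
}

// Example: request 512 bytes starting at byte 1024 of an object.
const request = chunkRequest({ Bucket: 'my-bucket', Key: 'song.mp3' }, 1024, 512);
console.log(request.Range);
```

Because each chunk is an independent request, a parser can jump to the end of a file (for example, to read a trailing tag) without first downloading everything before it.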

`makeStreamingTokenizerFromS3`

Initialize a tokenizer from an Amazon S3 client for use in extracting metadata from media files.

Function Signature

function makeStreamingTokenizerFromS3(s3: S3Client, objRequest: GetObjectRequest): Promise<ITokenizer>

Reads from the S3 as a stream.

Parameters

s3 (S3Client):

The S3 client used to make requests to Amazon S3.

Note: To configure AWS client authentication, see Configuration and credential file settings.

objRequest (GetObjectRequest):

The S3 object request containing details about the S3 object to fetch. This includes properties like the bucket name and object key.
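
As a small illustration of the `objRequest` shape, the sketch below turns an `s3://bucket/key` URI into an object with `Bucket` and `Key` properties. The function name `objRequestFromUri` is a hypothetical helper, not something exported by @tokenizer/s3 or the AWS SDK, and it only covers the simple URI form shown here.

```javascript
// Hypothetical helper (not part of @tokenizer/s3 or the AWS SDK):
// converts "s3://bucket/path/to/key" into { Bucket, Key }.
function objRequestFromUri(uri) {
  // The bucket is everything up to the first slash after the scheme;
  // the key is the remainder and may itself contain slashes and spaces.
  const match = /^s3:\/\/([^/]+)\/(.+)$/.exec(uri);
  if (!match) {
    throw new TypeError(`Not an s3://bucket/key URI: ${uri}`);
  }
  return { Bucket: match[1], Key: match[2] };
}

const objRequest = objRequestFromUri('s3://my-bucket/audio/track 01.mp3');
console.log(objRequest);
```

The resulting object can be passed as the second argument to either tokenizer factory.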

Returns

Promise<ITokenizer>:

A Promise that resolves to an instance of ITokenizer. This tokenizer can be used to extract metadata from the specified media file in the S3 object.
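
A streaming tokenizer keeps a response open while it is being read, so it can be useful to release it once parsing is done. The sketch below assumes the returned tokenizer exposes a `close()` method, as strtok3's `ITokenizer` does; `withTokenizer` is a hypothetical wrapper, not part of this module.

```javascript
// Hypothetical wrapper (not part of @tokenizer/s3):
// creates a tokenizer, hands it to `use`, and always closes it afterwards.
// Assumes the tokenizer has an async `close()` method.
async function withTokenizer(createTokenizer, use) {
  const tokenizer = await createTokenizer();
  try {
    return await use(tokenizer);
  } finally {
    // Runs whether `use` resolved or threw.
    await tokenizer.close();
  }
}

// Intended usage, assuming `s3`, `objRequest`, `makeStreamingTokenizerFromS3`
// and `fileTypeFromTokenizer` are in scope:
//
//   const fileType = await withTokenizer(
//     () => makeStreamingTokenizerFromS3(s3, objRequest),
//     (tokenizer) => fileTypeFromTokenizer(tokenizer)
//   );
```

Passing a factory function rather than a ready-made tokenizer keeps creation and cleanup in one place, so a failure inside the parser cannot leave the tokenizer open.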

Compatibility

Module: as of version 0.3.0, this package is a pure ECMAScript Module (ESM); earlier versions were CommonJS. The distributed JavaScript codebase is compliant with the ECMAScript 2020 (11th Edition) standard.

This module requires a Node.js ≥ 16 engine. It can also be used in a browser environment when bundled with a module bundler.

For backward compatibility with TypeScript projects that compile to CommonJS, you can use load-esm.

Examples

Determine S3 file type

Determine the file type (based on its content) of a file stored in Amazon S3:

import { fileTypeFromTokenizer } from 'file-type';
import { fromEnv } from '@aws-sdk/credential-providers';
import { S3Client } from '@aws-sdk/client-s3';
import { makeChunkedTokenizerFromS3 } from '@tokenizer/s3';

(async () => {

  // Initialize S3 client
  const s3 = new S3Client({
    region: 'eu-west-2',
    credentials: fromEnv(),
  });

  // Initialize S3 tokenizer
  const s3Tokenizer = await makeChunkedTokenizerFromS3(s3, {
    Bucket: 'affectlab',
    Key: '1min_35sec.mp4'
  });

  // Figure out what kind of file it is
  const fileType = await fileTypeFromTokenizer(s3Tokenizer);
  console.log(fileType);
})();

Reading audio metadata from Amazon S3

Retrieve audio metadata from an S3 object using music-metadata:

import { makeChunkedTokenizerFromS3 } from '@tokenizer/s3';
import { S3Client } from '@aws-sdk/client-s3';
import { parseFromTokenizer } from 'music-metadata';

/**
 * Retrieve metadata from Amazon S3 object
 * @param s3 S3 client used to make the request
 * @param objRequest S3 object request
 * @param options Options passed on to `parseFromTokenizer`
 * @return Metadata
 */
async function parseS3Object(s3, objRequest, options) {
  const s3Tokenizer = await makeChunkedTokenizerFromS3(s3, objRequest);
  return parseFromTokenizer(s3Tokenizer, options);
}

(async () => {
  const s3 = new S3Client({});

  const metadata = await parseS3Object(s3, {
    Bucket: 'standing0media',
    Key: '01 Where The Highway Takes Me.mp3'
  });

  console.log(metadata);
})();

A module implementation of this example can be found in @music-metadata/s3.


Package last updated on 31 Jan 2025
