@xenova/transformers
Comparing version 2.6.2 to 2.7.0
{ | ||
"name": "@xenova/transformers", | ||
"version": "2.6.2", | ||
"version": "2.7.0", | ||
"description": "State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!", | ||
@@ -5,0 +5,0 @@ "main": "./src/transformers.js", |
@@ -101,3 +101,3 @@ | ||
<script type="module"> | ||
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.6.2'; | ||
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.7.0'; | ||
</script> | ||
@@ -111,16 +111,17 @@ ``` | ||
| Name | Description | Source code | | ||
| Name | Description | Links | | ||
|-------------------|----------------------------------|-------------------------------| | ||
| Whisper Web | Speech recognition w/ Whisper | [link](https://github.com/xenova/whisper-web) | | ||
| Doodle Dash | Real-time sketch-recognition game (see [blog](https://huggingface.co/blog/ml-web-games)) | [link](https://github.com/xenova/doodle-dash) | | ||
| Code Playground | In-browser code completion website | [link](./examples/code-completion/) | | ||
| Semantic Image Search (client-side) | Search for images with text | [link](./examples/semantic-image-search-client/) | | ||
| Semantic Image Search (server-side) | Search for images with text (Supabase) | [link](./examples/semantic-image-search/) | | ||
| Vanilla JavaScript | In-browser object detection | [link](./examples/vanilla-js/) | | ||
| React | Multilingual translation website | [link](./examples/react-translator/) | | ||
| Browser extension | Text classification extension | [link](./examples/extension/) | | ||
| Electron | Text classification application | [link](./examples/electron/) | | ||
| Next.js (client-side) | Sentiment analysis (in-browser inference) | [link](./examples/next-client/) | | ||
| Next.js (server-side) | Sentiment analysis (Node.js inference) | [link](./examples/next-server/) | | ||
| Node.js | Sentiment analysis API | [link](./examples/node/) | | ||
| Whisper Web | Speech recognition w/ Whisper | [code](https://github.com/xenova/whisper-web), [demo](https://huggingface.co/spaces/Xenova/whisper-web) | | ||
| Doodle Dash | Real-time sketch-recognition game | [blog](https://huggingface.co/blog/ml-web-games), [code](https://github.com/xenova/doodle-dash), [demo](https://huggingface.co/spaces/Xenova/doodle-dash) | | ||
| Code Playground | In-browser code completion website | [code](./examples/code-completion/), [demo](https://huggingface.co/spaces/Xenova/ai-code-playground) | | ||
| Semantic Image Search (client-side) | Search for images with text | [code](./examples/semantic-image-search-client/), [demo](https://huggingface.co/spaces/Xenova/semantic-image-search-client) | | ||
| Semantic Image Search (server-side) | Search for images with text (Supabase) | [code](./examples/semantic-image-search/), [demo](https://huggingface.co/spaces/Xenova/semantic-image-search) | | ||
| Vanilla JavaScript | In-browser object detection | [video](https://scrimba.com/scrim/cKm9bDAg), [code](./examples/vanilla-js/), [demo](https://huggingface.co/spaces/Scrimba/vanilla-js-object-detector) | | ||
| React | Multilingual translation website | [code](./examples/react-translator/), [demo](https://huggingface.co/spaces/Xenova/react-translator) | | ||
| Text to speech (client-side) | In-browser speech synthesis | [code](./examples/text-to-speech-client/), [demo](https://huggingface.co/spaces/Xenova/text-to-speech-client) | | ||
| Browser extension | Text classification extension | [code](./examples/extension/) | | ||
| Electron | Text classification application | [code](./examples/electron/) | | ||
| Next.js (client-side) | Sentiment analysis (in-browser inference) | [code](./examples/next-client/), [demo](https://huggingface.co/spaces/Xenova/next-example-app) | | ||
| Next.js (server-side) | Sentiment analysis (Node.js inference) | [code](./examples/next-server/), [demo](https://huggingface.co/spaces/Xenova/next-server-example-app) | | ||
| Node.js | Sentiment analysis API | [code](./examples/node/) | | ||
@@ -132,3 +133,3 @@ | ||
By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@xenova/transformers@2.6.2/dist/), which should work out-of-the-box. You can customize this as follows: | ||
By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@xenova/transformers@2.7.0/dist/), which should work out-of-the-box. You can customize this as follows: | ||
@@ -228,3 +229,3 @@ | ||
| [Automatic Speech Recognition](https://huggingface.co/tasks/automatic-speech-recognition) | `automatic-speech-recognition` | Transcribing a given audio into text. | ✅ [(docs)](https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.AutomaticSpeechRecognitionPipeline)<br>[(models)](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&library=transformers.js) | | ||
| [Text-to-Speech](https://huggingface.co/tasks/text-to-speech) | n/a | Generating natural-sounding speech given text input. | ❌ | | ||
| [Text-to-Speech](https://huggingface.co/tasks/text-to-speech) | `text-to-speech` or `text-to-audio` | Generating natural-sounding speech given text input. | ✅ [(docs)](https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.TextToAudioPipeline)<br>[(models)](https://huggingface.co/models?pipeline_tag=text-to-audio&library=transformers.js) | ||
@@ -303,2 +304,3 @@ | ||
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. | ||
1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei. | ||
1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer. | ||
@@ -305,0 +307,0 @@ 1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. |
@@ -32,3 +32,3 @@ /** | ||
const VERSION = '2.6.2'; | ||
const VERSION = '2.7.0'; | ||
@@ -35,0 +35,0 @@ // Check if various APIs are available (depends on environment) |
@@ -1312,2 +1312,4 @@ | ||
export class SpeechT5FeatureExtractor extends FeatureExtractor { } | ||
/** | ||
@@ -1385,2 +1387,14 @@ * Represents a Processor that extracts features from an input. | ||
export class SpeechT5Processor extends Processor { | ||
/** | ||
* Calls the feature_extractor function with the given input. | ||
* @param {any} input The input to extract features from. | ||
* @returns {Promise<any>} A Promise that resolves with the extracted features. | ||
*/ | ||
async _call(input) { | ||
return await this.feature_extractor(input) | ||
} | ||
} | ||
////////////////////////////////////////////////// | ||
@@ -1431,2 +1445,3 @@ /** | ||
Wav2Vec2FeatureExtractor, | ||
SpeechT5FeatureExtractor, | ||
} | ||
@@ -1438,2 +1453,3 @@ | ||
SamProcessor, | ||
SpeechT5Processor, | ||
} | ||
@@ -1440,0 +1456,0 @@ |
@@ -101,3 +101,3 @@ | ||
* let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg'); | ||
* // test { | ||
* // RawImage { | ||
* // "data": Uint8ClampedArray [ 25, 25, 25, 19, 19, 19, ... ], | ||
@@ -104,0 +104,0 @@ * // "width": 800, |
@@ -991,1 +991,23 @@ /** | ||
} | ||
/** | ||
* Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument size. | ||
* @param {number[]} size A sequence of integers defining the shape of the output tensor. | ||
*/ | ||
export function ones(size) { | ||
const numElements = size.reduce((a, b) => a * b, 1); | ||
return new Tensor( | ||
'int64', | ||
new BigInt64Array(numElements).fill(1n), | ||
size | ||
) | ||
} | ||
/** | ||
* Returns a tensor filled with the scalar value 1, with the same size as input. | ||
* @param {Tensor} tensor The size of input will determine size of the output tensor. | ||
* @returns The ones tensor. | ||
*/ | ||
export function ones_like(tensor) { | ||
return ones(tensor.dims); | ||
} |
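The behavior of the two helpers added above can be checked without installing the package. The following is a minimal sketch: `MiniTensor` is a hypothetical stand-in for the library's `Tensor` class (only `type`, `data`, and `dims` are modeled), and the two functions mirror the added code rather than import it.

```js
// Hypothetical stand-in for the library's Tensor class; only the fields used below are modeled.
class MiniTensor {
  constructor(type, data, dims) {
    this.type = type;
    this.data = data;
    this.dims = dims;
  }
}

// Mirrors the added `ones`: the element count is the product of the dimensions.
function ones(size) {
  const numElements = size.reduce((a, b) => a * b, 1);
  return new MiniTensor('int64', new BigInt64Array(numElements).fill(1n), size);
}

// Mirrors the added `ones_like`: only the shape of the input is reused.
function ones_like(tensor) {
  return ones(tensor.dims);
}

const mask = ones([2, 3]);
console.log(mask.data.length);    // 6
console.log(typeof mask.data[0]); // "bigint"
```

Two details follow directly from the code as written: the result is always `int64` (so `ones_like` of a `float32` tensor still returns `BigInt` values), and an empty `size` yields a single element because the reduction starts at 1.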
@@ -22,3 +22,3 @@ export namespace env { | ||
declare const __dirname: any; | ||
declare const VERSION: "2.6.2"; | ||
declare const VERSION: "2.7.0"; | ||
declare const localModelPath: any; | ||
@@ -25,0 +25,0 @@ declare const FS_AVAILABLE: boolean; |
@@ -634,4 +634,3 @@ /** | ||
/** | ||
* @typedef {import('./utils/tensor.js').Tensor} Tensor | ||
* @typedef {{stride: number[], input_features: Tensor, is_last: boolean, tokens?: number[], token_timestamps?: number[]}} Chunk | ||
* @typedef {{stride: number[], input_features: import('./utils/tensor.js').Tensor, is_last: boolean, tokens?: number[], token_timestamps?: number[]}} Chunk | ||
* | ||
@@ -662,3 +661,3 @@ * @callback ChunkCallback | ||
stride: number[]; | ||
input_features: import("./utils/tensor.js").Tensor; | ||
input_features: import('./utils/tensor.js').Tensor; | ||
is_last: boolean; | ||
@@ -926,2 +925,59 @@ tokens?: number[]; | ||
} | ||
/** | ||
* Text-to-audio generation pipeline using any `AutoModelForTextToWaveform` or `AutoModelForTextToSpectrogram`. | ||
* This pipeline generates an audio file from an input text and optional other conditional inputs. | ||
* | ||
* **Example:** Generate audio from text with `Xenova/speecht5_tts`. | ||
* ```js | ||
* let speaker_embeddings = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin'; | ||
* let synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts', { quantized: false }); | ||
* let out = await synthesizer('Hello, my dog is cute', { speaker_embeddings }); | ||
* // { | ||
* // audio: Float32Array(26112) [-0.00005657337896991521, 0.00020583874720614403, ...], | ||
* // sampling_rate: 16000 | ||
* // } | ||
* ``` | ||
* | ||
* You can then save the audio to a .wav file with the `wavefile` package: | ||
* ```js | ||
* import wavefile from 'wavefile'; | ||
* import fs from 'fs'; | ||
* | ||
* let wav = new wavefile.WaveFile(); | ||
* wav.fromScratch(1, out.sampling_rate, '32f', out.audio); | ||
* fs.writeFileSync('out.wav', wav.toBuffer()); | ||
* ``` | ||
*/ | ||
export class TextToAudioPipeline extends Pipeline { | ||
/** | ||
* Create a new TextToAudioPipeline. | ||
* @param {Object} options An object containing the following properties: | ||
* @param {string} [options.task] The task of the pipeline. Useful for specifying subtasks. | ||
* @param {PreTrainedModel} [options.model] The model to use. | ||
* @param {PreTrainedTokenizer} [options.tokenizer] The tokenizer to use. | ||
* @param {Processor} [options.processor] The processor to use. | ||
* @param {PreTrainedModel} [options.vocoder] The vocoder to use. | ||
*/ | ||
constructor(options: { | ||
task?: string; | ||
model?: PreTrainedModel; | ||
tokenizer?: PreTrainedTokenizer; | ||
processor?: Processor; | ||
vocoder?: PreTrainedModel; | ||
}); | ||
DEFAULT_VOCODER_ID: string; | ||
vocoder: PreTrainedModel; | ||
/** | ||
* Generates speech/audio from the inputs. | ||
* @param {string|string[]} text_inputs The text(s) to generate. | ||
* @param {Object} options Parameters passed to the model generation/forward method. | ||
* @param {PreTrainedModel} [options.vocoder=null] The vocoder to use (if the model uses one). If not provided, use the default HifiGan vocoder. | ||
* @param {Tensor|Float32Array|string|URL} [options.speaker_embeddings=null] | ||
* @returns {Promise<Object>} An object containing the generated audio and sampling rate. | ||
*/ | ||
_call(text_inputs: string | string[], { speaker_embeddings, }?: { | ||
vocoder?: PreTrainedModel; | ||
speaker_embeddings?: Tensor | Float32Array | string | URL; | ||
}): Promise<any>; | ||
} | ||
export type QuestionAnsweringResult = { | ||
@@ -941,3 +997,4 @@ /** | ||
import { Processor } from './processors.js'; | ||
import { Tensor } from './utils/tensor.js'; | ||
export {}; | ||
//# sourceMappingURL=pipelines.d.ts.map |
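The `TextToAudioPipeline` doc comment above saves its output with the `wavefile` package as 32-bit float. Where adding a dependency is undesirable, the same `{ audio, sampling_rate }` result could be written by hand. The sketch below is an assumption-laden alternative, not part of Transformers.js: `encodeWav16` is a hypothetical helper, and it writes mono 16-bit PCM (not the `'32f'` format used in the doc comment), which most players accept.

```js
// Hypothetical helper (not part of Transformers.js): encode mono float samples
// in [-1, 1] as a 16-bit PCM WAV file and return the bytes.
function encodeWav16(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte canonical header + samples
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; ++i) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                          // size of the fmt chunk
  view.setUint16(20, 1, true);                           // audio format 1 = integer PCM
  view.setUint16(22, 1, true);                           // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; ++i) {
    const clamped = Math.max(-1, Math.min(1, samples[i])); // avoid integer overflow
    view.setInt16(44 + i * bytesPerSample, Math.round(clamped * 32767), true);
  }
  return new Uint8Array(buffer);
}

// Stand-in for a pipeline result; a real `out` would come from the synthesizer call.
const out = { audio: new Float32Array([0, 0.5, -0.5, 1]), sampling_rate: 16000 };
const wavBytes = encodeWav16(out.audio, out.sampling_rate);
console.log(wavBytes.length); // 52
```

In Node.js the returned bytes can be passed to `fs.writeFileSync('out.wav', wavBytes)`; in a browser they can be wrapped in a `Blob` with type `audio/wav`.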
@@ -344,2 +344,4 @@ declare const FeatureExtractor_base: new () => { | ||
} | ||
export class SpeechT5FeatureExtractor extends FeatureExtractor { | ||
} | ||
declare const Processor_base: new () => { | ||
@@ -400,2 +402,10 @@ (...args: any[]): any; | ||
} | ||
export class SpeechT5Processor extends Processor { | ||
/** | ||
* Calls the feature_extractor function with the given input. | ||
* @param {any} input The input to extract features from. | ||
* @returns {Promise<any>} A Promise that resolves with the extracted features. | ||
*/ | ||
_call(input: any): Promise<any>; | ||
} | ||
/** | ||
@@ -444,2 +454,3 @@ * Helper class which is used to instantiate pretrained processors with the `from_pretrained` function. | ||
Wav2Vec2FeatureExtractor: typeof Wav2Vec2FeatureExtractor; | ||
SpeechT5FeatureExtractor: typeof SpeechT5FeatureExtractor; | ||
}; | ||
@@ -450,2 +461,3 @@ static PROCESSOR_CLASS_MAPPING: { | ||
SamProcessor: typeof SamProcessor; | ||
SpeechT5Processor: typeof SpeechT5Processor; | ||
}; | ||
@@ -452,0 +464,0 @@ /** |
@@ -446,2 +446,4 @@ declare const TokenizerModel_base: new () => { | ||
} | ||
export class SpeechT5Tokenizer extends PreTrainedTokenizer { | ||
} | ||
/** | ||
@@ -488,2 +490,3 @@ * Helper class which is used to instantiate pretrained tokenizers with the `from_pretrained` function. | ||
BlenderbotSmallTokenizer: typeof BlenderbotSmallTokenizer; | ||
SpeechT5Tokenizer: typeof SpeechT5Tokenizer; | ||
PreTrainedTokenizer: typeof PreTrainedTokenizer; | ||
@@ -490,0 +493,0 @@ }; |
@@ -10,3 +10,3 @@ export class RawImage { | ||
* let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg'); | ||
* // test { | ||
* // RawImage { | ||
* // "data": Uint8ClampedArray [ 25, 25, 25, 19, 19, 19, ... ], | ||
@@ -13,0 +13,0 @@ * // "width": 800, |
@@ -63,2 +63,13 @@ /** | ||
export function dynamicTimeWarping(matrix: Tensor): number[][]; | ||
/** | ||
* Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument size. | ||
* @param {number[]} size A sequence of integers defining the shape of the output tensor. | ||
*/ | ||
export function ones(size: number[]): Tensor; | ||
/** | ||
* Returns a tensor filled with the scalar value 1, with the same size as input. | ||
* @param {Tensor} tensor The size of input will determine size of the output tensor. | ||
* @returns The ones tensor. | ||
*/ | ||
export function ones_like(tensor: Tensor): Tensor; | ||
declare const Tensor_base: any; | ||
@@ -65,0 +76,0 @@ export class Tensor extends Tensor_base { |