@xenova/transformers - npm Package Compare versions

Comparing version 2.12.1 to 2.13.0


package.json
{
"name": "@xenova/transformers",
"version": "2.12.1",
"version": "2.13.0",
"description": "State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!",

@@ -5,0 +5,0 @@ "main": "./src/transformers.js",

@@ -104,3 +104,3 @@

<script type="module">
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.12.1';
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.13.0';
</script>
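For reference, the `pipeline` function imported above is the library's main entry point. A minimal usage sketch, assuming the default `sentiment-analysis` task and its default model (neither is part of this diff):

```js
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.13.0';

// Allocate a pipeline; the default model for the task is downloaded on first use.
const classifier = await pipeline('sentiment-analysis');
const output = await classifier('I love transformers!');
// e.g. [{ label: 'POSITIVE', score: 0.9998 }]
```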

@@ -131,2 +131,3 @@ ```

Check out the Transformers.js [template](https://huggingface.co/new-space?template=static-templates%2Ftransformers.js) on Hugging Face to get started in one click!

@@ -138,3 +139,3 @@

By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@xenova/transformers@2.12.1/dist/), which should work out-of-the-box. You can customize this as follows:
By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@xenova/transformers@2.13.0/dist/), which should work out-of-the-box. You can customize this as follows:
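A minimal sketch of that customization, using the `env` settings the library exposes (`allowRemoteModels`, `localModelPath`, and the ONNX WASM paths); the paths below are placeholders:

```js
import { env } from '@xenova/transformers';

// Serve models from your own server instead of the Hugging Face Hub.
env.allowRemoteModels = false;
env.localModelPath = '/path/to/models/';

// Point the ONNX Runtime backend at self-hosted WASM binaries.
env.backends.onnx.wasm.wasmPaths = '/path/to/dist/';
```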

@@ -283,2 +284,3 @@

1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Göttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lüddecke and Alexander Ecker.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.

@@ -295,2 +297,3 @@ 1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.

1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.

@@ -331,2 +334,5 @@ 1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.

1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[SigLIP](https://huggingface.co/docs/transformers/main/model_doc/siglip)** (from Google AI) released with the paper [Sigmoid Loss for Language Image Pre-Training](https://arxiv.org/abs/2303.15343) by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer.
1. **[SpeechT5](https://huggingface.co/docs/transformers/model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https://arxiv.org/abs/2110.07205) by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.

@@ -338,5 +344,7 @@ 1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.

1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[Table Transformer](https://huggingface.co/docs/transformers/model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https://arxiv.org/abs/2110.00061) by Brandon Smock, Rohith Pesala, Robin Abraham.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[ViTMatte](https://huggingface.co/docs/transformers/model_doc/vitmatte)** (from HUST-VL) released with the paper [ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers](https://arxiv.org/abs/2305.15272) by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang.
1. **[VITS](https://huggingface.co/docs/transformers/model_doc/vits)** (from Kakao Enterprise) released with the paper [Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech](https://arxiv.org/abs/2106.06103) by Jaehyeon Kim, Jungil Kong, Juhee Son.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.

@@ -343,0 +351,0 @@ 1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.

@@ -32,3 +32,3 @@ /**

const VERSION = '2.12.1';
const VERSION = '2.13.0';

@@ -35,0 +35,0 @@ // Check if various APIs are available (depends on environment)

@@ -22,3 +22,3 @@ export namespace env {

declare const __dirname: any;
declare const VERSION: "2.12.1";
declare const VERSION: "2.13.0";
declare const localModelPath: any;

@@ -25,0 +25,0 @@ declare const FS_AVAILABLE: boolean;

@@ -1081,2 +1081,12 @@ /**

* ```
*
* **Example:** Multilingual speech generation with `Xenova/mms-tts-fra`. See [here](https://huggingface.co/models?pipeline_tag=text-to-speech&other=vits&sort=trending) for the full list of available languages (1107).
* ```js
* let synthesizer = await pipeline('text-to-speech', 'Xenova/mms-tts-fra');
* let out = await synthesizer('Bonjour');
* // {
* // audio: Float32Array(23808) [-0.00037693005288019776, 0.0003325853613205254, ...],
* // sampling_rate: 16000
* // }
* ```
*/
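As a follow-up to the example above, the returned `Float32Array` can be written to disk in Node.js; a sketch assuming the third-party `wavefile` package (not a dependency of this release):

```js
import fs from 'fs';
import wavefile from 'wavefile';

// `out` is the { audio, sampling_rate } object produced by the synthesizer above.
const wav = new wavefile.WaveFile();
wav.fromScratch(1, out.sampling_rate, '32f', out.audio); // 1 channel, 32-bit float samples
fs.writeFileSync('out.wav', wav.toBuffer());
```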

@@ -1114,2 +1124,12 @@ export class TextToAudioPipeline extends Pipeline {

}): Promise<any>;
_call_text_to_waveform(text_inputs: any): Promise<{
audio: any;
sampling_rate: any;
}>;
_call_text_to_spectrogram(text_inputs: any, { speaker_embeddings }: {
speaker_embeddings: any;
}): Promise<{
audio: any;
sampling_rate: any;
}>;
}
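The two private methods above reflect the two model families this pipeline supports: VITS-style models return a waveform directly (as in the MMS example earlier), while SpeechT5-style models generate a spectrogram and require speaker embeddings. A hedged sketch of the latter path; the model id and embeddings URL are assumptions:

```js
import { pipeline } from '@xenova/transformers';

// Spectrogram-based text-to-speech (SpeechT5) needs speaker embeddings.
const synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts', { quantized: false });
const speaker_embeddings = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin';
const out = await synthesizer('Hello, my dog is cute', { speaker_embeddings });
// { audio: Float32Array(...), sampling_rate: 16000 }
```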

@@ -1116,0 +1136,0 @@ /**

@@ -109,2 +109,8 @@ declare const FeatureExtractor_base: new () => {

/**
* Rescale the image's pixel values by `this.rescale_factor`.
* @param {Float32Array} pixelData The pixel data to rescale.
* @returns {void}
*/
rescale(pixelData: Float32Array): void;
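For clarity, `rescale` mutates the pixel buffer in place; a minimal sketch of the assumed behaviour (the 1/255 default factor is illustrative, not taken from this diff):

```js
// Assumed behaviour: multiply every pixel value by rescale_factor, in place.
function rescale(pixelData, rescale_factor = 1 / 255) {
  for (let i = 0; i < pixelData.length; ++i) {
    pixelData[i] *= rescale_factor;
  }
}
```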
/**
* @typedef {object} PreprocessedImage

@@ -146,2 +152,15 @@ * @property {HeightWidth} original_size The original size of the image.

}
export class SegformerFeatureExtractor extends ImageFeatureExtractor {
/**
* Converts the output of `SegformerForSemanticSegmentation` into semantic segmentation maps.
* @param {*} outputs Raw outputs of the model.
* @param {number[][]} [target_sizes=null] List of tuples corresponding to the requested final size
* (height, width) of each prediction. If unset, predictions will not be resized.
* @returns {{segmentation: Tensor; labels: number[]}[]} The semantic segmentation maps.
*/
post_process_semantic_segmentation(outputs: any, target_sizes?: number[][]): {
segmentation: Tensor;
labels: number[];
}[];
}
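The new `SegformerFeatureExtractor` backs SegFormer support for semantic segmentation; a hedged sketch via the `image-segmentation` pipeline, with the model id as an assumption:

```js
import { pipeline } from '@xenova/transformers';

const segmenter = await pipeline('image-segmentation', 'Xenova/segformer_b2_clothes');
const result = await segmenter('https://example.com/person.jpg');
// e.g. [{ label: 'Hat', score: null, mask: RawImage { ... } }, ...]
```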
export class BitImageProcessor extends ImageFeatureExtractor {

@@ -157,2 +176,4 @@ }

}
export class SiglipImageProcessor extends ImageFeatureExtractor {
}
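`SiglipImageProcessor` (together with the `SiglipTokenizer` added in tokenizers.d.ts below) enables SigLIP-based zero-shot image classification; a hedged sketch, with the model id as an assumption:

```js
import { pipeline } from '@xenova/transformers';

const classifier = await pipeline('zero-shot-image-classification', 'Xenova/siglip-base-patch16-224');
const output = await classifier('https://example.com/cats.jpg', ['2 cats', 'a plane', 'a remote']);
// e.g. [{ score: 0.98, label: '2 cats' }, { score: 0.01, label: 'a remote' }, ...]
```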
export class ConvNextFeatureExtractor extends ImageFeatureExtractor {

@@ -164,2 +185,4 @@ }

}
export class ViTImageProcessor extends ImageFeatureExtractor {
}
export class MobileViTFeatureExtractor extends ImageFeatureExtractor {

@@ -576,4 +599,6 @@ }

ChineseCLIPFeatureExtractor: typeof ChineseCLIPFeatureExtractor;
SiglipImageProcessor: typeof SiglipImageProcessor;
ConvNextFeatureExtractor: typeof ConvNextFeatureExtractor;
ConvNextImageProcessor: typeof ConvNextImageProcessor;
SegformerFeatureExtractor: typeof SegformerFeatureExtractor;
BitImageProcessor: typeof BitImageProcessor;

@@ -588,2 +613,3 @@ DPTFeatureExtractor: typeof DPTFeatureExtractor;

NougatImageProcessor: typeof NougatImageProcessor;
ViTImageProcessor: typeof ViTImageProcessor;
VitMatteImageProcessor: typeof VitMatteImageProcessor;

@@ -590,0 +616,0 @@ SamImageProcessor: typeof SamImageProcessor;

@@ -139,3 +139,3 @@ /**

* @param {string|string[]} [options.text_pair=null] Optional second sequence to be encoded. If set, must be the same type as text.
* @param {boolean} [options.padding=false] Whether to pad the input sequences.
* @param {boolean|'max_length'} [options.padding=false] Whether to pad the input sequences.
* @param {boolean} [options.add_special_tokens=true] Whether or not to add the special tokens associated with the corresponding model.

@@ -149,3 +149,3 @@ * @param {boolean} [options.truncation=null] Whether to truncate the input sequences.

text_pair?: string | string[];
padding?: boolean;
padding?: boolean | 'max_length';
add_special_tokens?: boolean;
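The widened `padding` option means tokenizer calls can now pad to a fixed `max_length` rather than only to the longest sequence in the batch; a minimal sketch, with the tokenizer id and length as assumptions:

```js
import { AutoTokenizer } from '@xenova/transformers';

const tokenizer = await AutoTokenizer.from_pretrained('Xenova/bert-base-uncased');
const { input_ids, attention_mask } = tokenizer(['a short sentence'], {
  padding: 'max_length', // new: pad to max_length instead of the longest item
  truncation: true,
  max_length: 128,
});
```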

@@ -307,2 +307,4 @@ truncation?: boolean;

}
export class RoFormerTokenizer extends PreTrainedTokenizer {
}
export class DistilBertTokenizer extends PreTrainedTokenizer {

@@ -521,2 +523,4 @@ }

}
export class SiglipTokenizer extends PreTrainedTokenizer {
}
/**

@@ -549,2 +553,5 @@ * @todo This model is not yet supported by Hugging Face's "fast" tokenizers library (https://github.com/huggingface/tokenizers).

}
export class VitsTokenizer extends PreTrainedTokenizer {
constructor(tokenizerJSON: any, tokenizerConfig: any);
}
/**

@@ -567,2 +574,3 @@ * Helper class which is used to instantiate pretrained tokenizers with the `from_pretrained` function.

ConvBertTokenizer: typeof ConvBertTokenizer;
RoFormerTokenizer: typeof RoFormerTokenizer;
XLMTokenizer: typeof XLMTokenizer;

@@ -581,2 +589,3 @@ ElectraTokenizer: typeof ElectraTokenizer;

CLIPTokenizer: typeof CLIPTokenizer;
SiglipTokenizer: typeof SiglipTokenizer;
MarianTokenizer: typeof MarianTokenizer;

@@ -598,2 +607,3 @@ BloomTokenizer: typeof BloomTokenizer;

NougatTokenizer: typeof NougatTokenizer;
VitsTokenizer: typeof VitsTokenizer;
PreTrainedTokenizer: typeof PreTrainedTokenizer;

@@ -600,0 +610,0 @@ };

Sorry, the diffs of the remaining changed files are either too big to display or not supported yet.
