@xenova/transformers
Comparing version 2.7.0 to 2.8.0
{ | ||
"name": "@xenova/transformers", | ||
"version": "2.7.0", | ||
"version": "2.8.0", | ||
"description": "State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!", | ||
@@ -54,3 +54,3 @@ "main": "./src/transformers.js", | ||
"jsdoc-to-markdown": "^8.0.0", | ||
"typescript": "^5.0.2", | ||
"typescript": "^5.2.2", | ||
"wavefile": "^11.0.0", | ||
@@ -57,0 +57,0 @@ "webpack": "^5.80.0", |
@@ -101,3 +101,3 @@ | ||
<script type="module"> | ||
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.7.0'; | ||
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.8.0'; | ||
</script> | ||
@@ -132,3 +132,3 @@ ``` | ||
By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@xenova/transformers@2.7.0/dist/), which should work out-of-the-box. You can customize this as follows: | ||
By default, Transformers.js uses [hosted pretrained models](https://huggingface.co/models?library=transformers.js) and [precompiled WASM binaries](https://cdn.jsdelivr.net/npm/@xenova/transformers@2.8.0/dist/), which should work out-of-the-box. You can customize this as follows: | ||
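The README section that follows this line (truncated in the diff) customizes these defaults through the exported `env` object. A minimal sketch, assuming the standard 2.x settings (`allowRemoteModels`, `localModelPath`, and the ONNX WASM paths):

```js
import { env, pipeline } from '@xenova/transformers';

// Serve models from your own server instead of the Hugging Face Hub.
env.allowRemoteModels = false;
env.localModelPath = '/models/';

// Point the ONNX runtime at self-hosted WASM binaries instead of the jsDelivr CDN.
env.backends.onnx.wasm.wasmPaths = '/wasm/';

// Pipelines created afterwards pick up these settings.
const classifier = await pipeline('sentiment-analysis');
```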
@@ -215,3 +215,3 @@ | ||
| [Image Segmentation](https://huggingface.co/tasks/image-segmentation) | `image-segmentation` | Divides an image into segments where each pixel is mapped to an object. This task has multiple variants such as instance segmentation, panoptic segmentation and semantic segmentation. | ✅ [(docs)](https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.ImageSegmentationPipeline)<br>[(models)](https://huggingface.co/models?pipeline_tag=image-segmentation&library=transformers.js) | | ||
| [Image-to-Image](https://huggingface.co/tasks/image-to-image) | `image-to-image` | Transforming a source image to match the characteristics of a target image or a target image domain. | ❌ | | ||
| [Image-to-Image](https://huggingface.co/tasks/image-to-image) | `image-to-image` | Transforming a source image to match the characteristics of a target image or a target image domain. | ✅ [(docs)](https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.ImageToImagePipeline)<br>[(models)](https://huggingface.co/models?pipeline_tag=image-to-image&library=transformers.js) | | ||
| [Mask Generation](https://huggingface.co/tasks/mask-generation) | `mask-generation` | Generate masks for the objects in an image. | ❌ | | ||
@@ -229,3 +229,3 @@ | [Object Detection](https://huggingface.co/tasks/object-detection) | `object-detection` | Identify objects of certain defined classes within an image. | ✅ [(docs)](https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.ObjectDetectionPipeline)<br>[(models)](https://huggingface.co/models?pipeline_tag=object-detection&library=transformers.js) | | ||
| [Automatic Speech Recognition](https://huggingface.co/tasks/automatic-speech-recognition) | `automatic-speech-recognition` | Transcribing a given audio into text. | ✅ [(docs)](https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.AutomaticSpeechRecognitionPipeline)<br>[(models)](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&library=transformers.js) | | ||
| [Text-to-Speech](https://huggingface.co/tasks/text-to-speech) | `text-to-speech` or `text-to-audio` | | Generating natural-sounding speech given text input. | ✅ [(docs)](https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.TextToAudioPipeline)<br>[(models)](https://huggingface.co/models?pipeline_tag=text-to-audio&library=transformers.js) | | ||
| [Text-to-Speech](https://huggingface.co/tasks/text-to-speech) | `text-to-speech` or `text-to-audio` | Generating natural-sounding speech given text input. | ✅ [(docs)](https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.TextToAudioPipeline)<br>[(models)](https://huggingface.co/models?pipeline_tag=text-to-audio&library=transformers.js) | | ||
@@ -280,2 +280,3 @@ | ||
1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. | ||
1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme. | ||
1. **[FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints) by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei | ||
@@ -295,2 +296,3 @@ 1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. | ||
1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan. | ||
1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed. | ||
1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli. | ||
@@ -309,4 +311,6 @@ 1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. | ||
1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. | ||
1. **[Swin2SR](https://huggingface.co/docs/transformers/model_doc/swin2sr)** (from University of Würzburg) released with the paper [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https://arxiv.org/abs/2209.11345) by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte. | ||
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. | ||
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu. | ||
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei. | ||
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. | ||
@@ -313,0 +317,0 @@ 1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. |
@@ -32,3 +32,3 @@ /** | ||
const VERSION = '2.7.0'; | ||
const VERSION = '2.8.0'; | ||
@@ -35,0 +35,0 @@ // Check if various APIs are available (depends on environment) |
@@ -25,2 +25,3 @@ | ||
calculateDimensions, | ||
calculateReflectOffset, | ||
} from './utils/core.js'; | ||
@@ -233,3 +234,87 @@ | ||
/** | ||
* Pad the image by a certain amount. | ||
* @param {Float32Array} pixelData The pixel data to pad. | ||
* @param {number[]} imgDims The dimensions of the image. | ||
* @param {{width:number; height:number}|number} padSize The dimensions of the padded image. | ||
* @param {Object} options The options for padding. | ||
* @param {'constant'|'symmetric'} [options.mode='constant'] The type of padding to add. | ||
* @param {boolean} [options.center=false] Whether to center the image. | ||
* @param {number} [options.constant_values=0] The constant value to use for padding. | ||
* @returns {[Float32Array, number[]]} The padded pixel data and image dimensions. | ||
*/ | ||
pad_image(pixelData, imgDims, padSize, { | ||
mode = 'constant', | ||
center = false, | ||
constant_values = 0, | ||
} = {}) { | ||
const [imageWidth, imageHeight, imageChannels] = imgDims; | ||
let paddedImageWidth, paddedImageHeight; | ||
if (typeof padSize === 'number') { | ||
paddedImageWidth = padSize; | ||
paddedImageHeight = padSize; | ||
} else { | ||
paddedImageWidth = padSize.width; | ||
paddedImageHeight = padSize.height; | ||
} | ||
// Only add padding if there is a difference in size | ||
if (paddedImageWidth !== imageWidth || paddedImageHeight !== imageHeight) { | ||
const paddedPixelData = new Float32Array(paddedImageWidth * paddedImageHeight * imageChannels); | ||
if (constant_values !== 0) { | ||
paddedPixelData.fill(constant_values); | ||
} | ||
const [left, top] = center | ||
? [Math.floor((paddedImageWidth - imageWidth) / 2), Math.floor((paddedImageHeight - imageHeight) / 2)] | ||
: [0, 0]; | ||
// Copy the original image into the padded image | ||
for (let i = 0; i < imageHeight; ++i) { | ||
const a = (i + top) * paddedImageWidth; | ||
const b = i * imageWidth; | ||
for (let j = 0; j < imageWidth; ++j) { | ||
const c = (a + j + left) * imageChannels; | ||
const d = (b + j) * imageChannels; | ||
for (let k = 0; k < imageChannels; ++k) { | ||
paddedPixelData[c + k] = pixelData[d + k]; | ||
} | ||
} | ||
} | ||
if (mode === 'symmetric') { | ||
if (center) { | ||
throw new Error('`center` padding is not supported when `mode` is set to `symmetric`.'); | ||
// TODO: Implement this | ||
} | ||
const h1 = imageHeight - 1; | ||
const w1 = imageWidth - 1; | ||
for (let i = 0; i < paddedImageHeight; ++i) { | ||
const a = i * paddedImageWidth; | ||
const b = calculateReflectOffset(i, h1) * imageWidth; | ||
for (let j = 0; j < paddedImageWidth; ++j) { | ||
if (i < imageHeight && j < imageWidth) continue; // Do not overwrite original image | ||
const c = (a + j) * imageChannels; | ||
const d = (b + calculateReflectOffset(j, w1)) * imageChannels; | ||
// Copy channel-wise | ||
for (let k = 0; k < imageChannels; ++k) { | ||
paddedPixelData[c + k] = pixelData[d + k]; | ||
} | ||
} | ||
} | ||
} | ||
// Update pixel data and image dimensions | ||
pixelData = paddedPixelData; | ||
imgDims = [paddedImageHeight, paddedImageWidth, imageChannels] | ||
} | ||
return [pixelData, imgDims]; | ||
} | ||
/** | ||
* @typedef {object} PreprocessedImage | ||
@@ -344,14 +429,5 @@ * @property {HeightWidth} original_size The original size of the image. | ||
// TODO is it okay to pad before rescaling/normalizing? | ||
if (this.do_pad && this.pad_size) { | ||
let left = 0; | ||
let right = this.pad_size.width - image.width; | ||
let top = 0; | ||
let bottom = this.pad_size.height - image.height; | ||
let pixelData = Float32Array.from(image.data); | ||
let imgDims = [image.height, image.width, image.channels]; | ||
image = await image.pad([left, right, top, bottom]); | ||
} | ||
const pixelData = Float32Array.from(image.data); | ||
if (this.do_rescale) { | ||
@@ -385,6 +461,13 @@ for (let i = 0; i < pixelData.length; ++i) { | ||
// do padding after rescaling/normalizing | ||
if (this.do_pad && this.pad_size) { | ||
const padded = this.pad_image(pixelData, [image.width, image.height, image.channels], this.pad_size); | ||
[pixelData, imgDims] = padded; // Update pixel data and image dimensions | ||
} | ||
// Create HWC tensor | ||
const img = new Tensor('float32', pixelData, imgDims); | ||
// convert to channel dimension format: | ||
let imgDims = [image.height, image.width, image.channels]; | ||
let img = new Tensor('float32', pixelData, imgDims); | ||
let transposed = transpose(img, [2, 0, 1]); // hwc -> chw | ||
const transposed = transpose(img, [2, 0, 1]); // hwc -> chw | ||
@@ -438,4 +521,16 @@ return { | ||
export class BeitFeatureExtractor extends ImageFeatureExtractor { } | ||
export class DonutFeatureExtractor extends ImageFeatureExtractor { } | ||
export class DonutFeatureExtractor extends ImageFeatureExtractor { | ||
pad_image(pixelData, imgDims, padSize, options = {}) { | ||
return super.pad_image(pixelData, imgDims, padSize, { | ||
center: true, | ||
// Since normalization is done after padding, we need to pad with -1. | ||
// NOTE: This only works if `image_mean = 0.5` and `image_std = 0.5`. | ||
// For more information, see https://github.com/huggingface/transformers/blob/main/src/transformers/models/donut/image_processing_donut.py#L433-L451 | ||
constant_values: -1, | ||
...options, | ||
}); | ||
} | ||
} | ||
/** | ||
@@ -916,3 +1011,23 @@ * @typedef {object} DetrFeatureExtractorResultProps | ||
export class Swin2SRImageProcessor extends ImageFeatureExtractor { | ||
pad_image(pixelData, imgDims, padSize, options = {}) { | ||
// NOTE: In this case, `padSize` represents the size of the sliding window for the local attention. | ||
// In other words, the image is padded so that its width and height are multiples of `padSize`. | ||
const [imageWidth, imageHeight, imageChannels] = imgDims; | ||
return super.pad_image(pixelData, imgDims, { | ||
// NOTE: For Swin2SR models, the original python implementation adds padding even when the image's width/height is already | ||
// a multiple of `pad_size`. However, this is most likely a bug (PR: https://github.com/mv-lab/swin2sr/pull/19). | ||
// For this reason, we only add padding when the image's width/height is not a multiple of `pad_size`. | ||
width: imageWidth + (padSize - imageWidth % padSize) % padSize, | ||
height: imageHeight + (padSize - imageHeight % padSize) % padSize, | ||
}, { | ||
mode: 'symmetric', | ||
center: false, | ||
constant_values: -1, | ||
...options, | ||
}) | ||
} | ||
} | ||
export class WhisperFeatureExtractor extends FeatureExtractor { | ||
@@ -926,12 +1041,4 @@ | ||
} | ||
/** | ||
* Calculates the index offset for a given index and window size. | ||
* @param {number} i The index. | ||
* @param {number} w The window size. | ||
* @returns {number} The index offset. | ||
*/ | ||
calcOffset(i, w) { | ||
return Math.abs((i + w) % (2 * w) - w); | ||
} | ||
/** | ||
@@ -953,7 +1060,7 @@ * Pads an array with a reflected version of itself on both ends. | ||
for (let i = 1; i <= left; ++i) { | ||
padded[left - i] = array[this.calcOffset(i, w)]; | ||
padded[left - i] = array[calculateReflectOffset(i, w)]; | ||
} | ||
for (let i = 1; i <= right; ++i) { | ||
padded[w + left + i] = array[this.calcOffset(w - i, w)]; | ||
padded[w + left + i] = array[calculateReflectOffset(w - i, w)]; | ||
} | ||
@@ -1450,2 +1557,3 @@ | ||
SamImageProcessor, | ||
Swin2SRImageProcessor, | ||
Wav2Vec2FeatureExtractor, | ||
@@ -1452,0 +1560,0 @@ SpeechT5FeatureExtractor, |
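For orientation, the `pad_image` method added to `ImageFeatureExtractor` above takes flat HWC pixel data plus `[width, height, channels]` dimensions and returns `[paddedPixelData, [height, width, channels]]`. A hypothetical call on a 2×2 single-channel image (the `extractor` instance and values are illustrative, not part of the diff):

```js
// Pad a 2x2, 1-channel image to 4x4, centered, filling the border with -1
// (the same settings DonutFeatureExtractor passes by default).
const pixelData = Float32Array.from([1, 2, 3, 4]);
const [padded, paddedDims] = extractor.pad_image(
    pixelData,
    [2, 2, 1],                    // [width, height, channels]
    { width: 4, height: 4 },
    { mode: 'constant', center: true, constant_values: -1 },
);
// paddedDims -> [4, 4, 1]; padded, row by row:
// -1 -1 -1 -1
// -1  1  2 -1
// -1  3  4 -1
// -1 -1 -1 -1
```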
@@ -14,3 +14,3 @@ | ||
* | ||
* @param {function} progress_callback The progress callback function to dispatch. | ||
* @param {Function} progress_callback The progress callback function to dispatch. | ||
* @param {any} data The data to pass to the progress callback function. | ||
@@ -21,3 +21,3 @@ * @returns {void} | ||
export function dispatchCallback(progress_callback, data) { | ||
if (progress_callback !== null) progress_callback(data); | ||
if (progress_callback) progress_callback(data); | ||
} | ||
@@ -179,1 +179,11 @@ | ||
} | ||
/** | ||
* Calculates the index offset for a given index and window size. | ||
* @param {number} i The index. | ||
* @param {number} w The window size. | ||
* @returns {number} The index offset. | ||
*/ | ||
export function calculateReflectOffset(i, w) { | ||
return Math.abs((i + w) % (2 * w) - w); | ||
} |
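`calculateReflectOffset` is the former `WhisperFeatureExtractor.calcOffset`, promoted to `utils/core.js` so that both the audio reflect-padding and the new `symmetric` image-padding mode can share it. It reflects an index back into the range `[0, w]`:

```js
// With a last valid index of w = 3, out-of-range indices bounce off the ends:
// i:                             0  1  2  3  4  5  6  7
// calculateReflectOffset(i, 3):  0  1  2  3  2  1  0  1
for (let i = 0; i < 8; ++i) {
    console.log(i, calculateReflectOffset(i, 3));
}
```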
@@ -427,2 +427,4 @@ | ||
const cacheHit = response !== undefined; | ||
if (response === undefined) { | ||
@@ -494,10 +496,40 @@ // Caching not available, or file is not cached, so we perform the request | ||
const buffer = await readResponse(response, data => { | ||
const progressInfo = { | ||
status: 'progress', | ||
name: path_or_repo_id, | ||
file: filename | ||
} | ||
/** @type {Uint8Array} */ | ||
let buffer; | ||
if (!options.progress_callback) { | ||
// If no progress callback is specified, we can use the `.arrayBuffer()` | ||
// method to read the response. | ||
buffer = new Uint8Array(await response.arrayBuffer()); | ||
} else if ( | ||
cacheHit // The item is being read from the cache | ||
&& | ||
typeof navigator !== 'undefined' && /firefox/i.test(navigator.userAgent) // We are in Firefox | ||
) { | ||
// Due to bug in Firefox, we cannot display progress when loading from cache. | ||
// Fortunately, since this should be instantaneous, this should not impact users too much. | ||
buffer = new Uint8Array(await response.arrayBuffer()); | ||
// For completeness, we still fire the final progress callback | ||
dispatchCallback(options.progress_callback, { | ||
status: 'progress', | ||
...data, | ||
name: path_or_repo_id, | ||
file: filename | ||
...progressInfo, | ||
progress: 100, | ||
loaded: buffer.length, | ||
total: buffer.length, | ||
}) | ||
}) | ||
} else { | ||
buffer = await readResponse(response, data => { | ||
dispatchCallback(options.progress_callback, { | ||
...progressInfo, | ||
...data, | ||
}) | ||
}) | ||
} | ||
@@ -564,3 +596,2 @@ if ( | ||
async function readResponse(response, progress_callback) { | ||
// Read and track progress when reading a Response object | ||
@@ -567,0 +598,0 @@ const contentLength = response.headers.get('Content-Length'); |
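The refactored download path in `hub.js` keeps the same `progress` event shape (`status`, `name`, `file`, `progress`, `loaded`, `total`); the changes only add a fast path when no callback is given and a workaround for reading cached responses in Firefox. A sketch of consuming those events (model name chosen for illustration):

```js
import { pipeline } from '@xenova/transformers';

const classifier = await pipeline('image-classification', 'Xenova/vit-base-patch16-224', {
    progress_callback: (data) => {
        // Other statuses include 'initiate', 'download', 'done', and 'ready'.
        if (data.status === 'progress') {
            console.log(`${data.file}: ${data.progress.toFixed(1)}% (${data.loaded}/${data.total} bytes)`);
        }
    },
});
```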
@@ -67,15 +67,15 @@ | ||
/** | ||
* Mapping from file extensions to MIME types. | ||
*/ | ||
const CONTENT_TYPE_MAP = new Map([ | ||
['png', 'image/png'], | ||
['jpg', 'image/jpeg'], | ||
['jpeg', 'image/jpeg'], | ||
['gif', 'image/gif'], | ||
]); | ||
export class RawImage { | ||
/** | ||
* Mapping from file extensions to MIME types. | ||
*/ | ||
_CONTENT_TYPE_MAP = { | ||
'png': 'image/png', | ||
'jpg': 'image/jpeg', | ||
'jpeg': 'image/jpeg', | ||
'gif': 'image/gif', | ||
} | ||
/** | ||
* Create a new `RawImage` object. | ||
@@ -161,2 +161,17 @@ * @param {Uint8ClampedArray} data The pixel data. | ||
/** | ||
* Helper method to create a new Image from a tensor | ||
* @param {import('./tensor.js').Tensor} tensor | ||
*/ | ||
static fromTensor(tensor, channel_format = 'CHW') { | ||
if (channel_format === 'CHW') { | ||
tensor = tensor.transpose(1, 2, 0); | ||
} else if (channel_format === 'HWC') { | ||
// Do nothing | ||
} else { | ||
throw new Error(`Unsupported channel format: ${channel_format}`); | ||
} | ||
return new RawImage(tensor.data, tensor.dims[1], tensor.dims[0], tensor.dims[2]); | ||
} | ||
/** | ||
* Convert the image to grayscale format. | ||
@@ -569,3 +584,3 @@ * @returns {RawImage} `this` to support chaining. | ||
const extension = path.split('.').pop().toLowerCase(); | ||
const mime = this._CONTENT_TYPE_MAP[extension] ?? 'image/png'; | ||
const mime = CONTENT_TYPE_MAP.get(extension) ?? 'image/png'; | ||
@@ -572,0 +587,0 @@ // Convert image to canvas |
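`RawImage.fromTensor` is what the new image-to-image pipeline uses to turn a CHW model output back into an image. A minimal sketch with a hand-built tensor, assuming `Tensor` and `RawImage` are both re-exported from the package root:

```js
import { Tensor, RawImage } from '@xenova/transformers';

// A 3x2x2 (channel, height, width) tensor of pixel values.
const chw = new Tensor('uint8', Uint8Array.from([
    255,   0,   0, 255,   // R plane
      0, 255,   0, 255,   // G plane
      0,   0, 255, 255,   // B plane
]), [3, 2, 2]);

const image = RawImage.fromTensor(chw); // channel_format defaults to 'CHW'
// image.width === 2, image.height === 2, image.channels === 3
```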
@@ -18,2 +18,17 @@ /** | ||
// @ts-ignore | ||
const DataTypeMap = new Map([ | ||
['bool', Uint8Array], | ||
['float32', Float32Array], | ||
['float64', Float64Array], | ||
['string', Array], // string[] | ||
['int8', Int8Array], | ||
['uint8', Uint8Array], | ||
['int16', Int16Array], | ||
['uint16', Uint16Array], | ||
['int32', Int32Array], | ||
['uint32', Uint32Array], | ||
['int64', BigInt64Array], | ||
]) | ||
/** | ||
@@ -164,2 +179,44 @@ * @typedef {import('./maths.js').AnyTypedArray | any[]} DataArray | ||
/** | ||
* Return a new Tensor with every element multiplied by a constant. | ||
* @param {number} val The value to multiply by. | ||
* @returns {Tensor} The new tensor. | ||
*/ | ||
mul(val) { | ||
return this.clone().mul_(val); | ||
} | ||
/** | ||
* Multiply the tensor by a constant in place. | ||
* @param {number} val The value to multiply by. | ||
* @returns {Tensor} Returns `this`. | ||
*/ | ||
mul_(val) { | ||
for (let i = 0; i < this.data.length; ++i) { | ||
this.data[i] *= val; | ||
} | ||
return this; | ||
} | ||
/** | ||
* Return a new Tensor with every element added by a constant. | ||
* @param {number} val The value to add by. | ||
* @returns {Tensor} The new tensor. | ||
*/ | ||
add(val) { | ||
return this.clone().add_(val); | ||
} | ||
/** | ||
* Add the tensor by a constant in place. | ||
* @param {number} val The value to add by. | ||
* @returns {Tensor} Returns `this`. | ||
*/ | ||
add_(val) { | ||
for (let i = 0; i < this.data.length; ++i) { | ||
this.data[i] += val; | ||
} | ||
return this; | ||
} | ||
clone() { | ||
@@ -487,2 +544,57 @@ return new Tensor(this.type, this.data.slice(), this.dims.slice()); | ||
} | ||
/** | ||
* In-place version of @see {@link Tensor.clamp} | ||
*/ | ||
clamp_(min, max) { | ||
for (let i = 0; i < this.data.length; ++i) { | ||
this.data[i] = Math.min(Math.max(this.data[i], min), max); | ||
} | ||
return this; | ||
} | ||
/** | ||
* Clamps all elements in input into the range [ min, max ] | ||
* @param {number} min lower-bound of the range to be clamped to | ||
* @param {number} max upper-bound of the range to be clamped to | ||
* @returns the output tensor. | ||
*/ | ||
clamp(min, max) { | ||
return this.clone().clamp_(min, max); | ||
} | ||
/** | ||
* In-place version of @see {@link Tensor.round} | ||
*/ | ||
round_() { | ||
for (let i = 0; i < this.data.length; ++i) { | ||
this.data[i] = Math.round(this.data[i]); | ||
} | ||
return this; | ||
} | ||
/** | ||
* Rounds elements of input to the nearest integer. | ||
* @returns the output tensor. | ||
*/ | ||
round() { | ||
return this.clone().round_(); | ||
} | ||
/** | ||
* Performs Tensor dtype conversion. | ||
* @param {'bool'|'float32'|'float64'|'string'|'int8'|'uint8'|'int16'|'uint16'|'int32'|'uint32'|'int64'} type | ||
* @returns {Tensor} The converted tensor. | ||
*/ | ||
to(type) { | ||
// If the self Tensor already has the correct dtype, then self is returned. | ||
if (this.type === type) return this; | ||
// Otherwise, the returned tensor is a copy of self with the desired dtype. | ||
const ArrayConstructor = DataTypeMap.get(type); | ||
if (!ArrayConstructor) { | ||
throw new Error(`Unsupported type: ${type}`); | ||
} | ||
return new Tensor(type, ArrayConstructor.from(this.data), this.dims); | ||
} | ||
} | ||
@@ -489,0 +601,0 @@ |
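The new element-wise helpers (`mul`/`add`, their in-place `_` variants, `clamp`, `round`, and `to`) exist mainly so the image pipelines can de-normalize model outputs before handing them to `RawImage`. A small standalone sketch:

```js
// Pretend `recon` is a model output normalized to [0, 1].
const recon = new Tensor('float32', Float32Array.from([0.1, 0.5, 1.2, -0.2]), [4]);

// Methods without a trailing underscore return a copy; the `_` versions mutate in place.
const pixels = recon.clamp(0, 1).mul_(255).round_().to('uint8');
// pixels.data -> Uint8Array [26, 128, 255, 0]
```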
export namespace env { | ||
export namespace backends { | ||
export { onnx_env as onnx }; | ||
export const tfjs: {}; | ||
export let tfjs: {}; | ||
} | ||
export { __dirname }; | ||
export { VERSION as version }; | ||
export const allowRemoteModels: boolean; | ||
export const remoteHost: string; | ||
export const remotePathTemplate: string; | ||
export const allowLocalModels: boolean; | ||
export let allowRemoteModels: boolean; | ||
export let remoteHost: string; | ||
export let remotePathTemplate: string; | ||
export let allowLocalModels: boolean; | ||
export { localModelPath }; | ||
@@ -17,8 +17,8 @@ export { FS_AVAILABLE as useFS }; | ||
export { DEFAULT_CACHE_DIR as cacheDir }; | ||
export const useCustomCache: boolean; | ||
export const customCache: any; | ||
export let useCustomCache: boolean; | ||
export let customCache: any; | ||
} | ||
declare const onnx_env: any; | ||
declare const __dirname: any; | ||
declare const VERSION: "2.7.0"; | ||
declare const VERSION: "2.8.0"; | ||
declare const localModelPath: any; | ||
@@ -25,0 +25,0 @@ declare const FS_AVAILABLE: boolean; |
@@ -240,7 +240,7 @@ /** | ||
* }); | ||
* // [ 'To become more healthy, you can: 1. Eat a balanced diet with plenty of fruits, vegetables, whole grains, lean proteins, and healthy fats. 2. Stay hydrated by drinking plenty of water. 3. Get enough sleep and manage stress levels. 4. Avoid smoking and excessive alcohol consumption. 5. Regularly exercise and maintain a healthy weight. 6. Practice good hygiene and sanitation. 7. Seek medical attention if you experience any health issues.' ] | ||
* // [{ generated_text: "To become more healthy, you can: 1. Eat a balanced diet with plenty of fruits, vegetables, whole grains, lean proteins, and healthy fats. 2. Stay hydrated by drinking plenty of water. 3. Get enough sleep and manage stress levels. 4. Avoid smoking and excessive alcohol consumption. 5. Regularly exercise and maintain a healthy weight. 6. Practice good hygiene and sanitation. 7. Seek medical attention if you experience any health issues." }] | ||
* ``` | ||
*/ | ||
export class Text2TextGenerationPipeline extends Pipeline { | ||
_key: any; | ||
_key: string; | ||
/** | ||
@@ -280,3 +280,2 @@ * Fill the masked token in the text(s) given as inputs. | ||
export class SummarizationPipeline extends Text2TextGenerationPipeline { | ||
_key: string; | ||
} | ||
@@ -329,3 +328,2 @@ /** | ||
export class TranslationPipeline extends Text2TextGenerationPipeline { | ||
_key: string; | ||
} | ||
@@ -340,4 +338,4 @@ /** | ||
* let text = 'I enjoy walking with my cute dog,'; | ||
* let classifier = await pipeline('text-generation', 'Xenova/distilgpt2'); | ||
* let output = await classifier(text); | ||
* let generator = await pipeline('text-generation', 'Xenova/distilgpt2'); | ||
* let output = await generator(text); | ||
* // [{ generated_text: "I enjoy walking with my cute dog, and I love to play with the other dogs." }] | ||
@@ -349,4 +347,4 @@ * ``` | ||
* let text = 'Once upon a time, there was'; | ||
* let classifier = await pipeline('text-generation', 'Xenova/distilgpt2'); | ||
* let output = await classifier(text, { | ||
* let generator = await pipeline('text-generation', 'Xenova/distilgpt2'); | ||
* let output = await generator(text, { | ||
* temperature: 2, | ||
@@ -369,4 +367,4 @@ * max_new_tokens: 10, | ||
* let text = 'def fib(n):'; | ||
* let classifier = await pipeline('text-generation', 'Xenova/codegen-350M-mono'); | ||
* let output = await classifier(text, { | ||
* let generator = await pipeline('text-generation', 'Xenova/codegen-350M-mono'); | ||
* let output = await generator(text, { | ||
* max_new_tokens: 44, | ||
@@ -691,2 +689,10 @@ * }); | ||
* ``` | ||
* | ||
* **Example:** Optical Character Recognition (OCR) w/ `Xenova/trocr-small-handwritten`. | ||
* ```javascript | ||
* let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/handwriting.jpg'; | ||
* let captioner = await pipeline('image-to-text', 'Xenova/trocr-small-handwritten'); | ||
* let output = await captioner(url); | ||
* // [{ generated_text: 'Mr. Brown commented icily.' }] | ||
* ``` | ||
*/ | ||
@@ -987,2 +993,26 @@ export class ImageToTextPipeline extends Pipeline { | ||
} | ||
/** | ||
* Image to Image pipeline using any `AutoModelForImageToImage`. This pipeline generates an image based on a previous image input. | ||
* | ||
* **Example:** Super-resolution w/ `Xenova/swin2SR-classical-sr-x2-64` | ||
* ```javascript | ||
* let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/butterfly.jpg'; | ||
* let upscaler = await pipeline('image-to-image', 'Xenova/swin2SR-classical-sr-x2-64'); | ||
* let output = await upscaler(url); | ||
* // RawImage { | ||
* // data: Uint8Array(786432) [ 41, 31, 24, 43, ... ], | ||
* // width: 512, | ||
* // height: 512, | ||
* // channels: 3 | ||
* // } | ||
* ``` | ||
*/ | ||
export class ImageToImagePipeline extends Pipeline { | ||
/** | ||
* Transform the image(s) passed as inputs. | ||
* @param {any} images The images to transform. | ||
* @returns {Promise<any>} An image or a list of images containing result(s). | ||
*/ | ||
_call(images: any): Promise<any>; | ||
} | ||
export type QuestionAnsweringResult = { | ||
@@ -989,0 +1019,0 @@ /** |
@@ -86,2 +86,21 @@ declare const FeatureExtractor_base: new () => { | ||
/** | ||
* Pad the image by a certain amount. | ||
* @param {Float32Array} pixelData The pixel data to pad. | ||
* @param {number[]} imgDims The dimensions of the image. | ||
* @param {{width:number; height:number}|number} padSize The dimensions of the padded image. | ||
* @param {Object} options The options for padding. | ||
* @param {'constant'|'symmetric'} [options.mode='constant'] The type of padding to add. | ||
* @param {boolean} [options.center=false] Whether to center the image. | ||
* @param {number} [options.constant_values=0] The constant value to use for padding. | ||
* @returns {[Float32Array, number[]]} The padded pixel data and image dimensions. | ||
*/ | ||
pad_image(pixelData: Float32Array, imgDims: number[], padSize: { | ||
width: number; | ||
height: number; | ||
} | number, { mode, center, constant_values, }?: { | ||
mode?: 'constant' | 'symmetric'; | ||
center?: boolean; | ||
constant_values?: number; | ||
}): [Float32Array, number[]]; | ||
/** | ||
* @typedef {object} PreprocessedImage | ||
@@ -133,2 +152,3 @@ * @property {HeightWidth} original_size The original size of the image. | ||
export class DonutFeatureExtractor extends ImageFeatureExtractor { | ||
pad_image(pixelData: any, imgDims: any, padSize: any, options?: {}): [Float32Array, number[]]; | ||
} | ||
@@ -270,12 +290,8 @@ /** | ||
} | ||
export class Swin2SRImageProcessor extends ImageFeatureExtractor { | ||
pad_image(pixelData: any, imgDims: any, padSize: any, options?: {}): [Float32Array, number[]]; | ||
} | ||
export class WhisperFeatureExtractor extends FeatureExtractor { | ||
constructor(config: any); | ||
/** | ||
* Calculates the index offset for a given index and window size. | ||
* @param {number} i The index. | ||
* @param {number} w The window size. | ||
* @returns {number} The index offset. | ||
*/ | ||
calcOffset(i: number, w: number): number; | ||
/** | ||
* Pads an array with a reflected version of itself on both ends. | ||
@@ -454,2 +470,3 @@ * @param {Float32Array} array The array to pad. | ||
SamImageProcessor: typeof SamImageProcessor; | ||
Swin2SRImageProcessor: typeof Swin2SRImageProcessor; | ||
Wav2Vec2FeatureExtractor: typeof Wav2Vec2FeatureExtractor; | ||
@@ -456,0 +473,0 @@ SpeechT5FeatureExtractor: typeof SpeechT5FeatureExtractor; |
@@ -12,3 +12,3 @@ /** | ||
* | ||
* @param {function} progress_callback The progress callback function to dispatch. | ||
* @param {Function} progress_callback The progress callback function to dispatch. | ||
* @param {any} data The data to pass to the progress callback function. | ||
@@ -91,2 +91,9 @@ * @returns {void} | ||
/** | ||
* Calculates the index offset for a given index and window size. | ||
* @param {number} i The index. | ||
* @param {number} w The window size. | ||
* @returns {number} The index offset. | ||
*/ | ||
export function calculateReflectOffset(i: number, w: number): number; | ||
/** | ||
* A base class for creating callable objects. | ||
@@ -93,0 +100,0 @@ * |
@@ -32,2 +32,7 @@ export class RawImage { | ||
/** | ||
* Helper method to create a new Image from a tensor | ||
* @param {import('./tensor.js').Tensor} tensor | ||
*/ | ||
static fromTensor(tensor: import('./tensor.js').Tensor, channel_format?: string): RawImage; | ||
/** | ||
* Create a new `RawImage` object. | ||
@@ -40,11 +45,2 @@ * @param {Uint8ClampedArray} data The pixel data. | ||
constructor(data: Uint8ClampedArray, width: number, height: number, channels: 1 | 2 | 3 | 4); | ||
/** | ||
* Mapping from file extensions to MIME types. | ||
*/ | ||
_CONTENT_TYPE_MAP: { | ||
png: string; | ||
jpg: string; | ||
jpeg: string; | ||
gif: string; | ||
}; | ||
data: Uint8ClampedArray; | ||
@@ -51,0 +47,0 @@ width: number; |
@@ -122,2 +122,26 @@ /** | ||
sigmoid_(): Tensor; | ||
/** | ||
* Return a new Tensor with every element multiplied by a constant. | ||
* @param {number} val The value to multiply by. | ||
* @returns {Tensor} The new tensor. | ||
*/ | ||
mul(val: number): Tensor; | ||
/** | ||
* Multiply the tensor by a constant in place. | ||
* @param {number} val The value to multiply by. | ||
* @returns {Tensor} Returns `this`. | ||
*/ | ||
mul_(val: number): Tensor; | ||
/** | ||
* Return a new Tensor with every element added by a constant. | ||
* @param {number} val The value to add by. | ||
* @returns {Tensor} The new tensor. | ||
*/ | ||
add(val: number): Tensor; | ||
/** | ||
* Add the tensor by a constant in place. | ||
* @param {number} val The value to add by. | ||
* @returns {Tensor} Returns `this`. | ||
*/ | ||
add_(val: number): Tensor; | ||
clone(): Tensor; | ||
@@ -181,3 +205,3 @@ slice(...slices: any[]): Tensor; | ||
*/ | ||
squeeze_(dim?: any): Tensor; | ||
squeeze_(dim?: any): this; | ||
dims: any; | ||
@@ -196,7 +220,7 @@ /** | ||
*/ | ||
unsqueeze_(dim?: any): Tensor; | ||
unsqueeze_(dim?: any): this; | ||
/** | ||
* In-place version of @see {@link Tensor.flatten} | ||
*/ | ||
flatten_(start_dim?: number, end_dim?: number): Tensor; | ||
flatten_(start_dim?: number, end_dim?: number): this; | ||
/** | ||
@@ -217,5 +241,31 @@ * Flattens input by reshaping it into a one-dimensional tensor. | ||
view(...dims: number[]): Tensor; | ||
neg_(): Tensor; | ||
neg_(): this; | ||
neg(): Tensor; | ||
/** | ||
* In-place version of @see {@link Tensor.clamp} | ||
*/ | ||
clamp_(min: any, max: any): this; | ||
/** | ||
* Clamps all elements in input into the range [ min, max ] | ||
* @param {number} min lower-bound of the range to be clamped to | ||
* @param {number} max upper-bound of the range to be clamped to | ||
* @returns the output tensor. | ||
*/ | ||
clamp(min: number, max: number): Tensor; | ||
/** | ||
* In-place version of @see {@link Tensor.round} | ||
*/ | ||
round_(): this; | ||
/** | ||
* Rounds elements of input to the nearest integer. | ||
* @returns the output tensor. | ||
*/ | ||
round(): Tensor; | ||
/** | ||
* Performs Tensor dtype conversion. | ||
* @param {'bool'|'float32'|'float64'|'string'|'int8'|'uint8'|'int16'|'uint16'|'int32'|'uint32'|'int64'} type | ||
* @returns {Tensor} The converted tensor. | ||
*/ | ||
to(type: 'bool' | 'float32' | 'float64' | 'string' | 'int8' | 'uint8' | 'int16' | 'uint16' | 'int32' | 'uint32' | 'int64'): Tensor; | ||
/** | ||
* Returns an iterator object for iterating over the tensor data in row-major order. | ||
@@ -222,0 +272,0 @@ * If the tensor has more than one dimension, the iterator will yield subarrays. |