{
		"name": "@dqbd/tiktoken",
		"version": "0.2.0",
		"version": "0.2.1",
		"description": "Javascript bindings for tiktoken",
		"files": [
		"_tiktoken_bg.wasm",
		"_tiktoken.js",
		"_tiktoken.d.ts"
		"dist/*/",
		"package.json"
		],
		"main": "_tiktoken.js",
		"types": "_tiktoken.d.ts"
		}
		"license": "Apache-2.0",
		"main": "dist/node/_tiktoken.js",
		"browser": "dist/web/_tiktoken.js",
		"types": "dist/node/_tiktoken.d.ts",
		"repository": {
		"type": "git",
		"url": "https://github.com/dqbd/tiktoken"
		},
		"devDependencies": {},
		"scripts": {
		"build": "rm -rf dist/ && npm run build:node && npm run build:bundler && npm run build:web",
		"build:bundler": "wasm-pack build --target bundler --release --out-dir dist/bundler && rm dist/bundler/.gitignore",
		"build:node": "wasm-pack build --target nodejs --release --out-dir dist/node && rm dist/node/.gitignore",
		"build:web": "wasm-pack build --target no-modules --release --out-dir dist/web && rm dist/web/.gitignore"
		}
		}

108

README.md

		# ⏳ tiktoken

		tiktoken is a fast [BPE](https://en.wikipedia.org/wiki/Byte_pair_encoding) tokeniser for use with
		OpenAI's models.
		tiktoken is a [BPE](https://en.wikipedia.org/wiki/Byte_pair_encoding) tokeniser for use with
		OpenAI's models, forked from the original tiktoken library to provide NPM bindings for Node and other JS runtimes.

		```python
		import tiktoken
		enc = tiktoken.get_encoding("gpt2")
		assert enc.decode(enc.encode("hello world")) == "hello world"
		```typescript
		import assert from "node:assert";
		import { get_encoding, encoding_for_model } from "@dqbd/tiktoken";

		# To get the tokeniser corresponding to a specific model in the OpenAI API:
		enc = tiktoken.encoding_for_model("text-davinci-003")
		const enc = get_encoding("gpt2");
		assert(
		new TextDecoder().decode(enc.decode(enc.encode("hello world"))) ===
		"hello world"
		);

		// To get the tokeniser corresponding to a specific model in the OpenAI API:
		const enc = encoding_for_model("text-davinci-003");
		```

		The open source version of `tiktoken` can be installed from PyPI:
		```
		pip install tiktoken
		```

		The tokeniser API is documented in `tiktoken/core.py`.

		Example code using `tiktoken` can be found in the
		[OpenAI Cookbook](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb).


		## Performance

		`tiktoken` is between 3-6x faster than a comparable open source tokeniser:

		![image](./perf.svg)

		Performance measured on 1GB of text using the GPT-2 tokeniser, using `GPT2TokenizerFast` from
		`tokenizers==0.13.2` and `transformers==4.24.0`.


		## Getting help

		Please post questions in the [issue tracker](https://github.com/openai/tiktoken/issues).

		If you work at OpenAI, make sure to check the internal documentation or feel free to contact
		@shantanu.


		## Extending tiktoken

		You may wish to extend `tiktoken` to support new encodings. There are two ways to do this.


		Create your `Encoding` object exactly the way you want and simply pass it around.

		```python
		cl100k_base = tiktoken.get_encoding("cl100k_base")

		# In production, load the arguments directly instead of accessing private attributes
		# See openai_public.py for examples of arguments for specific encodings
		enc = tiktoken.Encoding(
		# If you're changing the set of special tokens, make sure to use a different name
		# It should be clear from the name what behaviour to expect.
		name="cl100k_im",
		pat_str=cl100k_base._pat_str,
		mergeable_ranks=cl100k_base._mergeable_ranks,
		special_tokens={
		**cl100k_base._special_tokens,
		"<\|im_start\|>": 100264,
		"<\|im_end\|>": 100265,
		}
		)
		```

		Use the `tiktoken_ext` plugin mechanism to register your `Encoding` objects with `tiktoken`.

		This is only useful if you need `tiktoken.get_encoding` to find your encoding, otherwise prefer
		option 1.

		To do this, you'll need to create a namespace package under `tiktoken_ext`.

		Layout your project like this, making sure to omit the `tiktoken_ext/__init__.py` file:
		npm install @dqbd/tiktoken
		```
		my_tiktoken_extension
		├── tiktoken_ext
		│ └── my_encodings.py
		└── setup.py
		```

		`my_encodings.py` should be a module that contains a variable named `ENCODING_CONSTRUCTORS`.
		This is a dictionary from an encoding name to a function that takes no arguments and returns
		arguments that can be passed to `tiktoken.Encoding` to construct that encoding. For an example, see
		`tiktoken_ext/openai_public.py`. For precise details, see `tiktoken/registry.py`.
		## Acknowledgements

		Your `setup.py` should look something like this:
		```python
		from setuptools import setup, find_namespace_packages

		setup(
		name="my_tiktoken_extension",
		packages=find_namespace_packages(include=['tiktoken_ext.*'])
		install_requires=["tiktoken"],
		...
		)
		```

		Then simply `pip install my_tiktoken_extension` and you should be able to use your custom encodings!
		Make sure not to use an editable install.

		- https://github.com/zurawiki/tiktoken-rs

_tiktoken_bg.wasm

_tiktoken.d.ts

_tiktoken.js

@dqbd/tiktoken - npm Package Compare versions

New alerts

Fixed alerts

Improved metrics

Worsened metrics