
Product
Rust Support Now in Beta
Socket's Rust support is moving to Beta: all users can scan Cargo projects and generate SBOMs, including Cargo.toml-only crates, with Rust-aware supply chain checks.
@darknoon/tiktoken
Advanced tools
tiktoken is a BPE tokeniser for use with OpenAI's models, forked from the original tiktoken library to provide NPM bindings for Node and other JS runtimes.
The open source version of tiktoken
can be installed from NPM:
npm install @dqbd/tiktoken
Basic usage follows:
import assert from "node:assert";
import { get_encoding, encoding_for_model } from "@dqbd/tiktoken";
const enc = get_encoding("gpt2");
assert(
new TextDecoder().decode(enc.decode(enc.encode("hello world"))) ===
"hello world"
);
// To get the tokeniser corresponding to a specific model in the OpenAI API:
const enc = encoding_for_model("text-davinci-003");
// Extend existing encoding with custom special tokens
const enc = encoding_for_model("gpt2", {
"<|im_start|>": 100264,
"<|im_end|>": 100265,
});
If desired, you can create a Tiktoken instance directly with custom ranks, special tokens and regex pattern:
import { Tiktoken } from "../pkg";
import { readFileSync } from "fs";
const encoder = new Tiktoken(
readFileSync("./ranks/gpt2.tiktoken").toString("utf-8"),
{ "<|endoftext|>": 50256, "<|im_start|>": 100264, "<|im_end|>": 100265 },
"'s|'t|'re|'ve|'m|'ll|'d| ?\\p{L}+| ?\\p{N}+| ?[^\\s\\p{L}\\p{N}]+|\\s+(?!\\S)|\\s+"
);
As this is a WASM library, there might be some issues with specific runtimes. If you encounter any issues, please open an issue.
Runtime | Status | Notes |
---|---|---|
Node.js | ✅ | |
Bun | ✅ | |
Vite | ✅ | See here for notes |
Next.js | ✅ 🚧 | See here for caveats |
Vercel Edge Runtime | 🚧 | Work in progress |
Cloudflare Workers | 🚧 | Untested |
Deno | ❌ | Currently unsupported |
If you are using Vite, you will need to add both the vite-plugin-wasm
and vite-plugin-top-level-await
. Add the following to your vite.config.js
:
import wasm from "vite-plugin-wasm";
import topLevelAwait from "vite-plugin-top-level-await";
import { defineConfig } from "vite";
export default defineConfig({
plugins: [wasm(), topLevelAwait()],
});
Both API routes and /pages
are supported with some caveats. To overcome issues with importing /node
variant and incorrect __dirname
resolution, you can import the package from @dqbd/tiktoken/bundler
instead.
import { get_encoding } from "@dqbd/tiktoken/bundler";
import { NextApiRequest, NextApiResponse } from "next";
export default function handler(req: NextApiRequest, res: NextApiResponse) {
return res.status(200).json({
// eslint-disable-next-line @typescript-eslint/no-unsafe-assignment
message: get_encoding("gpt2").encode(`Hello World ${Math.random()}`),
});
}
Additional Webpack configuration is also required, see https://github.com/vercel/next.js/issues/29362.
class WasmChunksFixPlugin {
apply(compiler) {
compiler.hooks.thisCompilation.tap("WasmChunksFixPlugin", (compilation) => {
compilation.hooks.processAssets.tap(
{ name: "WasmChunksFixPlugin" },
(assets) =>
Object.entries(assets).forEach(([pathname, source]) => {
if (!pathname.match(/\.wasm$/)) return;
compilation.deleteAsset(pathname);
const name = pathname.split("/")[1];
const info = compilation.assetsInfo.get(pathname);
compilation.emitAsset(name, source, info);
})
);
});
}
}
const config = {
webpack(config, { isServer, dev }) {
config.experiments = {
asyncWebAssembly: true,
layers: true,
};
if (!dev && isServer) {
config.output.webassemblyModuleFilename = "chunks/[id].wasm";
config.plugins.push(new WasmChunksFixPlugin());
}
return config;
},
};
To properly resolve tsconfig.json
, use either moduleResolution: "node16"
or moduleResolution: "nodenext"
:
{
"compilerOptions": {
"moduleResolution": "node16"
}
}
FAQs
Javascript bindings for tiktoken
The npm package @darknoon/tiktoken receives a total of 2 weekly downloads. As such, @darknoon/tiktoken popularity was classified as not popular.
We found that @darknoon/tiktoken demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Product
Socket's Rust support is moving to Beta: all users can scan Cargo projects and generate SBOMs, including Cargo.toml-only crates, with Rust-aware supply chain checks.
Product
Socket Fix 2.0 brings targeted CVE remediation, smarter upgrade planning, and broader ecosystem support to help developers get to zero alerts.
Security News
Socket CEO Feross Aboukhadijeh joins Risky Business Weekly to unpack recent npm phishing attacks, their limited impact, and the risks if attackers get smarter.