
# utils

Internal Local Inference utils package to avoid boilerplate across codebases.
## Compatibility

- Runtimes: Node >= 22; Bun and Deno via npm compatibility; browsers and browser web workers with WebAssembly; service-worker-style edge runtimes (smoke coverage for package loading, GPU-detection fallback, and wasm-path selection).
- Module formats: ESM and CommonJS.
- Required globals / APIs: `Uint8Array`; browser runtimes need an ONNX Runtime Web compatible backend such as WebNN, WebGPU, WebGL, or WebAssembly.
- TypeScript: bundled types.
## Goals

- Remove repeated setup code for SentencePiece and ONNX Runtime Web.
- Prefer the strongest available browser inference backend while still working in Node-based tooling.
- Keep the public surface intentionally small and side-effect free.
- Normalize failures into explicit `LocalInferenceUtilsError` codes.
## Installation

```sh
npm install @localinference/utils
pnpm add @localinference/utils
yarn add @localinference/utils
bun add @localinference/utils
deno add jsr:@localinference/utils
vlt install jsr:@localinference/utils
```
## Usage
### Check whether browser GPU acceleration is likely available

```ts
import { GPUAccelerationSupported } from '@localinference/utils'

if (GPUAccelerationSupported()) {
  console.log('Browser GPU acceleration APIs look available')
}
```
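For intuition, a heuristic probe of this kind typically amounts to feature detection on well-known globals. The sketch below is an illustration of that pattern, not the package's actual implementation; `scope` is a hypothetical parameter standing in for `globalThis` so the check can be exercised outside a browser.

```typescript
// Illustrative feature-detection sketch (assumption: the real probe may differ).
// `scope` defaults to globalThis; passing a plain object makes it testable in Node.
function gpuLikelyAvailable(scope: Record<string, any> = globalThis as any): boolean {
  const nav = scope.navigator
  return Boolean(
    nav?.ml ||                                          // WebNN entry point
    nav?.gpu ||                                         // WebGPU entry point
    typeof scope.WebGLRenderingContext !== 'undefined'  // WebGL constructor
  )
}
```

In a plain Node process none of these globals are present, so such a probe reports false there, matching the documented edge-runtime behavior.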
### Load a tokenizer

```ts
import { createTokenizer } from '@localinference/utils'

const tokenizer = await createTokenizer(modelBytes)
const ids = tokenizer.encodeIds('hello world')
const text = tokenizer.decodeIds(ids)
```
### Create an inference session

```ts
import { Tensor } from 'onnxruntime-web'
import { createInferenceSession } from '@localinference/utils'

const session = await createInferenceSession(modelBytes)
const outputs = await session.run({
  input: new Tensor('float32', Float32Array.from([1]), [1]),
})
```
## Runtime behavior

### Node / Bun / Deno

`createInferenceSession()` uses the `onnxruntime-web` runtime with the `wasm` execution provider. In Deno, it also forces single-threaded wasm to avoid runtime worker failures. `createTokenizer()` loads the serialized SentencePiece model directly from bytes.
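For reference, forcing single-threaded wasm manually uses ONNX Runtime Web's `env` flags (the package applies this for you under Deno, so this is only needed when configuring `onnxruntime-web` directly):

```typescript
import { env } from 'onnxruntime-web'

// Single-threaded wasm: ONNX Runtime Web will not spawn wasm worker threads,
// avoiding worker-creation failures in runtimes with restricted Worker support.
env.wasm.numThreads = 1
```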
### Browsers / Web Workers

`GPUAccelerationSupported()` is a heuristic boolean probe for browser and browser-worker contexts; it checks for WebNN, WebGPU, and WebGL. In these contexts, `createInferenceSession()` loads `onnxruntime-web/all` and, when that probe returns true, prefers `webnn`, `webgpu`, and `webgl` over `wasm`; otherwise it stays on the `wasm` path. Browser builds must make the ONNX Runtime Web wasm assets reachable to the app.
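One common way to make those assets reachable is to point ONNX Runtime Web at wherever your build serves them. A minimal sketch, assuming your bundler copies the `.wasm` files to an `/ort/` directory (the path is illustrative):

```typescript
import { env } from 'onnxruntime-web'

// Illustrative path: set this to wherever your build or CDN serves the
// ONNX Runtime Web .wasm assets, before creating any session.
env.wasm.wasmPaths = '/ort/'
```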
### Cloudflare Workers / Edge runtimes

This package is smoke-tested in a service-worker-style edge runtime via `edge-runtime`. That coverage verifies that the package loads, `GPUAccelerationSupported()` returns false, and `createInferenceSession()` selects the wasm-only path instead of browser GPU providers.

That is narrower than full cross-platform inference support. Real ONNX execution still depends on each edge platform's WebAssembly loading model and asset rules, so Cloudflare Workers, Vercel Edge, and similar runtimes should be validated in the exact deployment target before being treated as a production inference environment.
## Validation & errors

Failures are wrapped in `LocalInferenceUtilsError` with stable `code` values:

- `TOKENIZER_MODEL_LOAD_FAILED`
- `INFERENCE_SESSION_CREATE_FAILED`

The original dependency error is preserved as `error.cause`.
## Caching semantics
This package does not cache tokenizers or inference sessions. Each call creates a fresh runtime object.
## Tests

- Command: `npm test`
- Coverage runner: `node test/run-coverage.mjs`
- Coverage result: 100% statements, branches, functions, and lines
- Runtime e2e coverage:
  - Node ESM
  - Node CommonJS
  - Bun ESM
  - Bun CommonJS
  - Deno ESM
  - Vercel Edge Runtime ESM (`GPUAccelerationSupported()` / wasm-path smoke)
  - Chromium
  - Firefox
  - WebKit
  - Mobile Chrome (Pixel 5)
  - Mobile Safari (iPhone 12)
- CI matrix: Node 22.x and 24.x
`npm test` builds the package, runs the `node:test` unit/integration suite under `c8`, then runs the end-to-end smoke suite against the built package across the ESM/CJS/runtime matrix above.
## Benchmarks

- Command: `npm run bench`
- Environment: Node v22.14.0 (win32 x64)
- Workload: serialized SentencePiece tokenizer load/encode plus ONNX Runtime session create/run against the included identity model fixture
- Results:
  - tokenizer load: 7.9 ops/s (127.049 ms/op)
  - tokenizer encode: 48509.3 ops/s (0.021 ms/op)
  - session create: 10.6 ops/s (94.519 ms/op)
  - session run: 8891.1 ops/s (0.112 ms/op)
Results vary by machine.
## License

Apache-2.0