GGML-powered image classification addon for QVAC. Runs a fine-tuned MobileNetV3-Small 3-class triage CNN on the CPU backend of libggml and exposes a small, stable JavaScript API. Now intended for a specific image triage, but can be easily adapted for other classification tasks.

Property	Value
Model	MobileNetV3-Small (3 classes)
Parameters	~2.5 M
Weights	FP16 GGUF, 2.94 MB, bundled in this package
Input	JPEG, PNG, or raw RGB bytes
Resize target	224 × 224 (bilinear)
Normalization	ImageNet mean/std
Backend	`libggml` CPU (no GPU dependency)

Package name: @qvac/classification-ggml
Directory: packages/classification-ggml

Install

This addon is published to the @qvac scope and consumed like any other QVAC native addon. When used from the monorepo, npm install resolves @qvac/infer-base and @qvac/logging via the workspace.

Quickstart

const ImageClassifier = require('@qvac/classification-ggml')

const classifier = new ImageClassifier()
await classifier.load()

const imageBuffer = fs.readFileSync('./my-image.jpg')
const result = await classifier.classify(imageBuffer)
// [ { label: 'food',   confidence: 0.93 },
//   { label: 'other',  confidence: 0.05 },
//   { label: 'report', confidence: 0.02 } ]

await classifier.unload()

Raw RGB input

const result = await classifier.classify(rgbBuffer, {
  width: 320,
  height: 240,
  channels: 3,
})

topK filter

By default classify() returns one entry per class, sorted from most likely to least likely. Pass topK: N to keep only the top N results — for example topK: 1 returns just the single highest-scoring class:

const best = await classifier.classify(buf, { topK: 1 })

API

Method	Description
`new ImageClassifier(opts?)`	`opts = { modelPath?, logger?, nativeLogger? }`
`await load()`	Initialises the GGML backend and loads weights. Idempotent.
`await classify(buffer, options?)`	Runs inference. Returns `[{ label, confidence }, …]` sorted descending.
`await unload()`	Releases native resources. Safe to call again.
`await destroy()`	Releases resources and marks the instance as destroyed.
`getState()`	Returns `{ configLoaded, destroyed }`.

See index.d.ts for the full TypeScript surface.

Parameters

`new ImageClassifier(opts?)`

All constructor options are optional.

Option	Type	Default	Description
`modelPath`	`string`	Bundled `weights/mobilenetv3_3class_v3_fp16.gguf`	Absolute path to an FP16 GGUF file. Override only when pointing at a custom fine-tune produced by the ONNX→GGUF conversion guide. Also overridable via the `QVAC_CLASSIFICATION_MODEL_PATH` env variable.
`logger`	`QvacLogger`-shaped	`null`	A sink with optional `error / warn / info / debug(msg)` methods (compatible with `@qvac/logging`). Receives JS-side `info` from a successful `load()` and `error` from a failed `load()`. With `nativeLogger: true`, also receives forwarded native `LogMsg` events at `info` level. Always honoured, regardless of `nativeLogger`.
`nativeLogger`	`boolean`	`false`	When `true`, native C++ `QLOG(...)` lines from inside the addon's model-loading and graph code are forwarded to `logger`. Disabled by default because the underlying `qvac-lib-inference-addon-cpp` logger is a process-wide singleton with a static `uv_async_t` that is not safe across rapid create/destroy cycles (e.g. in tests).

`await classify(imageInput, options?)`

Parameter	Type	Default	Description
`imageInput` (required)	`Buffer`	`Uint8Array`	—
`options.topK`	`number`	`undefined` (all classes)	If set, the returned array is truncated to this many entries (top-K highest confidences). Must be a positive integer. Passing a value ≥ class count is a no-op.
`options.width`	`number`	—	Required for raw RGB input. Integer > 0. The underlying buffer must be exactly `width × height × channels` bytes; any mismatch throws a structured error.
`options.height`	`number`	—	Required for raw RGB input. Integer > 0.
`options.channels`	`3`	—	Required for raw RGB input. Must be exactly `3`. Grayscale and RGBA are not supported — decode or drop the alpha channel on the caller side.

Returns Promise<ClassificationResult[]> where each entry is { label: string; confidence: number }. The array is sorted by confidence descending, confidences are softmax probabilities in [0, 1] summing to ≈ 1, and label comes from the loaded GGUF's mobilenet.class_N metadata (so a future fine-tune can introduce new label strings without a code change).

`await load()` / `await unload()` / `await destroy()`

None take arguments. load() is idempotent — calling it twice is a no-op (check getState().configLoaded if you want to verify). unload() safely tears down the native handle and may be called multiple times. destroy() is equivalent to unload() plus a sticky destroyed flag in getState() — useful if your code wants to refuse reuse of a released instance.

Output contract

An array of { label: string, confidence: number }.
Sorted by confidence descending.
confidence values are softmax probabilities in [0, 1] and sum to ≈ 1.
Labels come from the GGUF metadata (mobilenet.class_0/1/2). For the bundled weights these are food, report, other.

Build (from source, monorepo)

Prerequisites: clang (LLVM ≥ 19) with matching libc++-dev, vcpkg, bare ≥ 1.24, bare-make. CI pins the exact LLVM major via the shared setup-llvm action; locally any recent clang works.

cd packages/classification-ggml
npm install
bare-make generate
bare-make build
bare-make install

One-liner: npm install && bare-make generate && bare-make build && bare-make install.

Testing

npm run test:integration     # brittle + bare JS integration tests (desktop)
npm run test:cpp             # GoogleTest C++ unit tests
npm run test:mobile:generate # regenerate test/mobile/integration.auto.cjs
npm run test:mobile:validate # verify mobile test file structure

Integration tests live in test/integration/*.test.js and use the 6 sample images under test/images/ (two images per class).

Mobile tests

Mobile tests use the shared qvac-test-addon-mobile framework. The test/mobile/integration.auto.cjs file is auto-generated by scripts/generate-mobile-integration-tests.js from every *.test.js under test/integration/, so adding a new integration test automatically exposes it on mobile too.

Before the mobile harness can be built, run

npm run mobile:copy-prebuilds

to populate test/mobile/testAssets/ (driven by scripts/copy-mobile-test-assets.js). The script (a) fans out the single arm64 prebuild into the per-flavour directories the framework expects under prebuilds/, (b) copies the FP16 GGUF weights with a .gguf.bin suffix so the React Native bundler treats them as a binary asset, and (c) copies every test/images/*.{jpg,jpeg,png} into testAssets/ so the integration tests can resolve them via global.assetPaths on-device. None of these copied files are checked into git. See test/mobile/README.md for the lifecycle note about the shared native logger.

Platform support

Platform	CPU	Notes
Linux x64	✅
Linux arm64	✅
macOS arm64 (Apple)	✅
macOS x64 (Intel)	✅
Windows x64	✅
Android arm64	✅	`c++_shared` STL
iOS arm64	✅

All platforms are produced by the shared reusable-prebuilds.yml matrix and merged into a single prebuilds artifact for downstream consumption. GPU (Vulkan / Metal / CUDA) is not currently supported.

Performance

Depending on the platform, one call to classifier.classify(buffer) takes from a few tens to a couple of hundred milliseconds.

What affects `classify()` latency

CPU thread pool — libggml sizes its internal CPU worker pool to std::thread::hardware_concurrency on every platform. The addon does not expose a tuning knob for this; if a future need arises, raise an issue and we can add one.
Input size — the JPEG/PNG decode and the stb_image_resize2 bilinear pass scale with source pixel count. The 224×224 tensor pass is fixed-cost; a 12 MP phone photo adds real overhead vs. a 640×480 webcam frame.
First-call overhead — load() already runs a full-pipeline warmup (synthetic-pattern pass through preprocess + GGML compute + output read) before returning, so the GGML compute buffers, weight buffer, and worker thread are fully materialised when the first classify() is dispatched. Even so, the first user-supplied call is typically a few tens of milliseconds slower than the steady-state average.
Re-use — load() once, classify() many times. Tearing down and rebuilding the model for each image is roughly 4–6× slower end-to-end and is never necessary outside of tests.

Memory footprint

Component	Size
Bundled FP16 weights (mmapped)	2.94 MB
Backend weight buffer (FP16 + folded BN + FP32 classifier)	≈ 5.5 MB
Intermediate activations (compute buffer)	single-digit MB
Total resident during inference	~8–10 MB

All GGML compute buffers (input tensor, intermediate activations, output) are allocated once at load() time and reused on every classify() call — ggml_backend_tensor_set / _get are the only operations that touch them per request. Per-call C++ allocations are bounded: one input-buffer copy across the bare-runtime boundary, the decoded RGB buffer, the resized 224×224 RGB buffer, the WHCN F32 tensor, and the 3-element softmax + result vectors. Multiple ImageClassifier instances each keep their own compute buffer and worker thread — you pay the ~8 MB once per instance.

Why FP16 weights?

FP16 was chosen because it matches FP32 top-1 accuracy on the internal validation set while halving the on-disk footprint (≈3 MB vs ≈6 MB) and giving a measurable inference speed-up on every CPU backend we ship. More aggressive quantizations (Q8_0, Q4_K and below) were evaluated on the same validation set and showed noticeable accuracy degradation, which for a 3-class triage model is not acceptable. If you fine-tune your own MobileNetV3-Small, keep FP16 as the publish format unless you re-run the full validation suite at the lower precision.

Measuring locally

The integration suite hooks the shared scripts/test-utils/performance-reporter.js via test/integration/utils.js. Running

npm run test:integration

writes test/results/performance-report.json with one total_time_ms entry per sample image, and in GitHub Actions also emits a Markdown step summary.

Architecture

See [docs/architecture.md](docs/architecture.md) for the MobileNetV3-Small layer breakdown and graph construction notes, and [docs/data-flow.md](docs/data-flow.md) for the end-to-end request flow.

Why a custom GGML graph?

llama-cpp doesn't support CNN architectures, so this addon bypasses llama.cpp entirely and talks to the stable ggml_* / ggml_backend_* public API.

For this MobileNetV3-Small the GGML CPU backend is, in most configurations, slower per call than the same network running on a mature PyTorch or ONNX Runtime build with their hand-tuned convolution kernels. Because the model is very small (≈2.5 M params, single-digit-millisecond compute on a modern phone), the absolute gap is negligible for a triage workload and is dominated by image decode and JS↔native marshalling. If a substantially larger classifier is ever added on top of this same scaffolding, expect to invest extra effort in graph-level optimisations (operator fusion, matmul tiling, FP16 SIMD kernels, threadpool sizing) before the GGML path is competitive.

Converting a new model

If you fine-tune or swap the underlying MobileNetV3 model, follow [docs/onnx-to-gguf-conversion.md](docs/onnx-to-gguf-conversion.md). The graph construction is parameterised by BLOCKS in MobileNetGraph.hpp — only classes and weights change between fine-tunes.

Troubleshooting

“MobileNet GGUF weights not found”: the default path is <package>/weights/mobilenetv3_3class_v3_fp16.gguf. Override with new ImageClassifier({ modelPath: '/abs/path.gguf' }) or set the QVAC_CLASSIFICATION_MODEL_PATH env variable.
All predictions look wrong: verify the BN epsilon is still 0.001 (see the guarded unit test) — the architecture is unusually sensitive to this constant.
Build fails looking for stb_image.h: make sure the stb vcpkg port is installed. The vcpkg-configuration.json pins it.
Mobile build fails looking for libggml-cpu: the prebuild workflow copies all ggml::${_backend} targets into prebuilds/. Re-run bare-make install.

License

Apache-2.0. See [LICENSE](LICENSE) and [NOTICE](NOTICE).

FAQs

What is @qvac/classification-ggml?

Is @qvac/classification-ggml popular?

Is @qvac/classification-ggml well maintained?

Package last updated on 24 Jun 2026

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

@qvac/classification-ggml