# wink-embeddings-small-en-50d

Small English 50-dimension word-embedding dataset compatible with wink-nlp.

- Package size: ≤ 10 MB
- Vocabulary: ≈ 5 k–10 k most-common English words (you can regenerate the dataset at any size you like)
## Installation

```bash
npm install wink-embeddings-small-en-50d
```
## Usage

```js
import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
import embeddings from 'wink-embeddings-small-en-50d';

const nlp = winkNLP(model);

nlp.readDoc('hello world').tokens().each((t) => {
  const word = t.out();
  const vector = embeddings[word]; // undefined if the word is out of vocabulary
  console.log(word, vector);
});
```
Each vector is an array of 50 floats and can be used directly for cosine-similarity comparisons and other vector operations.
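The similarity computation can be sketched as follows. The three-element vectors in the example call are toy stand-ins; real calls would pass two 50-element vectors looked up from the embeddings object:

```js
// Cosine similarity between two dense vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-d vectors for illustration only.
console.log(cosineSimilarity([1, 0, 0], [0.8, 0.6, 0])); // ≈ 0.8
```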
## API

```js
import embeddings from 'wink-embeddings-small-en-50d';
```

The default export is a plain object mapping strings → `number[]` (50 elements each). In TypeScript terms:

```ts
interface Vector extends ReadonlyArray<number> { length: 50; }
interface Embeddings { [word: string]: Vector; }
```
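Because the export is a plain object, a missing word simply yields `undefined`. A defensive lookup helper might look like this (`getVector` is a hypothetical helper, not part of the package's API; the stand-in data and the lowercasing step — GloVe 6B tokens are lowercase — are assumptions for illustration):

```js
// Stand-in data; real code would import the package's embeddings object.
const embeddings = { hello: [0.25, -0.1], world: [0.6, 0.3] };

// Normalize case before indexing and return null on a vocabulary miss.
function getVector(word) {
  return embeddings[word.toLowerCase()] ?? null;
}

console.log(getVector('Hello')); // [0.25, -0.1]
console.log(getVector('xyzzy')); // null
```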
## Regenerating / Updating the Dataset

A conversion script is provided to build your own subset from any 50-dimension GloVe file:

```bash
curl -L https://nlp.stanford.edu/data/glove.6B.zip -o glove.zip
unzip glove.zip glove.6B.50d.txt
npm run convert:glove -- ./glove.6B.50d.txt src/embeddings.json 10000
```

Commit the new `embeddings.json`, rebuild, and publish.
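Conceptually, the conversion is simple: each GloVe line is `word v1 v2 … v50`, and GloVe files are ordered by corpus frequency, so keeping the first N lines keeps the N most common words. A standalone sketch of such a script is below (illustrative only; the real implementation is whatever `npm run convert:glove` invokes):

```js
import { createInterface } from 'node:readline';
import { createReadStream, writeFileSync } from 'node:fs';

// Read the first `limit` lines of a GloVe text file and write them out as a
// single JSON object of word -> number[] entries.
async function convertGlove(gloveFile, outFile, limit) {
  const embeddings = {};
  let kept = 0;
  const rl = createInterface({ input: createReadStream(gloveFile) });
  for await (const line of rl) {
    if (kept >= limit) break;
    const [word, ...values] = line.trim().split(' ');
    if (!word) continue; // skip blank lines
    embeddings[word] = values.map(Number);
    kept += 1;
  }
  rl.close();
  writeFileSync(outFile, JSON.stringify(embeddings));
  return embeddings;
}
```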
## Development

```bash
npm install
npm test
npm run build
```
## Testing

The test-suite validates that:

- All keys are strings.
- Every vector has length 50 and all elements are numbers.

```bash
npm test
```
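In essence, the checks above amount to a loop like this (run here against stand-in data; the real suite loads the packaged `embeddings.json`):

```js
// Validate an embeddings object: string keys, 50 finite numbers per vector.
function validateEmbeddings(embeddings) {
  for (const [word, vector] of Object.entries(embeddings)) {
    if (typeof word !== 'string') return false; // always true for object keys
    if (!Array.isArray(vector) || vector.length !== 50) return false;
    if (!vector.every((x) => Number.isFinite(x))) return false;
  }
  return true;
}

// Stand-in data for illustration.
const ok = { hello: new Array(50).fill(0.1) };
const bad = { world: [1, 2, 3] }; // wrong length
console.log(validateEmbeddings(ok));  // true
console.log(validateEmbeddings(bad)); // false
```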
## Publishing

```bash
npm version patch
npm publish --access public
```
© 2025 Cavani21/TheGreatBey – MIT License