
sentence2simvecjs
Vector-based sentence similarity (0.0–1.0) + embedding export. JavaScript implementation inspired by PINTO0309/sentence2simvec.
https://github.com/user-attachments/assets/4738b015-ef68-4503-aa51-a467754d7081
npm install sentence2simvecjs
const {
diceCoefficient,
embeddingSimilarity,
runBenchmark,
initializeEmbeddingModel
} = require('sentence2simvecjs');
// Simple Dice's Coefficient
const diceScore = diceCoefficient("Hello world", "Hello there");
console.log(diceScore); // 0.5
// Embedding similarity (async)
async function example() {
// Initialize model once (optional, will auto-init on first use)
await initializeEmbeddingModel();
const result = await embeddingSimilarity("Hello world", "Hello there");
console.log(result.score); // 0.7234
console.log(result.executionTime); // 123.45 ms
}
// Run benchmark comparison
async function benchmark() {
const result = await runBenchmark("Hello world", "Hello there", {
ngramSize: 3,
preloadModel: true
});
console.log('Dice Score:', result.diceResult.score);
console.log('Embedding Score:', result.embeddingResult.score);
console.log('Speed ratio:', result.embeddingResult.executionTime / result.diceResult.executionTime);
}
const { EmbeddingCache, CorpusManager } = require('sentence2simvecjs');
// Create embedding cache
const cache = new EmbeddingCache({
persistToDisk: true,
cacheDir: './embeddings'
});
// Add texts to cache
await cache.addText('Machine learning is awesome');
await cache.addTextsFromFile('corpus.txt');
await cache.addTextsFromJSON('data.json', 'content');
// Find similar texts
const similar = await cache.findSimilar('Deep learning', 5);
// Batch similarity calculation
const scores = await cache.batchSimilarity('Neural networks');
const corpus = new CorpusManager({
enableDiceCache: true,
enableEmbeddingCache: true
});
// Load corpus
await corpus.loadFromFile('documents.txt');
await corpus.addItems([
{ text: 'First document', id: 'doc1' },
{ text: 'Second document', id: 'doc2' }
]);
// Search using both methods
const results = await corpus.search('query text', 'both', 10);
// Batch similarity for entire corpus
const allScores = await corpus.batchSimilarity('query text', 'embedding');
# Clone the repository
git clone https://github.com/your-username/sentence2simvecjs
cd sentence2simvecjs
# Install dependencies
npm install
# Build and run
npm start
diceCoefficient(text1: string, text2: string, ngramSize?: number): number
Calculate Dice's coefficient between two texts using n-grams.
text1, text2: Input texts to compare
ngramSize: Size of n-grams (default: 3)
embeddingSimilarity(text1: string, text2: string): Promise<Result>
Calculate semantic similarity using transformer embeddings.
The result contains score, embedding1, embedding2, and executionTime.
runBenchmark(text1: string, text2: string, options?: Options): Promise<ComparisonResult>
Run both similarity methods and compare performance.
options.ngramSize: N-gram size for Dice's coefficient
options.preloadModel: Whether to preload the transformer model
EmbeddingCache
Pre-compute and cache embeddings for fast retrieval.
addText(text, id?, metadata?): Add single text to cache
addTexts(texts): Add multiple texts
addTextsFromFile(filePath): Load texts from file
findSimilar(query, topK, threshold?): Find similar cached texts
batchSimilarity(query): Get all similarity scores
CorpusManager
Manage large text collections with both Dice and embedding methods.
addItem(text, id?, metadata?): Add text to corpus
loadFromFile(filePath, format): Load corpus from file
search(query, method, topK): Search corpus
batchSimilarity(query, method): Calculate all similarities
Initial model loading takes 1-3 seconds depending on hardware.
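For intuition, Dice's coefficient over character n-grams is simple enough to sketch in a few lines. This is an illustrative, self-contained implementation; the library's exact tokenization and defaults may differ.

```javascript
// Self-contained sketch of Dice's coefficient over character n-grams.
// Illustrative only: the library's tokenization and defaults may differ.
function ngrams(text, n = 3) {
  const grams = new Set();
  for (let i = 0; i <= text.length - n; i++) {
    grams.add(text.slice(i, i + n));
  }
  return grams;
}

function dice(text1, text2, n = 3) {
  const a = ngrams(text1, n);
  const b = ngrams(text2, n);
  if (a.size + b.size === 0) return 0;
  let overlap = 0;
  for (const g of a) if (b.has(g)) overlap++;
  // Dice = 2 * |A ∩ B| / (|A| + |B|)
  return (2 * overlap) / (a.size + b.size);
}

console.log(dice('Hello world', 'Hello world')); // 1
console.log(dice('Hello world', 'Hello there')); // ≈ 0.444 (4 shared trigrams of 9 each)
```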
The new EmbeddingCacheV2 supports multiple storage backends:
// File storage (Node.js)
const fileCache = new EmbeddingCacheV2({
storageType: 'file',
cacheDir: './embeddings'
});
// LocalStorage (Browser)
const browserCache = new EmbeddingCacheV2({
storageType: 'localStorage',
storagePrefix: 'myapp_embeddings_',
maxItems: 1000 // Limit items to prevent quota issues
});
// Memory storage (default)
const memoryCache = new EmbeddingCacheV2({
storageType: 'memory'
});
// Custom storage adapter
const customCache = new EmbeddingCacheV2({
storageAdapter: myCustomAdapter // Implement StorageAdapter interface
});
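The StorageAdapter interface is not spelled out here, so as a sketch, an in-memory adapter might look like the following. The method names (get/set/remove/clear/keys) are an assumption about the interface; check the library's typings for the real contract.

```javascript
// Minimal in-memory StorageAdapter sketch. The method names
// (get/set/remove/clear/keys) are an assumption about the interface,
// not confirmed against the library's typings.
class MapStorageAdapter {
  constructor() {
    this.store = new Map();
  }
  async get(key) {
    return this.store.has(key) ? this.store.get(key) : null;
  }
  async set(key, value) {
    this.store.set(key, value);
  }
  async remove(key) {
    this.store.delete(key);
  }
  async clear() {
    this.store.clear();
  }
  async keys() {
    return [...this.store.keys()];
  }
}

// Standalone usage, no library required:
(async () => {
  const adapter = new MapStorageAdapter();
  await adapter.set('text_1', { text: 'hello', embedding: [0.1, 0.2] });
  console.log(await adapter.keys()); // one key: 'text_1'
})();
```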
<script type="module">
import { EmbeddingCacheV2, initializeEmbeddingModel } from 'sentence2simvecjs';
async function setupBrowserCache() {
await initializeEmbeddingModel();
const cache = new EmbeddingCacheV2({
storageType: 'localStorage',
storagePrefix: 'embeddings_',
maxItems: 500 // Prevent localStorage quota exceeded
});
// Add texts
await cache.addText('Example text');
// Find similar
const results = await cache.findSimilar('Query text', 5);
// Check storage usage
const info = await cache.getStorageInfo();
console.log(`Using ${info.estimatedSize / 1024}KB of localStorage`);
}
</script>
The original EmbeddingCache still works for backward compatibility:
// Original file-based cache
const cache = new EmbeddingCache({
persistToDisk: true,
cacheDir: '/path/to/my/cache'
});
The cache is stored as JSON with the following structure:
[
{
"id": "unique_id",
"text": "Original text",
"embedding": [0.123, -0.456, ...], // 384-dimensional array
"metadata": { /* optional metadata */ }
}
]
// Clear all cache (works with all storage types)
await cache.clear(); // Removes all cached embeddings
// Remove specific item
await cache.remove('specific_id');
// Export/Import (works with all storage types)
const jsonData = await cache.exportToJSON();
await cache.importFromJSON(jsonData);
// Check storage info
const info = await cache.getStorageInfo();
console.log(`Storage type: ${info.type}`);
console.log(`Items: ${info.itemCount}`);
console.log(`Size: ${info.estimatedSize} bytes`);
The clear() method removes all cached embeddings:
// LocalStorage example - only clears items with 'myapp_' prefix
const cache = new EmbeddingCacheV2({
storageType: 'localStorage',
storagePrefix: 'myapp_' // Only 'myapp_*' keys will be cleared
});
await cache.clear(); // Other localStorage data remains untouched
// Confirm deletion
const remaining = await cache.size();
console.log(`Items after clear: ${remaining}`); // Should be 0
Use maxItems option to prevent storage overflow:
const cache = new EmbeddingCacheV2({
storageType: 'localStorage',
maxItems: 500 // Automatically removes oldest items
});
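One plausible way such a cap can work is oldest-first eviction by cached timestamp. The sketch below is illustrative only, not the library's actual implementation.

```javascript
// Illustrative sketch of oldest-first eviction — the behavior the
// maxItems option describes. Not the library's actual implementation.
function evictOldest(entries, maxItems) {
  if (entries.length <= maxItems) return entries;
  // Keep the newest maxItems entries, judged by cached timestamp.
  return [...entries]
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, maxItems);
}

const entries = [
  { id: 'a', timestamp: 100 },
  { id: 'b', timestamp: 300 },
  { id: 'c', timestamp: 200 }
];
console.log(evictOldest(entries, 2).map(e => e.id)); // ['b', 'c']
```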
When using @xenova/transformers in the browser, the model files are stored separately from your embedding cache:
The model files live in the browser's Cache Storage under a name like transformers-cache.
// Clear transformer model cache
caches.keys().then(names => {
names.forEach(name => {
if (name.includes('transformers')) {
caches.delete(name);
}
});
});
// Clear embedding cache (your computed results)
await cache.clear();
To use in a browser environment:
npm run build:browser
npm run serve
# Or use any static file server
http://localhost:8000/src/browser/test-dice-only.html
http://localhost:8000/src/browser/test-localstorage.html
Note: The embedding model initialization may take 10-30 seconds on first load as it downloads the model files (~25MB) from Hugging Face. The Dice-only test page works immediately without any model download.
The test page provides an interactive interface to test the LocalStorage cache functionality:
Natural language processing enables computers to understand text
Deep learning models can learn complex patterns
Neural networks are inspired by the human brain
JavaScript is a programming language for web development
React is a library for building user interfaces
[
{
"id": "text_1077264583", // Unique identifier (auto-generated or custom)
"text": "こんにちは", // Original text
"embedding": [ // 384-dimensional vector from all-MiniLM-L6-v2
-0.10119643807411194,
// ... (382 more values)
-0.008699539117515087
],
"timestamp": 1753166234369 // Unix timestamp when cached
},
{
"id": "text_1712359701",
"text": "はじめまして",
"embedding": [
-0.031796280294656754,
// ... (382 more values)
-0.005393804516643286
],
"timestamp": 1753166261449
},
{
"id": "text_6942345",
"text": "今日はいい天気ですね。",
"embedding": [
0.03111492656171322,
// ... (382 more values)
-0.012813657522201538
],
"timestamp": 1753166295569
},
{
"id": "text_2137068100",
"text": "Hello.",
"embedding": [
-0.09045851230621338,
// ... (382 more values)
0.015684669837355614
],
"timestamp": 1753167371990
},
{
"id": "text_1654144361",
"text": "Hello. Good morning.",
"embedding": [
-0.025240488350391388,
// ... (382 more values)
0.00397441117092967
],
"timestamp": 1753167383761
}
]
id: Unique identifier for each cached text, in the form text_[hash] (e.g., "text_1077264583")
text: The original text string that was embedded
embedding: 384-dimensional Float32Array from the all-MiniLM-L6-v2 model
timestamp: Unix timestamp (milliseconds since epoch)
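Given entries of this shape, similarity search reduces to cosine similarity between embedding vectors. A self-contained sketch of that metric and a findSimilar-style ranking over cached entries (illustrative, independent of the library's internals):

```javascript
// Cosine similarity between embedding vectors — the standard metric
// behind findSimilar-style ranking. Illustrative sketch, independent
// of the library.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank cached entries against a query embedding, highest score first.
function rankBySimilarity(queryEmbedding, entries, topK = 5) {
  return entries
    .map(e => ({ id: e.id, score: cosineSimilarity(queryEmbedding, e.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```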
<script src="path/to/sentence2simvecjs/dist/browser.js"></script>
<script>
const { EmbeddingCacheV2, initializeEmbeddingModel } = window.sentence2simvecjs;
async function init() {
await initializeEmbeddingModel();
const cache = new EmbeddingCacheV2({
storageType: 'localStorage'
});
// Use the cache...
}
</script>
Note: Direct file:// access will cause CORS errors. Always serve through HTTP/HTTPS.
This library includes high-performance visualization components using OffscreenCanvas and Web Workers for non-blocking rendering.
import { SimilarityVisualization } from 'sentence2simvecjs/renderer';
// In your React component
<SimilarityVisualization
data={benchmarkResults}
type="heatmap" {/* or "barchart" or "scatter" */}
width={600}
height={400}
title="Similarity Matrix"
/>
OffscreenCanvas is supported in all current major browsers (Chromium-based browsers, Firefox 105+, and Safari 16.4+).
The visualization component automatically falls back to main thread rendering for unsupported browsers.
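A fallback like this typically hinges on feature detection. The sketch below shows the kind of check involved; the function name is illustrative, not the library's internal API.

```javascript
// Sketch of the feature detection such a fallback needs. The function
// name is illustrative, not the library's internal API.
function supportsOffscreenRendering() {
  return typeof OffscreenCanvas !== 'undefined' &&
         typeof Worker !== 'undefined';
}

// In Node or an older browser this reports 'main-thread'.
const mode = supportsOffscreenRendering() ? 'worker' : 'main-thread';
console.log(`Rendering mode: ${mode}`);
```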
To test OffscreenCanvas visualization:
npm run serve
# Navigate to http://localhost:8000/src/browser/test-offscreencanvas.html
The test page includes:
Using OffscreenCanvas keeps rendering off the main thread, so the UI stays responsive while charts draw.
Apache-2.0
Inspired by PINTO0309/sentence2simvec