semantic-chunking
semantically create chunks from large text (useful for passing to LLM workflows)
npm install semantic-chunking
import { chunkit } from 'semantic-chunking';
let text = "some long text...";
let myChunks = await chunkit(text); // chunkit is awaited, as in the examples further down
chunkit(
text,
{ // options object
logging,
maxTokenSize,
similarityThreshold,
onnxEmbeddingModel,
onnxEmbeddingModelQuantized,
combineSimilarityChunks
}
)
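To illustrate how an options object with optional keys typically behaves, here is a minimal sketch that merges caller-supplied options over the default values this README lists. DEFAULT_OPTIONS and resolveOptions are hypothetical names used only for illustration; this is not the package's source code, and the merge strategy is an assumption.

```javascript
// Defaults copied from the parameter list in this README.
// The merge logic below is an illustrative assumption, not the library's code.
const DEFAULT_OPTIONS = {
    logging: false,
    maxTokenSize: 500,
    similarityThreshold: 0.567,
    onnxEmbeddingModel: 'Xenova/paraphrase-multilingual-MiniLM-L12-v2',
    onnxEmbeddingModelQuantized: true,
    combineSimilarityChunks: true,
};

// Hypothetical helper: keys supplied by the caller override the defaults.
function resolveOptions(userOptions = {}) {
    return { ...DEFAULT_OPTIONS, ...userOptions };
}

const resolved = resolveOptions({ maxTokenSize: 65 });
console.log(resolved.maxTokenSize);        // 65
console.log(resolved.similarityThreshold); // 0.567
```

Under this sketch, passing only maxTokenSize leaves every other option at its documented default.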
text: full string to split into chunks
options object [optional]:
logging [optional | boolean | default false]
maxTokenSize [optional | int | default 500]
similarityThreshold [optional | float | default .567]
onnxEmbeddingModel [optional | string | default Xenova/paraphrase-multilingual-MiniLM-L12-v2]
onnxEmbeddingModelQuantized [optional | boolean | default true]
combineSimilarityChunks [optional | boolean | default true]
The text is split into an array of sentences.
A vector is created for each sentence.
A cosine similarity score is created for each sentence pair.
Each sentence is added to a chunk until the similarity threshold or the max token size for the chunk is exceeded.
Once the chunks are created, similar chunks are combined into larger chunks, up to the max token size, unless combineSimilarityChunks was set to false.
import { chunkit } from 'semantic-chunking';
import fs from 'fs';
const text = await fs.promises.readFile('./example.txt', 'utf8');
let myChunks = await chunkit(text, { logging: true, similarityThreshold: .9 });
myChunks.forEach((chunk, index) => {
console.log("--------------------");
console.log("Chunk " + (index + 1));
console.log("--------------------");
console.log(chunk);
console.log("\n\n");
});
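The workflow described earlier (split into sentences, build a vector per sentence, score neighbouring sentences by cosine similarity, then group) can be sketched without the embedding model. This is a rough approximation under stated assumptions, not the package's implementation: a toy word-count vector stands in for the ONNX embedding, tokens are approximated by word counts, and the threshold is read as the minimum similarity to the previous sentence needed to stay in the same chunk. All function names here are hypothetical.

```javascript
// Lowercased word list; also used as a rough token count (assumption).
function tokenize(sentence) {
    return sentence.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Simplified sentence splitter: break after ., ! or ? followed by whitespace.
function splitSentences(text) {
    return text.split(/(?<=[.!?])\s+/).map(s => s.trim()).filter(s => s.length > 0);
}

// Toy stand-in for the embedding model: word counts over a shared vocabulary.
function buildVocabulary(sentences) {
    const vocab = new Map();
    for (const s of sentences) {
        for (const w of tokenize(s)) {
            if (!vocab.has(w)) vocab.set(w, vocab.size);
        }
    }
    return vocab;
}

function embed(sentence, vocab) {
    const vector = new Array(vocab.size).fill(0);
    for (const w of tokenize(sentence)) vector[vocab.get(w)] += 1;
    return vector;
}

// Cosine similarity: dot(a, b) / (|a| * |b|), or 0 if either vector is all zeros.
function cosineSimilarity(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) return 0;
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Start a new chunk when similarity to the previous sentence drops below the
// threshold, or when adding the sentence would exceed maxTokenSize.
function sketchChunk(text, { maxTokenSize = 500, similarityThreshold = 0.567 } = {}) {
    const sentences = splitSentences(text);
    if (sentences.length === 0) return [];
    const vocab = buildVocabulary(sentences);
    const vectors = sentences.map(s => embed(s, vocab));
    const chunks = [];
    let current = [sentences[0]];
    let currentTokens = tokenize(sentences[0]).length;
    for (let i = 1; i < sentences.length; i++) {
        const similarity = cosineSimilarity(vectors[i - 1], vectors[i]);
        const tokens = tokenize(sentences[i]).length;
        if (similarity >= similarityThreshold && currentTokens + tokens <= maxTokenSize) {
            current.push(sentences[i]);
            currentTokens += tokens;
        } else {
            chunks.push(current.join(' '));
            current = [sentences[i]];
            currentTokens = tokens;
        }
    }
    chunks.push(current.join(' '));
    return chunks;
}

const sample = 'Cats chase mice. Cats chase birds. Stock markets fell sharply today.';
const sketchResult = sketchChunk(sample, { similarityThreshold: 0.5 });
console.log(sketchResult.length); // 2
```

The first two sentences share two of three words, so they stay together; the third shares none and starts a new chunk. A real embedding model would capture meaning rather than word overlap, so actual chunk boundaries from chunkit may differ.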
import { chunkit } from 'semantic-chunking';
let frogText = "A frog hops into a deli and croaks to the cashier, \"I'll have a sandwich, please.\" The cashier, surprised, quickly makes the sandwich and hands it over. The frog takes a big bite, looks around, and then asks, \"Do you have any flies to go with this?\" The cashier, taken aback, replies, \"Sorry, we're all out of flies today.\" The frog shrugs and continues munching on its sandwich, clearly unfazed by the lack of fly toppings. Just another day in the life of a sandwich-loving amphibian! 🐸🥪";
let myFrogChunks = await chunkit(frogText, { maxTokenSize: 65 });
console.log("myFrogChunks", myFrogChunks);
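The maxTokenSize cap used above also bounds the optional combine step, in which chunks are merged into larger ones. A minimal sketch of size-capped merging follows. It is a simplification under stated assumptions: it merges adjacent chunks greedily by size alone and ignores similarity, it counts whitespace-separated words as tokens, and countTokens and combineChunks are hypothetical helpers rather than functions exported by the package.

```javascript
// Whitespace word count as a rough token estimate (assumption; real tokenizers differ).
function countTokens(text) {
    const trimmed = text.trim();
    return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Greedily merge neighbouring chunks while the merged text stays within maxTokenSize.
function combineChunks(chunks, maxTokenSize) {
    const combined = [];
    let buffer = '';
    for (const chunk of chunks) {
        const candidate = buffer === '' ? chunk : buffer + ' ' + chunk;
        if (buffer !== '' && countTokens(candidate) > maxTokenSize) {
            // Merging would exceed the cap: emit the buffer and start a new one.
            combined.push(buffer);
            buffer = chunk;
        } else {
            buffer = candidate;
        }
    }
    if (buffer !== '') combined.push(buffer);
    return combined;
}

const smallChunks = ['one two', 'three four', 'five six seven', 'eight'];
const merged = combineChunks(smallChunks, 5);
console.log(merged); // [ 'one two three four', 'five six seven eight' ]
```

A chunk that is already larger than the cap is passed through unchanged in this sketch, since nothing is ever split.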
[1.0.0] - 2024-02-29
FAQs
Semantically create chunks from large texts. Useful for workflows involving large language models (LLMs).
The npm package semantic-chunking receives a total of 82 weekly downloads. As such, its popularity is classified as not popular.
We found that semantic-chunking demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 0 open source maintainers collaborating on the project.