
K-Batch is an intelligent text batching library that uses k-means clustering to group sentences by length for optimal processing. It's particularly useful for NLP tasks, machine learning batch processing, and any scenario where processing similar-length texts together improves efficiency.
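When processing text in batches (especially for machine learning or NLP tasks), grouping sentences of similar length together reduces the padding needed to bring every sentence in a batch to the same length, so less computation is wasted.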
K-Batch uses k-means clustering to automatically group your sentences into optimal batches based on length, while ensuring each batch meets minimum size requirements.
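To see why grouping by length pays off, the sketch below counts wasted padding positions when every sentence in a batch is padded to the batch's longest sentence. It is an illustration, not part of k-batch: `paddingWaste` is a hypothetical helper, and measuring length in characters is an assumption (a real NLP pipeline usually pads tokens).

```javascript
// Hypothetical helper (not part of k-batch): total padding positions needed
// when each sentence is padded to the longest sentence in its batch.
// Assumption: length is measured in characters.
function paddingWaste(batches) {
  let waste = 0;
  for (const batch of batches) {
    const longest = Math.max(...batch.map((s) => s.length));
    for (const s of batch) {
      waste += longest - s.length;
    }
  }
  return waste;
}

// Mixed batches: each pairs a short sentence with a long one.
const mixed = [
  ["Who?", "A significantly longer sentence that should be in a different batch."],
  ["Tiny.", "This is a short sentence."],
];

// Length-grouped batches: the two shortest together, the two longest together.
const grouped = [
  ["Who?", "Tiny."],
  ["This is a short sentence.", "A significantly longer sentence that should be in a different batch."],
];

console.log(paddingWaste(mixed), paddingWaste(grouped));
```

The same four sentences need noticeably less padding when short sentences share a batch with other short sentences.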
npm install k-batch
import { kBatchSentences, analyzeKBatches } from 'k-batch';
const sentences = [
  "This is a short sentence.",
  "A significantly longer sentence that should be in a different batch.",
  "Tiny.",
  "Here is another medium-length sentence.",
  "One more sentence to make it interesting.",
  "And another one to round out the collection.",
  "Make it interesting.",
  "And another one to round out the collection.",
  "wow, this is short.",
  "Who?",
  // ... more sentences
];
// Get optimally batched sentences
const batches = await kBatchSentences(sentences);
// Use your batches
batches.forEach((batch, index) => {
  console.log(`Batch ${index + 1}: ${batch.length} sentences`);
  console.log(batch);
  // Process each batch...
});
// Get detailed statistics about your batches
const stats = await analyzeKBatches(batches);
console.log(stats);
kBatchSentences(sentences, options)
The main function that batches sentences using k-means clustering.
sentences (Array): Array of strings to be batched
options (Object, optional): Configuration options
  maxBatches (Number): Maximum number of batches to create (default: 5)
  minSentencesPerBatch (Number): Minimum sentences per batch (default: 4)
  minSentencesRequired (Number): Minimum number of sentences required to perform splitting (default: 10)
  maxIterations (Number): Maximum k-means iterations (default: 100)
Example with custom options:
import { kBatchSentences } from 'k-batch';
const sentences = [/* your sentences */];
const batches = await kBatchSentences(sentences, {
maxBatches: 3,
minSentencesPerBatch: 5,
minSentencesRequired: 15,
maxIterations: 50
});
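The options interact: if every batch must hold at least minSentencesPerBatch sentences, then n sentences can form at most floor(n / minSentencesPerBatch) batches, and maxBatches caps that further. The sketch below does only this arithmetic on the documented defaults. `maxFeasibleBatches` is a hypothetical helper, not a k-batch export, and treating input below minSentencesRequired as a single batch is an assumption based on the option's description.

```javascript
// Documented defaults from the options list above.
const DEFAULTS = { maxBatches: 5, minSentencesPerBatch: 4, minSentencesRequired: 10 };

// Hypothetical helper (not part of k-batch): the largest number of batches
// the options can allow for a given number of sentences.
function maxFeasibleBatches(sentenceCount, options = {}) {
  const { maxBatches, minSentencesPerBatch, minSentencesRequired } = { ...DEFAULTS, ...options };
  // Assumption: below minSentencesRequired no splitting happens (one batch).
  if (sentenceCount < minSentencesRequired) return 1;
  // Each batch needs at least minSentencesPerBatch sentences.
  const bySize = Math.floor(sentenceCount / minSentencesPerBatch);
  return Math.max(1, Math.min(maxBatches, bySize));
}

console.log(maxFeasibleBatches(12));
console.log(maxFeasibleBatches(40, { maxBatches: 3, minSentencesPerBatch: 5, minSentencesRequired: 15 }));
```

With the defaults, 12 sentences allow at most 3 batches of 4, even though maxBatches is 5.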
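analyzeKBatches(batches)
Returns statistics for each batch: the number of sentences and the longest, shortest, and average sentence length (in characters), plus the standard deviation of the lengths.
import { kBatchSentences, analyzeKBatches } from 'k-batch';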
const sentences = [/* your sentences */];
const batches = await kBatchSentences(sentences);
// Get detailed statistics about your batches
const stats = await analyzeKBatches(batches);
console.log(stats);
/* Output:
[
{
count: 11,
longestLength: 39,
shortestLength: 5,
averageLength: 24.09,
standardDeviation: 9.87
},
// ... stats for other batches
]
*/
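The fields in this output can be reproduced from a batch's sentence lengths. The sketch below is an assumption-laden reimplementation, not k-batch's code: `batchStats` is a hypothetical name, lengths are character counts (consistent with "Tiny." giving a shortest length of 5), the standard deviation uses the population form, and values are rounded to two decimals.

```javascript
// Hypothetical sketch of per-batch statistics using the same field names
// as the analyzeKBatches output. Assumptions: character lengths,
// population standard deviation, rounding to two decimals.
function batchStats(batch) {
  if (batch.length === 0) throw new Error("batch must not be empty");
  const lengths = batch.map((s) => s.length);
  const count = lengths.length;
  const mean = lengths.reduce((sum, n) => sum + n, 0) / count;
  const variance = lengths.reduce((sum, n) => sum + (n - mean) ** 2, 0) / count;
  const round2 = (x) => Math.round(x * 100) / 100;
  return {
    count,
    longestLength: Math.max(...lengths),
    shortestLength: Math.min(...lengths),
    averageLength: round2(mean),
    standardDeviation: round2(Math.sqrt(variance)),
  };
}

console.log(batchStats(["Tiny.", "Who?", "Make it interesting."]));
```

Whether k-batch uses the population or sample standard deviation is not stated; the two differ noticeably for small batches.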
K-Batch uses a modified k-means clustering algorithm to group sentences by length. The algorithm automatically determines the number of clusters from your data and the configured constraints (maxBatches, minSentencesPerBatch, and minSentencesRequired).
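The README does not spell out the exact modification, so the following is one plausible reading, not k-batch's implementation. It runs Lloyd's k-means on one-dimensional character lengths, starting from the largest number of clusters the options allow and lowering it until every cluster holds at least minSentencesPerBatch sentences. The names `kMeans1D` and `kBatchSketch`, the deterministic centroid initialisation, and the "largest feasible k" rule are all assumptions made for this sketch.

```javascript
// Lloyd's k-means on one-dimensional values (sentence lengths).
function kMeans1D(values, k, maxIterations) {
  const sorted = [...values].sort((a, b) => a - b);
  // Deterministic initialisation: spread centroids across the sorted values.
  let centroids = Array.from({ length: k }, (_, i) =>
    sorted[Math.floor(((i + 0.5) * sorted.length) / k)]
  );
  let labels = null;
  for (let iter = 0; iter < Math.max(1, maxIterations); iter++) {
    // Assignment step: each value joins its nearest centroid.
    const next = values.map((v) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (Math.abs(v - centroids[c]) < Math.abs(v - centroids[best])) best = c;
      }
      return best;
    });
    // Stop once assignments no longer change.
    if (labels !== null && next.every((l, i) => l === labels[i])) break;
    labels = next;
    // Update step: move each centroid to its members' mean
    // (an empty cluster keeps its previous centroid).
    centroids = centroids.map((old, c) => {
      const members = values.filter((_, i) => labels[i] === c);
      return members.length ? members.reduce((s, v) => s + v, 0) / members.length : old;
    });
  }
  return { labels, centroids };
}

// Hypothetical sketch: try the largest allowed k first, then lower it until
// every cluster meets the minimum batch size.
function kBatchSketch(sentences, {
  maxBatches = 5, minSentencesPerBatch = 4, minSentencesRequired = 10, maxIterations = 100,
} = {}) {
  if (sentences.length < minSentencesRequired) return [sentences];
  const lengths = sentences.map((s) => s.length);
  const upper = Math.min(maxBatches, Math.floor(sentences.length / minSentencesPerBatch));
  for (let k = upper; k >= 2; k--) {
    const { labels, centroids } = kMeans1D(lengths, k, maxIterations);
    const groups = Array.from({ length: k }, () => []);
    sentences.forEach((s, i) => groups[labels[i]].push(s));
    if (groups.every((g) => g.length >= minSentencesPerBatch)) {
      // Order batches from shortest to longest centroid length.
      const order = centroids.map((c, i) => [c, i]).sort((a, b) => a[0] - b[0]);
      return order.map(([, i]) => groups[i]);
    }
  }
  return [sentences];
}

const short = Array.from({ length: 8 }, (_, i) => `Short ${i}.`);
const long = Array.from({ length: 8 }, (_, i) => `This is a considerably longer example sentence, number ${i}.`);
const result = kBatchSketch([...short, ...long]);
console.log(result.map((batch) => batch.length));
```

Here 16 sentences would allow up to 4 clusters, but only k = 2 gives every cluster at least 4 members, so the sketch returns one batch of short sentences and one of long ones.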
K-Batch includes a simple web interface to help you visualize and experiment with the batching algorithm.
To use the Web UI:
cd webui
npm install
npm start
This will start a local server and open the interface in your browser. For more details, see the Web UI README.
Contributions are welcome! Please feel free to submit a Pull Request.
git checkout -b feature/amazing-feature
git commit -m 'Add some amazing feature'
git push origin feature/amazing-feature
This project is licensed under the MIT License; see the LICENSE file for details.
Made with ❤️ for the NLP and ML community