
mongodb-collection-sample
Sample documents from a MongoDB collection.
npm install --save mongodb-collection-sample
npm install mongodb lodash mongodb-collection-sample
const sample = require('mongodb-collection-sample');
const { MongoClient } = require('mongodb');
const _ = require('lodash');

// The connection string goes to the constructor; connect() takes no URL.
const client = new MongoClient('mongodb://localhost:27017');

async function main() {
  await client.connect();
  // Use the default database for this connection.
  const db = client.db();

  // Generate 1000 documents
  const docs = _.range(0, 1000).map(function(i) {
    return {
      _id: 'needle_' + i,
      is_even: i % 2 === 0
    };
  });

  // Insert them into a collection.
  await db.collection('haystack').insertMany(docs);

  const options = {};
  // Size of the sample to capture [default: `5`].
  options.size = 5;
  // Query to restrict sample source [default: `{}`]
  options.query = {};

  // Get a stream of sample documents from the collection.
  const stream = sample(db, 'haystack', options);
  stream.on('error', function(err) {
    console.error('Error in sample stream', err);
    return process.exit(1);
  });
  stream.on('data', function(doc) {
    console.log('Got sampled document `%j`', doc);
  });
  stream.on('end', function() {
    console.log('Sampling complete! Goodbye!');
    // Close the client connection before exiting.
    client.close().then(function() {
      process.exit(0);
    });
  });
}

main().catch(function(err) {
  console.error('Example failed', err);
  process.exit(1);
});
Supported options that can be passed to sample(db, coll, options) are:
query: the filter to be used, default is {}
size: the number of documents to sample, default is 5
fields: the fields you want returned (projection object), default is null
raw: boolean to return documents as raw BSON buffers, default is false
sort: the sort field and direction, default is {_id: -1}
maxTimeMS: the maxTimeMS value after which the operation is terminated, default is undefined
promoteValues: boolean, whether certain BSON values should be cast to native JavaScript values. Default is true
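As a quick illustration of how the documented defaults combine with caller-supplied options, here is a minimal sketch. The names `DEFAULT_OPTIONS` and `withDefaults` are hypothetical and not part of the package; the values are copied from the option list above, and the package's own merging logic may differ.

```javascript
// Hypothetical helper: defaults copied from the documented option list.
const DEFAULT_OPTIONS = {
  query: {},
  size: 5,
  fields: null,
  raw: false,
  sort: { _id: -1 },
  maxTimeMS: undefined,
  promoteValues: true
};

// Shallow-merge caller options over the defaults; caller values win.
function withDefaults(options = {}) {
  return { ...DEFAULT_OPTIONS, ...options };
}

// Example: only `size` and `fields` are overridden.
console.log(withDefaults({ size: 10, fields: { is_even: 1 } }));
```

Because the merge is shallow, passing a `sort` or `query` object replaces the default object rather than merging into it.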
MongoDB version 3.1.6 and above generally uses the $sample aggregation operator:
db.collectionName.aggregate([
{$match: <query>},
{$sample: {size: <size>}},
{$project: <fields>},
{$sort: <sort>}
])
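The pipeline template above could be assembled from an options object roughly as follows. This is a sketch, not the package's source: `buildSamplePipeline` is a hypothetical name, and skipping the `$project` stage when `fields` is null is an assumption made here so that the pipeline stays valid.

```javascript
// Hypothetical helper that fills in the pipeline template from options.
function buildSamplePipeline({
  query = {},
  size = 5,
  fields = null,
  sort = { _id: -1 }
} = {}) {
  const pipeline = [
    { $match: query },
    { $sample: { size } }
  ];
  // Assumption: with no projection requested, omit the $project stage.
  if (fields) {
    pipeline.push({ $project: fields });
  }
  pipeline.push({ $sort: sort });
  return pipeline;
}

// Print the stages for a sample of 5 even-numbered documents.
console.log(JSON.stringify(buildSamplePipeline({ query: { is_even: true } })));
```

The resulting array can be passed to the driver's `collection.aggregate()` method.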
However, if more documents are requested than are available, the $sample stage is omitted as a performance optimization. If the sample size is above 5% of the result set count (but less than 100%), the algorithm falls back to reservoir sampling to avoid a blocking sort stage on the server.
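The selection rule described above can be written as a small decision function. This is an illustrative sketch only: `chooseSamplingMode` and its return labels are hypothetical, and treating a sample size exactly equal to the count the same as "more than available" is an assumption, since the text does not cover that boundary.

```javascript
// Hypothetical decision function for the sampling mode.
// size:  number of documents requested
// count: number of documents matching the query
function chooseSamplingMode(size, count) {
  // Everything is requested (or more): skip the $sample stage entirely.
  if (size >= count) {
    return 'all';
  }
  // More than 5% of the result set: fall back to reservoir sampling.
  if (size > count * 0.05) {
    return 'reservoir';
  }
  // Otherwise let the server sample with $sample.
  return '$sample';
}

console.log(chooseSamplingMode(5, 1000));
console.log(chooseSamplingMode(60, 1000));
console.log(chooseSamplingMode(2000, 1000));
```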
For MongoDB version 3.1.5 and below we use a client-side reservoir sampling algorithm, which reads a stream of _id values and saves sampleSize randomly chosen ones.
[Figure: the two sampling modes, illustrated.]
For peak performance of the client-side reservoir sampler, keep the following guidelines in mind.
The number of _id values read must be limited to some finite value (default: 10k).
Queries that include a sort by $natural order do not use indexes to fulfill the query predicate.
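For readers unfamiliar with reservoir sampling, the sketch below shows the textbook version (often called Algorithm R) applied to a bounded sequence of `_id` values. It is not taken from the package's source: the function name, the `limit` and `random` parameters, and the in-memory array standing in for the `_id` stream are all assumptions made for illustration.

```javascript
// Textbook reservoir sampling (Algorithm R) over any iterable of ids.
// limit caps how many ids are read, mirroring the 10k default noted above.
function reservoirSample(ids, sampleSize, { limit = 10000, random = Math.random } = {}) {
  const reservoir = [];
  let seen = 0;
  for (const id of ids) {
    if (seen >= limit) {
      break;
    }
    seen += 1;
    if (reservoir.length < sampleSize) {
      // Fill the reservoir with the first sampleSize ids.
      reservoir.push(id);
    } else {
      // Replace a random slot with probability sampleSize / seen.
      const j = Math.floor(random() * seen);
      if (j < sampleSize) {
        reservoir[j] = id;
      }
    }
  }
  return reservoir;
}

// Sample 5 ids out of 1000 synthetic ones.
const ids = Array.from({ length: 1000 }, (_, i) => 'needle_' + i);
console.log(reservoirSample(ids, 5));
```

The sampled ids would then be used to fetch the full documents, for example with a query of the form `{_id: {$in: sampledIds}}`.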
Apache 2