mongodb-collection-sample

Sample documents from MongoDB collections.

5.0.0

latest

Version published: 2 years ago

Maintainers: 30

Created: 10 years ago

Install

npm install --save mongodb-collection-sample

Example

npm install mongodb lodash mongodb-collection-sample

const sample = require('mongodb-collection-sample');
const { MongoClient } = require('mongodb');
const _ = require('lodash');

// The connection string is passed to the constructor.
const client = new MongoClient('mongodb://localhost:27017');

async function main() {
  await client.connect();

  // Use the default database of the connection string (`test` when none is given).
  const db = client.db();

  // Generate 1000 documents
  const docs = _.range(0, 1000).map(function(i) {
    return {
      _id: 'needle_' + i,
      is_even: i % 2 === 0
    };
  });

  // Insert them into a collection.
  await db.collection('haystack').insertMany(docs);

  const options = {};
  // Size of the sample to capture [default: `5`].
  options.size = 5;

  // Query to restrict sample source [default: `{}`]
  options.query = {};

  // Get a stream of sample documents from the collection.
  const stream = sample(db, 'haystack', options);
  stream.on('error', function(err){
    console.error('Error in sample stream', err);
    return process.exit(1);
  });
  stream.on('data', function(doc){
    console.log('Got sampled document `%j`', doc);
  });
  stream.on('end', function(){
    console.log('Sampling complete!  Goodbye!');
    // Close the client (not the db handle), then exit.
    client.close().then(function() {
      process.exit(0);
    });
  });
}

// Report connection or insert failures instead of leaving an unhandled rejection.
main().catch(function(err) {
  console.error(err);
  process.exit(1);
});
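
When the whole sample is needed at once, the event handlers above can be wrapped in a promise. The following is a minimal sketch, not part of the package: `collectStream` is a hypothetical helper, and `Readable.from` stands in for `sample(db, 'haystack', options)` so the snippet runs without a server.

```javascript
const { Readable } = require('node:stream');

// Hypothetical helper (not provided by mongodb-collection-sample):
// resolves with every document the stream emits, rejects on the first error.
function collectStream(stream) {
  return new Promise((resolve, reject) => {
    const docs = [];
    stream.on('data', (doc) => docs.push(doc));
    stream.on('error', reject);
    stream.on('end', () => resolve(docs));
  });
}

// Stand-in for the stream returned by sample(db, 'haystack', options).
const fakeSampleStream = Readable.from([{ _id: 'needle_1' }, { _id: 'needle_2' }]);

collectStream(fakeSampleStream).then((docs) => {
  console.log('Collected %d documents', docs.length);
});
```

Buffering is only reasonable for small samples; for large `size` values the streaming form keeps memory use flat.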

Options

sample(db, coll, options) accepts the following options:

query: the filter to be used, default is {}
size: the number of documents to sample, default is 5
fields: the fields you want returned (projection object), default is null
raw: boolean to return documents as raw BSON buffers, default is false
sort: the sort field and direction, default is {_id: -1}
maxTimeMS: the maxTimeMS value after which the operation is terminated, default is undefined
promoteValues: boolean, whether certain BSON values are cast to native JavaScript values, default is true
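
As an illustration of how these defaults combine with caller-supplied values, here is a small sketch. The default values are taken from the list above; the constant `DEFAULT_SAMPLE_OPTIONS` and the helper `withSampleDefaults` are hypothetical names, and the package may merge options differently internally.

```javascript
// Defaults copied from the options list; the names below are illustrative only.
const DEFAULT_SAMPLE_OPTIONS = Object.freeze({
  query: {},
  size: 5,
  fields: null,
  raw: false,
  sort: { _id: -1 },
  maxTimeMS: undefined,
  promoteValues: true
});

// Shallow-merge caller overrides on top of the documented defaults.
function withSampleDefaults(overrides = {}) {
  return { ...DEFAULT_SAMPLE_OPTIONS, ...overrides };
}

// Example: sample 100 even documents and return only their _id.
const sampleOptions = withSampleDefaults({
  size: 100,
  query: { is_even: true },
  fields: { _id: 1 }
});
console.log(sampleOptions);
```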

How It Works

Native Sampler

On MongoDB 3.1.6 and above, the sampler generally uses the $sample aggregation stage:

db.collectionName.aggregate([
  {$match: <query>},
  {$sample: {size: <size>}},
  {$project: <fields>},
  {$sort: <sort>}
])

There are two exceptions. If more documents are requested than are available, the $sample stage is omitted as a performance optimization. If the sample size is above 5% of the result set count (but less than 100%), the sampler falls back to reservoir sampling to avoid a blocking sort stage on the server.
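
The decision described above can be sketched as a pure function. This is an illustration of the rules as stated in this section, not the package's actual code: `planSample` is a hypothetical name, `count` is assumed to be the number of documents matching the query, and treating a request for exactly 100% of the documents like the "more than available" case is an assumption.

```javascript
// Hypothetical planner that mirrors the rules in the text.
function planSample({ query = {}, size = 5, fields = null, sort = { _id: -1 } }, count) {
  // Stages that follow $match/$sample; $project is skipped when no fields are given.
  const tail = [];
  if (fields) tail.push({ $project: fields });
  tail.push({ $sort: sort });

  if (size >= count) {
    // Everything is requested: no need for a $sample stage.
    return { mode: 'native', pipeline: [{ $match: query }, ...tail] };
  }
  if (size > count * 0.05) {
    // Above 5% of the result set: fall back to client-side reservoir sampling.
    return { mode: 'reservoir', pipeline: null };
  }
  return {
    mode: 'native',
    pipeline: [{ $match: query }, { $sample: { size } }, ...tail]
  };
}

console.log(planSample({ size: 5 }, 1000).mode);
console.log(planSample({ size: 100 }, 1000).mode);
```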

Reservoir Sampling

For MongoDB version 3.1.5 and below, we use a client-side reservoir sampling algorithm.

Query for a stream of _id values, limit 10,000.
Read stream of _ids and save sampleSize randomly chosen values.
Then query selected random documents by _id.
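
The middle step, keeping sampleSize randomly chosen values from a stream of unknown length, is classic reservoir sampling. Below is a minimal sketch using Algorithm R; it illustrates the technique and is not the package's own implementation. The function name `reservoirSample` and the injectable `random` parameter are assumptions made for this example.

```javascript
// Algorithm R: keep the first `sampleSize` items, then replace a random slot
// with decreasing probability so that every item is equally likely to survive.
// `random` is injectable (defaults to Math.random) to make the sketch testable.
function reservoirSample(items, sampleSize, random = Math.random) {
  const reservoir = [];
  let seen = 0;
  for (const item of items) {
    seen += 1;
    if (reservoir.length < sampleSize) {
      reservoir.push(item);
    } else {
      // Pick an index in [0, seen); only indexes inside the reservoir cause a swap.
      const j = Math.floor(random() * seen);
      if (j < sampleSize) reservoir[j] = item;
    }
  }
  return reservoir;
}

// Stand-in for the stream of _id values (limit 10,000).
const ids = Array.from({ length: 10000 }, (_, i) => 'needle_' + i);
const picked = reservoirSample(ids, 5);
console.log(picked);
```

Because the function accepts any iterable, the same loop works on a cursor or stream consumed one value at a time, without holding all _id values in memory.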

[Figure: the two sampling modes, illustrated.]

Performance Notes

For peak performance of the client-side reservoir sampler, keep the following guidelines in mind.

The initial query for a stream of _id values must be limited to some finite value (default: 10,000).
This query should be covered by an index.
Since there's a limit, you may wish to bias for recent documents via a sort (default: {_id: -1}).
Don't sort on {$natural: -1}: this forces a collection scan!

Queries that include a sort by $natural order do not use indexes to fulfill the query predicate

When retrieving docs: batch using one $in to reduce network chattiness.
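
These guidelines can be sketched as two small query builders. Both function names are hypothetical and not part of the package; the returned objects are assumed to be passed to the driver's `collection.find(filter, options)`.

```javascript
// Build the filter and options for the initial _id stream:
// projected to _id only, sorted, and always limited.
function buildIdQuery({ query = {}, sort = { _id: -1 }, limit = 10000 } = {}) {
  if (Object.prototype.hasOwnProperty.call(sort, '$natural')) {
    // Guard for the guideline above: $natural sorts force a collection scan.
    throw new Error('Sorting on $natural forces a collection scan');
  }
  return {
    filter: query,
    options: { projection: { _id: 1 }, sort, limit }
  };
}

// Fetch all sampled documents in one round trip with a single $in.
function buildFetchFilter(ids) {
  return { _id: { $in: ids } };
}

const idQuery = buildIdQuery({ query: { is_even: true } });
const fetchFilter = buildFetchFilter(['needle_2', 'needle_40']);
console.log(idQuery, fetchFilter);
```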

License

Apache 2

Package last updated on 31 Dec 2022
