mongodb-schema
Infer a probabilistic schema of JavaScript objects or a MongoDB collection.
This package is dual-purpose. It serves as a Node.js module and can also be used directly in the MongoDB shell, where it extends the DBCollection shell object.
mongodb-schema is an early prototype. Use at your own risk.
Usage with Node.js
Installation
Install the script with:
npm install mongodb-schema
Usage
Then load the module and call schema( documents, options, callback ), which will call callback(err, res) with an error or the result once it has finished analysing the documents.
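var schema = require('mongodb-schema');

// Two sample documents: "a" is a number in one and an object in the other.
var documents = [
    {a: 1},
    {a: {b: "hello"}}
];

// Return the schema with dot-notation keys instead of nested objects.
var options = {flat: true};

// Called with an error, or with the inferred schema once analysis is done.
var callback = function(err, res) {
    if (err) {
        return console.error( err );
    }
    console.log( JSON.stringify( res, null, '\t' ) );
};

schema( documents, options, callback );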
This would output:
{
    "$c": 2,
    "a": {
        "$c": 2,
        "$t": {
            "number": 1,
            "object": 1
        },
        "$p": 1
    },
    "a.b": {
        "$c": 1,
        "$t": "string",
        "$p": 0.5
    }
}
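To make the $c and $p annotations in this output concrete, here is a minimal sketch of how such counts and probabilities could be derived. It is not the module's actual implementation: sketchFlatSchema is a hypothetical helper, it ignores $t and arrays entirely, and it assumes $p is simply the field's count divided by its parent's count.

```javascript
// Hypothetical helper, not part of mongodb-schema.
// Counts how many documents contain each dotted path, then derives
// $p as count(path) / count(parent), using the document total at the top level.
function sketchFlatSchema(documents) {
  var counts = {}; // dotted path -> number of documents containing it

  function walk(obj, prefix) {
    Object.keys(obj).forEach(function (key) {
      var path = prefix ? prefix + '.' + key : key;
      counts[path] = (counts[path] || 0) + 1;
      var value = obj[key];
      // Only descend into plain sub-documents; arrays are left out of this sketch.
      if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
        walk(value, path);
      }
    });
  }

  documents.forEach(function (doc) { walk(doc, ''); });

  var result = { $c: documents.length };
  Object.keys(counts).forEach(function (path) {
    var dot = path.lastIndexOf('.');
    var parentCount = dot === -1 ? documents.length : counts[path.slice(0, dot)];
    result[path] = { $c: counts[path], $p: counts[path] / parentCount };
  });
  return result;
}

console.log(JSON.stringify(sketchFlatSchema([{a: 1}, {a: {b: "hello"}}]), null, '\t'));
```

For the two documents above, this yields the same $c and $p values as the real output: "a" appears in both documents, while "a.b" appears in one of the two documents that contain "a".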
Usage with MongoDB
Installation
There are two ways to load the script: one-time (for testing) and permanent (for frequent use).
1. Load the script directly (one-time usage)
This will first load mongodb-schema.js and then open the shell as usual. You will have to add the script every time you open the shell.
mongo <basepath>/lib/mongodb-schema.js --shell
Replace the <basepath> part with the actual path where the mongodb-schema folder is located.
2. Load the script via the .mongorc.js file (permanent usage)
You can also add the following line to your ~/.mongorc.js file to always load the file on shell startup (unless started with --norc):
load('<basepath>/lib/mongodb-schema.js')
Replace the <basepath> part with the actual path where the mongodb-schema folder is located.
Usage
Basic Usage
The script extends the DBCollection object with a new method: .schema(). On a collection called foo, run it with:
db.foo.schema()
By default, this will use the first 100 documents from the collection and calculate a probabilistic schema based on these documents.
Usage with options
You can pass an options object to the .schema() method. Currently it supports two options: samples and flat.
db.foo.schema( {samples: 20, flat: true} )
This will use the first 20 documents to calculate the schema and return the schema as a flat object (all fields are collapsed to the top level with dot-notation). See the Examples section below for nested vs. flat schemata.
Examples
The schema generated by this method annotates most of the fields in the object with schema information. These can be counts ($c), type information ($t), a flag indicating whether the field was an array in at least one instance ($a), and the probability of a field appearing given its parent field ($p).
Example of nested schema (default)
100 documents were passed to the schema function (in MongoDB mode, this is the default if the samples option is not specified).
{
    "$c": 100,
    "_id": {
        "$c": 100,
        "$p": 1,
        "type": "ObjectId"
    },
    "a": {
        "$c": 100,
        "$p": 1,
        "b": {
            "$c": 100,
            "$p": 1,
            "$t": "number"
        },
        "c": {
            "$c": 70,
            "$p": 0.7,
            "$t": "string"
        }
    }
}
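As a rough illustration of how a nested schema like the one above can be read programmatically, the following sketch walks the structure and lists each field path with its $p value. listFieldProbabilities is a hypothetical helper, not part of mongodb-schema, and it assumes that any key not starting with "$" whose value is an object describes a sub-field.

```javascript
// Hypothetical helper, not part of mongodb-schema.
// Recursively collects "path: probability" strings from a nested schema.
function listFieldProbabilities(schema, prefix) {
  var rows = [];
  Object.keys(schema).forEach(function (key) {
    var value = schema[key];
    // Skip annotations ($c, $p, $t, ...) and any non-object values.
    if (key.charAt(0) === '$' || value === null || typeof value !== 'object') {
      return;
    }
    var path = prefix ? prefix + '.' + key : key;
    rows.push(path + ': ' + value.$p);
    rows = rows.concat(listFieldProbabilities(value, path));
  });
  return rows;
}

// The nested example schema from above.
var nestedSchema = {
  "$c": 100,
  "_id": { "$c": 100, "$p": 1, "type": "ObjectId" },
  "a": {
    "$c": 100,
    "$p": 1,
    "b": { "$c": 100, "$p": 1, "$t": "number" },
    "c": { "$c": 70, "$p": 0.7, "$t": "string" }
  }
};

console.log(listFieldProbabilities(nestedSchema).join('\n'));
```

The dotted paths produced here match the keys a flat schema would use for the same fields.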
Example of flat schema
20 documents were passed to the schema function, as well as the option {flat: true}.
{
    "$c": 20,
    "_id": {
        "$c": 20,
        "$p": 1,
        "type": "ObjectId"
    },
    "a.b": {
        "$c": 20,
        "$p": 1,
        "$t": "number"
    },
    "a.c": {
        "$c": 13,
        "$p": 0.65,
        "$t": "string"
    }
}
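One possible use of a flat schema is to find fields that are missing from some documents. The sketch below, which assumes the flat format shown above and uses a hypothetical helper name (optionalFields is not provided by mongodb-schema), returns every field whose $p is below 1.

```javascript
// Hypothetical helper, not part of mongodb-schema.
// Returns the names of fields that did not appear in every sampled document.
function optionalFields(flatSchema) {
  return Object.keys(flatSchema).filter(function (key) {
    var entry = flatSchema[key];
    return key.charAt(0) !== '$' &&          // skip top-level annotations such as $c
      entry !== null && typeof entry === 'object' &&
      typeof entry.$p === 'number' && entry.$p < 1;
  });
}

// The flat example schema from above.
var flatSchema = {
  "$c": 20,
  "_id": { "$c": 20, "$p": 1, "type": "ObjectId" },
  "a.b": { "$c": 20, "$p": 1, "$t": "number" },
  "a.c": { "$c": 13, "$p": 0.65, "$t": "string" }
};

console.log(optionalFields(flatSchema));
```

For the example schema, only "a.c" is reported, since it was present in 13 of the 20 sampled documents.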