
mongo-bigquery-sync
This package allows you to stream data from a MongoDB collection to a Google BigQuery table in batches. It is designed to handle large datasets by processing them in chunks, transforming the data as needed before inserting it into BigQuery, and providing progress logs during the operation.
Before using this package, ensure you have:
- A running MongoDB instance you can connect to, along with the database and collection you want to export.
- A Google Cloud project with BigQuery enabled, plus the target dataset and table.
- A service account key file with permission to write to that dataset and table.
To install the package, use npm or yarn:
npm install mongodb-to-bigquery-streamer
or
yarn add mongodb-to-bigquery-streamer
Here's an example of how to use the package to stream data from MongoDB to BigQuery:
const { streamDataInBatchesFromMongoDB } = require('mongodb-to-bigquery-streamer');

const options = {
  mongodbUri: 'mongodb://localhost:27017', // MongoDB URI
  mongoDbName: 'my_database', // MongoDB database name
  mongoCollectionName: 'my_collection', // MongoDB collection name
  bigQueryDataset: 'my_dataset', // BigQuery dataset name
  bigQueryTable: 'my_table', // BigQuery table name
  transformFn: (doc) => doc, // Optional transformation function (default: identity function)
  chunkSize: 100, // Optional batch size (default: 10)
  mongoQueryParams: { active: true }, // Optional query parameters for MongoDB (default: {})
  sortField: '_id', // Optional sorting field for MongoDB (default: '_id')
  gbqProjectId: 'my-gbq-project', // Google BigQuery project ID
  gbqKeyFile: './path/to/keyfile.json' // Path to BigQuery service account key
};

streamDataInBatchesFromMongoDB(options)
  .then((result) => {
    if (result.success) {
      console.log('Data successfully streamed to BigQuery.');
      console.log('Last Inserted Record:', result.lastInsertedRecord);
    } else {
      console.error('Data streaming failed.');
    }
  })
  .catch((error) => {
    console.error('Error:', error);
  });
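Since streamDataInBatchesFromMongoDB returns a promise, the same call also works with async/await. The sketch below reuses the options object defined above:

```js
// Equivalent async/await form of the example above.
(async () => {
  try {
    const result = await streamDataInBatchesFromMongoDB(options);
    if (result.success) {
      console.log('Data successfully streamed to BigQuery.');
    } else {
      console.error('Data streaming failed.');
    }
  } catch (error) {
    console.error('Error:', error);
  }
})();
```

The table below summarizes all available options.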
| Parameter | Type | Description | Default Value |
|---|---|---|---|
| mongodbUri | String | MongoDB connection URI. | |
| mongoDbName | String | Name of the MongoDB database. | |
| mongoCollectionName | String | Name of the MongoDB collection. | |
| bigQueryDataset | String | BigQuery dataset name. | |
| bigQueryTable | String | BigQuery table name. | |
| transformFn | Function | (Optional) Function to transform documents before inserting into BigQuery. | (doc) => doc |
| chunkSize | Number | (Optional) The number of documents to process in each batch. | 10 |
| mongoQueryParams | Object | (Optional) Query parameters to filter documents from MongoDB. | {} |
| sortField | String | (Optional) Field to sort the documents by (for paging through the data). | _id |
| gbqProjectId | String | Google BigQuery project ID. | |
| gbqKeyFile | String | Path to the Google BigQuery service account key file. | |
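The parameters not marked (Optional) have no default, so treat them as required. A minimal configuration, with placeholder values, might look like this:

```js
// Minimal configuration: omitted optional fields fall back to their documented defaults.
const minimalOptions = {
  mongodbUri: 'mongodb://localhost:27017',
  mongoDbName: 'my_database',
  mongoCollectionName: 'my_collection',
  bigQueryDataset: 'my_dataset',
  bigQueryTable: 'my_table',
  gbqProjectId: 'my-gbq-project',
  gbqKeyFile: './path/to/keyfile.json',
};
```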
You can define a custom transformation function to modify the MongoDB documents before they are inserted into BigQuery. This is useful for changing field names, formatting, or adding new fields. For example:
const transformFn = (doc) => {
  return {
    id: doc._id.toString(),
    name: doc.name.toUpperCase(),
    createdAt: doc.created_at,
  };
};
In this example, the function:
- Converts _id to a string.
- Converts the name field to uppercase.
- Keeps the created_at field unchanged (exposed as createdAt).

The streamDataInBatchesFromMongoDB function returns an object indicating the result of the sync process:
{
  "success": true,
  "lastInsertedRecord": { ... }
}
If the streaming process succeeds, success will be true, and lastInsertedRecord will contain the last document that was inserted into BigQuery.
If an error occurs, success will be false, and you can inspect the logs to understand what went wrong.
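Because the result includes the last inserted record, one way to resume an interrupted sync is to filter the next run with mongoQueryParams. This is only a sketch built on the documented options; it assumes the previous run's lastInsertedRecord was saved and still exposes the original MongoDB _id (it may not if your transformFn renames that field):

```js
// Sketch: resume a later run from where the previous one stopped.
// Assumes `lastInsertedRecord` was persisted from an earlier run and still
// carries the original MongoDB _id.
const resumeOptions = {
  ...options,
  // With the default sortField of '_id', only documents after the last
  // inserted one are fetched.
  mongoQueryParams: { _id: { $gt: lastInsertedRecord._id } },
};

streamDataInBatchesFromMongoDB(resumeOptions)
  .then((result) => console.log('Resume run finished, success =', result.success));
```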
To efficiently stream large datasets, the process is divided into batches. You can adjust the chunkSize parameter to control how many documents are processed in each batch. The default batch size is 10, but you can set it to a higher number for faster processing if needed.
chunkSize: 500, // Process 500 documents in each batch
Before using the package, make sure your Google Cloud service account has sufficient permissions to write to the specified BigQuery dataset and table.
- The service account should have a role that allows writing to BigQuery (for example, BigQuery Data Editor).
- Use the gbqKeyFile option in the configuration to provide the path to the service account key:

gbqKeyFile: './path/to/keyfile.json'
This package is licensed under the MIT License.