The MongoDB Connector for Hadoop is a plugin for Hadoop that provides the ability to use MongoDB as an input source and/or an output destination.
# MongoDB Connector for Hadoop
## Purpose
The MongoDB Connector for Hadoop is a library which allows MongoDB (or backup files in its data format, BSON) to be used as an input source, or output destination, for Hadoop MapReduce tasks. It is designed to allow greater flexibility and performance and make it easy to integrate data in MongoDB with other parts of the Hadoop ecosystem.
Current stable release: 1.2.0
See the release page.
The mongo-hadoop connector currently supports the following versions of Hadoop: 0.23, 1.0, 1.1, 2.2, 2.3, 2.4, and CDH 4 and 5. The default build targets the latest supported Apache Hadoop release (currently 2.4). To build against a specific version of Hadoop, pass `-PclusterVersion=<your version>` to `gradlew` when building.
Run `./gradlew jar` to build the jars. The jars will be placed in `build/libs` for each module; for example, the jar for the core module is generated in the `core/build/libs` directory.
After a successful build, you must copy the jars to the lib directory on each node in your Hadoop cluster. This is usually one of the following locations, depending on which Hadoop release you are using:
$HADOOP_HOME/lib/
$HADOOP_HOME/share/hadoop/mapreduce/
$HADOOP_HOME/share/hadoop/lib/
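
The copy step can be scripted. The following is a minimal sketch, not part of the connector itself: `install_mongo_hadoop_jars` is a hypothetical helper name, and it assumes you pass the repository root and the lib directory that matches your Hadoop release. It would need to be run on every node in the cluster.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with mongo-hadoop): copy every jar built
# under <repo>/<module>/build/libs into a Hadoop lib directory.
#   $1 = root of the mongo-hadoop checkout
#   $2 = target lib directory, e.g. "$HADOOP_HOME/share/hadoop/mapreduce"
install_mongo_hadoop_jars() {
    repo_dir="$1"
    lib_dir="$2"
    if [ ! -d "$lib_dir" ]; then
        echo "not a directory: $lib_dir" >&2
        return 1
    fi
    for jar in "$repo_dir"/*/build/libs/*.jar; do
        # Skip the unexpanded pattern when no module has been built yet.
        [ -e "$jar" ] || continue
        cp "$jar" "$lib_dir"/ && echo "copied $jar"
    done
}
```

For example, `install_mongo_hadoop_jars . "$HADOOP_HOME/lib"` run from the repository root would copy the jars of all built modules, assuming `$HADOOP_HOME/lib` is the right location for your release.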
Hadoop Version | Build Parameter |
---|---|
Apache Hadoop 0.23 | -PclusterVersion='0.23' |
Apache Hadoop 1.0 | -PclusterVersion='1.0' |
Apache Hadoop 1.1 | -PclusterVersion='1.1' |
Apache Hadoop 2.2 | -PclusterVersion='2.2' |
Apache Hadoop 2.3 | -PclusterVersion='2.3' |
Apache Hadoop 2.4 | -PclusterVersion='2.4' |
Cloudera Distribution for Hadoop 4 | -PclusterVersion='cdh4' |
Cloudera Distribution for Hadoop 5 | -PclusterVersion='cdh5' |
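
To avoid mistyping a build parameter, the mapping in the table can be wrapped in a small shell function. This is only an illustrative sketch: `mongo_hadoop_build_cmd` is a hypothetical name, and the accepted values are taken directly from the table above.

```shell
#!/bin/sh
# Hypothetical helper: print the Gradle command for a supported Hadoop version.
# Accepted values mirror the "Build Parameter" column of the table above.
mongo_hadoop_build_cmd() {
    case "$1" in
        0.23|1.0|1.1|2.2|2.3|2.4|cdh4|cdh5)
            printf "./gradlew jar -PclusterVersion='%s'\n" "$1"
            ;;
        *)
            echo "unsupported Hadoop version: $1" >&2
            return 1
            ;;
    esac
}
```

Under these assumptions, `mongo_hadoop_build_cmd cdh4` prints `./gradlew jar -PclusterVersion='cdh4'`, which can then be run from the repository root. Any value outside the table is rejected with a non-zero exit status.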
Amazon Elastic MapReduce is a managed Hadoop framework that allows you to submit jobs to a cluster of customizable size and configuration, without needing to deal with provisioning nodes and installing software.
Using EMR with the MongoDB Connector for Hadoop allows you to run MapReduce jobs against MongoDB backup files stored in S3.
Submitting jobs that use the MongoDB Connector for Hadoop to EMR simply requires that the bootstrap actions fetch the dependencies (the MongoDB Java driver, the mongo-hadoop-core libs, etc.) and place them into the Hadoop distribution's `lib` folders.
For a full example (running the Enron example on Elastic MapReduce) please see here.
Documentation on Pig with the MongoDB Connector for Hadoop.
For examples on using Pig with the MongoDB Connector for Hadoop, also refer to the examples section.
If your code introduces new features, add tests that cover them if possible and make sure that `./gradlew check` still passes.
If you're not sure how to write a test for a feature or have trouble with a test failure, please post on the Google Group with details and we will try to help. Note: Until FindBugs updates its dependencies, running `./gradlew check` on Java 8 will fail.
Justin Lee (justin.lee@mongodb.com)
Issue tracking: https://jira.mongodb.org/browse/HADOOP/
Discussion: http://groups.google.com/group/mongodb-user/