elasticsearch_loader |Build Status| |Can I Use Python 3?| |PyPI version|
========================================================================

A pythonic tool for batch loading data files (json, parquet, csv, tsv) into Elasticsearch.

Main features
~~~~~~~~~~~~~

-  Batch upload CSV (actually any \*SV) files to Elasticsearch
-  Batch upload JSON files / JSON lines to Elasticsearch
-  Batch upload parquet files to Elasticsearch
-  Predefine custom mappings
-  Delete index before upload
-  Index documents with _id from the document itself
-  Load data directly from a URL
-  SSL and basic auth
-  Unicode Support ✌️

Plugins
~~~~~~~

To install a plugin, run ``pip install plugin-name``.

-  `esl-redis <https://pypi.org/project/esl-redis>`__ - Read continuously from one or more Redis lists and index to Elasticsearch
-  `esl-s3 <https://pypi.org/project/esl-s3>`__ - Plugin for listing and indexing files from S3

Test matrix
~~~~~~~~~~~


=========== ====== ===== =====
python / es 5.6.16 6.8.0 7.1.1
=========== ====== ===== =====
2.7         V      V     V
3.7         V      V     V
=========== ====== ===== =====

Installation
~~~~~~~~~~~~

| ``pip install elasticsearch-loader``
| To add parquet support, run ``pip install elasticsearch-loader[parquet]``

Usage
~~~~~


::

   (venv)/tmp $ elasticsearch_loader --help
   Usage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...

   Options:
     -c, --config-file TEXT          Load default configuration file from esl.yml
     --bulk-size INTEGER             How many docs to collect before writing to
                                     Elasticsearch (default 500)
     --es-host TEXT                  Elasticsearch cluster entry point. (default
                                     http://localhost:9200)
     --verify-certs                  Make sure we verify SSL certificates
                                     (default false)
     --use-ssl                       Turn on SSL (default false)
     --ca-certs TEXT                 Provide a path to CA certs on disk
     --http-auth TEXT                Provide username and password for basic auth
                                     in the format of username:password
     --index TEXT                    Destination index name  [required]
     --delete                        Delete index before import? (default false)
     --update                        Merge and update existing doc instead of
                                     overwrite
     --progress                      Enable progress bar - NOTICE: in order to
                                     show progress the entire input should be
                                     collected and can consume more memory than
                                     without progress bar
     --type TEXT                     Docs type. TYPES WILL BE DEPRECATED IN APIS
                                     IN ELASTICSEARCH 7, AND COMPLETELY REMOVED
                                     IN 8.  [required]
     --id-field TEXT                 Specify field name that be used as document
                                     id
     --as-child                      Insert _parent, _routing field, the value is
                                     same as _id. Note: must specify --id-field
                                     explicitly
     --with-retry                    Retry if ES bulk insertion failed
     --index-settings-file FILENAME  Specify path to json file containing index
                                     mapping and settings, creates index if
                                     missing
     --timeout FLOAT                 Specify request timeout in seconds for
                                     Elasticsearch client
     --encoding TEXT                 Specify content encoding for input files
     --keys TEXT                     Comma separated keys to pick from each
                                     document
     -h, --help                      Show this message and exit.

   Commands:
     csv
     json     FILES with the format of [{"a": "1"}, {"b": "2"}]
     parquet
     redis
     s3
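
The ``--bulk-size`` and ``--keys`` options describe two simple transformations: trimming each document to selected keys and grouping documents into batches before writing. The sketch below illustrates that idea in plain Python. It is not the tool's actual implementation; ``pick_keys`` and ``chunked`` are hypothetical helper names, and the sample documents are made up.

```python
from itertools import islice


def pick_keys(doc, keys):
    """Keep only the requested keys (an assumption about what --keys does)."""
    return {k: doc[k] for k in keys if k in doc}


def chunked(docs, bulk_size=500):
    """Yield lists of at most bulk_size documents from any iterable."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, bulk_size))
        if not batch:
            return
        yield batch


# 1200 made-up documents, trimmed to two keys and grouped in batches of 500.
docs = ({"id": i, "name": "doc-%d" % i, "extra": "x"} for i in range(1200))
trimmed = (pick_keys(doc, ["id", "name"]) for doc in docs)
batch_sizes = [len(batch) for batch in chunked(trimmed, bulk_size=500)]
print(batch_sizes)  # [500, 500, 200]
```

Because both helpers work on iterators, the input never has to be held in memory all at once, which is the trade-off the ``--progress`` notice refers to.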

Examples
~~~~~~~~

Load 2 CSV to elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

   elasticsearch_loader --index incidents --type incident csv file1.csv file2.csv

Load JSONs to elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

   elasticsearch_loader --index incidents --type incident json *.json

Load all git commits into elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

   git log --pretty=format:'{"sha":"%H","author_name":"%aN", "author_email": "%aE","date":"%ad","message":"%f"}' | elasticsearch_loader --type git --index git json --json-lines -
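
A format string like the one above builds JSON by plain text substitution, so a field that happens to contain a double quote (an author name, for instance) may produce an invalid line. As a hedged alternative, the sketch below serializes records with ``json.dumps``, one object per line, which is the shape the ``--json-lines`` flag reads. The commit records and the helper name ``write_json_lines`` are hypothetical.

```python
import io
import json
import sys


def write_json_lines(records, stream):
    """Write each record as one JSON object per line and return the count."""
    count = 0
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# Hypothetical commit records; note the double quotes inside "message".
commits = [
    {"sha": "abc123", "author_name": "Ada", "message": 'Fix "quoted" title'},
    {"sha": "def456", "author_name": "Grace", "message": "Add parquet loader"},
]

buffer = io.StringIO()
written = write_json_lines(commits, buffer)
lines = buffer.getvalue().splitlines()

# Emit to stdout so the output can be piped into the loader.
sys.stdout.write(buffer.getvalue())
```

Saved under a name of your choosing (say ``make_commits.py``, a made-up filename), the output could be piped in the same way as the ``git log`` command: ``python make_commits.py | elasticsearch_loader --type git --index git json --json-lines -``.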

Load parquet to elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

   elasticsearch_loader --index incidents --type incident parquet file1.parquet

Load JSON from a GitHub repo (any http/https URL works)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

   elasticsearch_loader --index data --type avg_height --id-field country json https://raw.githubusercontent.com/samayo/country-data/master/src/country-avg-male-height.json

Load data from stdin
^^^^^^^^^^^^^^^^^^^^

::

   generate_data | elasticsearch_loader --index data --type incident csv -
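
``generate_data`` above stands for any program that prints CSV to stdout. A minimal sketch of such a producer is shown below, assuming the loader treats the first row as field names. The rows, column names, and the helper ``write_csv`` are made up for illustration.

```python
import csv
import io
import sys


def write_csv(rows, fieldnames, stream):
    """Write a header row followed by one CSV row per dict."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical incident rows; the column names are made up for illustration.
rows = [
    {"incident_id": 1, "category": "theft", "city": "Haifa"},
    {"incident_id": 2, "category": "fraud", "city": "Tel Aviv"},
]

buffer = io.StringIO()
write_csv(rows, ["incident_id", "category", "city"], buffer)
csv_text = buffer.getvalue()

# Print to stdout so the result can be piped into `elasticsearch_loader ... csv -`.
sys.stdout.write(csv_text)
```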

Read id from incident_id field
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

   elasticsearch_loader --id-field incident_id --index incidents --type incident csv file1.csv file2.csv
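
To illustrate what taking ``_id`` from the document means, the sketch below builds action dicts in the shape accepted by ``elasticsearch.helpers.bulk``. It is an assumption about the general approach, not a copy of the tool's internals, and ``to_action`` is a hypothetical helper.

```python
def to_action(doc, index, doc_type, id_field=None):
    """Build a bulk action dict in the shape elasticsearch.helpers.bulk accepts."""
    action = {"_index": index, "_type": doc_type, "_source": doc}
    if id_field is not None:
        # Use the document's own field as _id, so loading the same row again
        # targets the existing document instead of creating a new one.
        action["_id"] = doc[id_field]
    return action


# Made-up document for illustration.
doc = {"incident_id": "42", "category": "theft"}
with_id = to_action(doc, "incidents", "incident", id_field="incident_id")
without_id = to_action(doc, "incidents", "incident")
print(with_id["_id"])
```

Without an explicit ``_id``, Elasticsearch generates one per document, so loading the same file twice would typically create duplicates.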

Load custom mappings
^^^^^^^^^^^^^^^^^^^^

::

   elasticsearch_loader --index-settings-file samples/mappings.json --index incidents --type incident csv file1.csv file2.csv
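
The contents of ``samples/mappings.json`` are not shown here. As a rough sketch of what an index settings file might contain, the snippet below generates a body with ``settings`` and ``mappings`` keys. The field names and the helper ``build_index_settings`` are hypothetical, and the typed layout (``properties`` nested under the doc type) follows Elasticsearch 5.x/6.x; Elasticsearch 7 expects ``properties`` directly under ``mappings`` by default.

```python
import json


def build_index_settings(doc_type, properties, shards=1, replicas=0):
    """Return an index body with settings and a typed mapping (ES 5.x/6.x layout)."""
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {doc_type: {"properties": properties}},
    }


# Hypothetical fields for an "incident" document type.
settings = build_index_settings(
    "incident",
    {
        "incident_id": {"type": "keyword"},
        "date": {"type": "date"},
        "description": {"type": "text"},
    },
)

settings_json = json.dumps(settings, indent=2)
print(settings_json)
```

Redirecting the output to a file gives something that could be passed to ``--index-settings-file``, after adjusting the layout to the Elasticsearch version in use.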

Tests and sample data
~~~~~~~~~~~~~~~~~~~~~

End-to-end and regression tests are located under the ``test`` directory
and can be run with ``./test.py``. Sample input files for each format can
be found under ``samples``.

Stargazers over time
~~~~~~~~~~~~~~~~~~~~

|Stargazers over time|

.. |Build Status| image:: https://travis-ci.org/moshe/elasticsearch_loader.svg?branch=master
   :target: https://travis-ci.org/moshe/elasticsearch_loader
.. |Can I Use Python 3?| image:: https://caniusepython3.com/project/elasticsearch-loader.svg
   :target: https://caniusepython3.com/project/elasticsearch-loader
.. |PyPI version| image:: https://badge.fury.io/py/elasticsearch_loader.svg
   :target: https://pypi.python.org/pypi/elasticsearch-loader
.. |Stargazers over time| image:: https://starcharts.herokuapp.com/moshe/elasticsearch_loader.svg
   :target: https://starcharts.herokuapp.com/moshe/elasticsearch_loader
