
.. image:: https://travis-ci.org/pyelasticsearch/pyelasticsearch.png
   :alt: Build Status
   :align: right
   :target: https://travis-ci.org/pyelasticsearch/pyelasticsearch
pyelasticsearch is a clean, future-proof, high-scale API to elasticsearch. It provides...
For more on our philosophy and history, see `Comparison with elasticsearch-py, the “Official Client” <https://pyelasticsearch.readthedocs.org/en/latest/elasticsearch-py/>`_.
Make a pooling, balancing, all-singing, all-dancing connection object::

  >>> from pyelasticsearch import ElasticSearch
  >>> es = ElasticSearch('http://localhost:9200/')

Index a document::

  >>> es.index('contacts',
  ...          'person',
  ...          {'name': 'Joe Tester', 'age': 25, 'title': 'QA Master'},
  ...          id=1)
  {u'_type': u'person', u'_id': u'1', u'ok': True, u'_version': 1, u'_index': u'contacts'}

Index a couple more documents, this time in a single request using the bulk-indexing API::

  >>> docs = [{'id': 2, 'name': 'Jessica Coder', 'age': 32, 'title': 'Programmer'},
  ...         {'id': 3, 'name': 'Freddy Tester', 'age': 29, 'title': 'Office Assistant'}]
  >>> es.bulk((es.index_op(doc, id=doc.pop('id')) for doc in docs),
  ...         index='contacts',
  ...         doc_type='person')

If we had many documents and wanted to chunk them for performance,
`bulk_chunks() <https://pyelasticsearch.readthedocs.org/en/latest/api/#pyelasticsearch.bulk_chunks>`_
would easily rise to the task, dividing either at a certain number of
documents per batch or, for curated platforms like Google App Engine, at a
certain number of bytes. Thanks to the decoupled design, you can even
substitute your own batching function if you have unusual needs. Bulk
indexing is the most demanding ES task in most applications, so we provide
very thorough tools for representing operations, optimizing wire traffic, and
dealing with errors. See
`bulk() <https://pyelasticsearch.readthedocs.org/en/latest/api/#pyelasticsearch.ElasticSearch.bulk>`_
for more.
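To make the batching idea concrete, here is a minimal, standalone sketch of a custom batching function that splits either by document count or by serialized size. It is not the implementation of ``bulk_chunks()``: the name ``chunk_actions`` and its parameters are hypothetical, and it uses only the standard library, so it runs without an Elasticsearch node.

```python
import json


def chunk_actions(actions, docs_per_chunk=500, bytes_per_chunk=None):
    """Yield lists of serialized actions, split by count and/or byte size.

    Hypothetical helper for illustration only; it is not part of
    pyelasticsearch and does not reproduce bulk_chunks() exactly.
    """
    chunk, chunk_bytes = [], 0
    for action in actions:
        line = json.dumps(action) + '\n'
        size = len(line.encode('utf-8'))
        too_many = docs_per_chunk is not None and len(chunk) >= docs_per_chunk
        # Only split on size if the chunk already holds something, so a
        # single oversized document still gets a chunk of its own.
        too_big = (bytes_per_chunk is not None and chunk
                   and chunk_bytes + size > bytes_per_chunk)
        if too_many or too_big:
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(line)
        chunk_bytes += size
    if chunk:
        yield chunk


docs = [{'id': n, 'name': 'Tester %d' % n} for n in range(5)]
by_count = list(chunk_actions(docs, docs_per_chunk=2))
by_bytes = list(chunk_actions(docs, docs_per_chunk=None, bytes_per_chunk=64))
```

Splitting before appending, rather than after, is what keeps a chunk from exceeding the byte limit unless one document is over the limit on its own.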
Refresh the index to pick up the latest::

  >>> es.refresh('contacts')
  {u'ok': True, u'_shards': {u'successful': 5, u'failed': 0, u'total': 10}}

Get just Jessica's document::

  >>> es.get('contacts', 'person', 2)
  {u'_id': u'2',
   u'_index': u'contacts',
   u'_source': {u'age': 32, u'name': u'Jessica Coder', u'title': u'Programmer'},
   u'_type': u'person',
   u'_version': 1,
   u'exists': True}

Perform a simple search::

  >>> es.search('name:joe OR name:freddy', index='contacts')
  {u'_shards': {u'failed': 0, u'successful': 42, u'total': 42},
   u'hits': {u'hits': [{u'_id': u'1',
                        u'_index': u'contacts',
                        u'_score': 0.028130024999999999,
                        u'_source': {u'age': 25,
                                     u'name': u'Joe Tester',
                                     u'title': u'QA Master'},
                        u'_type': u'person'},
                       {u'_id': u'3',
                        u'_index': u'contacts',
                        u'_score': 0.028130024999999999,
                        u'_source': {u'age': 29,
                                     u'name': u'Freddy Tester',
                                     u'title': u'Office Assistant'},
                        u'_type': u'person'}],
             u'max_score': 0.028130024999999999,
             u'total': 2},
   u'timed_out': False,
   u'took': 4}

Perform a search using the `elasticsearch query DSL`_:

.. _`elasticsearch query DSL`: http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html

::

  >>> query = {
  ...     'query': {
  ...         'filtered': {
  ...             'query': {
  ...                 'query_string': {'query': 'name:tester'}
  ...             },
  ...             'filter': {
  ...                 'range': {
  ...                     'age': {
  ...                         'from': 27,
  ...                         'to': 37,
  ...                     },
  ...                 },
  ...             },
  ...         },
  ...     },
  ... }
  >>> es.search(query, index='contacts')
  {u'_shards': {u'failed': 0, u'successful': 42, u'total': 42},
   u'hits': {u'hits': [{u'_id': u'3',
                        u'_index': u'contacts',
                        u'_score': 0.19178301,
                        u'_source': {u'age': 29,
                                     u'name': u'Freddy Tester',
                                     u'title': u'Office Assistant'},
                        u'_type': u'person'}],
             u'max_score': 0.19178301,
             u'total': 1},
   u'timed_out': False,
   u'took': 2}

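Because the query body is an ordinary Python dict, it can be built by a small function instead of being written out by hand each time. The sketch below assumes nothing beyond the dict shape used in the example above; ``filtered_query`` and its parameters are hypothetical names, not part of pyelasticsearch.

```python
def filtered_query(query_string, field, low, high):
    """Build a 'filtered' query body shaped like the example above.

    Hypothetical helper; the name and signature are illustrative only.
    """
    return {
        'query': {
            'filtered': {
                # Full-text part of the search.
                'query': {'query_string': {'query': query_string}},
                # Range restriction on a single field.
                'filter': {'range': {field: {'from': low, 'to': high}}},
            },
        },
    }


query = filtered_query('name:tester', 'age', 27, 37)
```

The resulting ``query`` could then be passed as the first argument to ``es.search()``, as in the example above.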
Delete the index::

  >>> es.delete_index('contacts')
  {u'acknowledged': True, u'ok': True}

For more, see the full `API Documentation <https://pyelasticsearch.readthedocs.org/en/latest/api/>`_.
* ``ca_certs`` arg to the ``ElasticSearch`` constructor.
* ``client_cert`` arg.
* ``query_params`` kwarg is omitted from calls to ``send_request()``.
* Make ``delete_all_indexes()`` work.
* ``_all`` as an index name sometimes caused doctype names to be treated as
  index names.
* ``bulk()`` docs.
* ``ConnectionTimeout``.
* ``create_index()`` with no explicit ``settings`` arg. This solves 411s when
  using nginx as a proxy.
* ``doc_as_upsert()`` arg to ``update()``.
* Make ``bulk_chunks()`` compute perfectly optimal results, no longer ever
  exceeding the byte limit unless a single document is over the limit on its
  own.
* ``bulk_chunks()``, and introducing per-action error-handling. All errors
  raise exceptions--even individual failed operations--and the exceptions
  expose enough data to identify operations for retrying or reporting. The
  design is decoupled in case you want to create your own chunkers or
  operation builders.
* Deprecate ``bulk_index()`` in favor of the more capable ``bulk()``.
* ``bulk_index()``. It now catches individual operation failures, raising
  ``BulkError``. Also add the ``index_field`` and ``type_field`` args,
  allowing you to index across different indices and doc types within one
  request.
* ``ElasticSearch`` object now defaults to http://localhost:9200/ if you
  don't provide any node URLs.
* ``delete_by_query()`` to work with ES 1.0 and later.
* ``percolate()`` es_kwargs up to date.

.. note::

  Backward incompatible:

  * ``cluster_state()`` to work with ES 1.0 and later. Arguments have
    changed.
  * ``response`` property): just the bad data (the ``input`` property).
  * Remove ``revival_delay`` param from ElasticSearch object.
  * Remove ``encode_body`` param from ``send_request()``. Now all dicts are
    JSON-encoded, and all strings are left alone.

* ``update_aliases()`` API change.
* ``id_field`` is specified for ``bulk_index()``, don't index it under its
  original name as well; use it only as the ``_id``.
* Rename ``aliases()`` to ``get_aliases()`` for consistency with other
  methods. Original name still works but is deprecated. Add an ``alias``
  kwarg to the method so you can fetch specific aliases.

.. note::

  Backward incompatible:

  * ``update_aliases()`` no longer requires a dict with an ``actions`` key;
    that much is implied. Just pass the value of that key.

* ``IndexAlreadyExistsException`` even if the error is reported by a node
  other than the one to which the client is directly connected.
  (Jannis Leidel)

.. note::

  Note the change in behavior of ``bulk_index()`` in this release. This
  change probably brings it more in line with your expectations. But double
  check, since it now overwrites existing docs in situations where it didn't
  before.

  Also, we made a backward-incompatible spelling change to a little-used
  ``index()`` kwarg.

* ``bulk_index()`` now overwrites any existing doc of the same ID and
  doctype. Before, in certain versions of ES (like 0.90RC2), it did nothing
  at all if a document already existed, probably much to your surprise. (We
  removed the ``'op_type': 'create'`` pair, whose intentions were always
  mysterious.) (Gavin Carothers)
* Rename the ``force_insert`` kwarg of ``index()`` to
  ``overwrite_existing``. The old name implied the opposite of what it
  actually did. (Gavin Carothers)
* ``delete_by_query()``. Accept both string and JSON queries in the
  ``query`` arg, just as ``search()`` does. Passing the ``q`` arg explicitly
  is now deprecated.
* ``multi_get``.
* ``percolate``. Thanks, Adam Georgiou and Joseph Rose!
* ``bulk_index()``. Thanks, Gavin Carothers!
* ``from_python`` method. django-haystack users will need to upgrade to a
  newer version that avoids using it.
* ``ElasticSearch.json_encoder``.
* ``python -OO``.
* ``index()``.
* Support Python 3.
* Support more APIs:

  * ``cluster_state``
  * ``get_settings``
  * ``update_aliases`` and ``aliases``
  * ``update`` (existed but didn't work before)

* Support the ``size`` param of the ``search`` method. (You can now change
  ``es_size`` to ``size`` in your code if you like.)
* Support the ``fields`` param on ``index`` and ``update`` methods, new since
  ES 0.20.
* Maintain better precision of floats when passed to ES.
* Change endpoint of bulk indexing so it works on ES < 0.18.
* Support documents whose ID is 0.
* URL-escape path components, so doc IDs containing funny chars work.
* Add a dedicated ``IndexAlreadyExistsError`` exception for when you try to
  create an index that already exists. This helps you trap this situation
  unambiguously.
* Add docs about upgrading from pyes.
* Remove the undocumented and unused ``to_python`` method.
* ``requests`` requirement to require a version that has everything we need.
  In fact, require requests 1.x, which has a stable API.
* ``update()`` method.
* Make ``send_request`` method public so you can use ES APIs we don't yet
  explicitly support.
* Make ``more_like_this()`` take an arbitrary request body so you can filter
  the returned docs.
* Replace the ``fields`` arg of ``more_like_this`` with ``mlt_fields``. This
  makes it actually work, as it's the param name ES expects.

Many thanks to Erik Rose for almost completely rewriting the API to follow best practices, improve the API user experience, and make pyelasticsearch future-proof.
.. note::

  This release is backward-incompatible in numerous ways; please read the
  following section carefully. If in doubt, you can easily stick with
  pyelasticsearch 0.1.

Backward-incompatible changes:
* Simplify ``search()`` and ``count()`` calling conventions. Each now
  supports either a textual or a dict-based query as its first argument.
  There's no longer a need to, for example, pass an empty string as the first
  arg in order to use a JSON query (a common case).
* Standardize on the singular for the names of the ``index`` and
  ``doc_type`` kwargs. It's not always obvious whether an ES API allows for
  multiple indexes. This was leading me to have to look aside to the docs to
  determine whether the kwarg was called ``index`` or ``indexes``. Using the
  singular everywhere will result in fewer doc lookups, especially for the
  common case of a single index.
* Rename ``morelikethis`` to ``more_like_this`` for consistency with other
  methods.
* ``index()`` now takes ``(index, doc_type, doc)`` rather than
  ``(doc, index, doc_type)``, for consistency with ``bulk_index()`` and other
  methods.
* Similarly, ``put_mapping()`` now takes ``(index, doc_type, mapping)``
  rather than ``(doc_type, mapping, index)``.
* To prevent callers from accidentally destroying large amounts of data...

  * ``delete()`` no longer deletes all documents of a doctype when no ID is
    specified; use ``delete_all()`` instead.
  * ``delete_index()`` no longer deletes all indexes when none are given; use
    ``delete_all_indexes()`` instead.
  * ``update_settings()`` no longer updates the settings of all indexes when
    none are specified; use ``update_all_settings()`` instead.

* ``setup_logging()`` is gone. If you want to configure logging, use the
  logging module's usual facilities. We still log to the "pyelasticsearch"
  named logger.
* Rethink error handling:

  * Raise ``NonJsonResponseError`` instead of the generic
    ``ElasticSearchError``.
  * ``except`` clauses.
  * ``ElasticSearchError``.
  * Raise ``ConnectionError`` rather than ``ElasticSearchError`` if we can't
    connect to a node (and we're out of auto-retries).
  * Raise ``ValueError`` rather than ``ElasticSearchError`` if no documents
    are passed to ``bulk_index``.
  * ``ElasticHttpError``.
  * Remove ``quiet`` kwarg, meaning we always expose errors.

Other changes:

* ``close_index``, ``open_index``, ``update_settings``, ``health``.
* ``datetime`` objects when encoding JSON.
* ``timeout``.

Initial release based on the work of Robert Eanes and other authors.
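
The backward-incompatible changes above note that ``setup_logging()`` is gone and that the library still logs to the "pyelasticsearch" named logger. A minimal sketch of configuring that logger with the standard ``logging`` module might look like the following. The handler, format string, and sample message are assumptions chosen for illustration, and the in-memory stream is used only so the example is self-contained.

```python
import io
import logging

# "pyelasticsearch" is the logger name given in the changelog above.
logger = logging.getLogger('pyelasticsearch')
logger.setLevel(logging.DEBUG)

# Records go to an in-memory stream here; a real application would more
# likely use logging.StreamHandler() or logging.FileHandler(...).
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(name)s %(levelname)s %(message)s'))
logger.addHandler(handler)

# Stand-in for a record the library might emit (the text is made up).
logger.debug('example message')
captured = stream.getvalue()
```

Attaching the handler to the named logger, rather than the root logger, keeps the library's output separate from the rest of an application's logging.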