pysolaar

Opinionated wrapper round PySolr

0.8.0

PyPI

Maintainers: 1

PySolaar

A highly opinionated, Django-like wrapper round PySolr, for when you want to ridiculously de-normalise some complex data at runtime, and then query it with a pretty interface.

Features

Managing your Solr data

Create document types with a Django-like class-based approach
Define how a document instance should get its data. It's just a function you define, so how you get the data is up to you: from a database, by mushing together data from different Django classes, by HTTP request...
Define a function for PySolaar to call in order to generate the documents...
Declaratively define the structure of your documents, if you want to do any complex embedding or reuse documents.
Press go!
PySolaar automatically encodes, embeds, etc. all the data and pushes it to Solr.
Fields are "namespaced" to document-type (like Python's __name_mangling), so no clashes! The id field's value is also prefixed with the class name, so it's unique for a specific class.
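The namespacing idea can be illustrated in plain Python. This is only a minimal sketch, not PySolaar's actual encoding: the `namespace_document` helper and the `ClassName__field` format are assumptions made up for illustration.

```python
def namespace_document(doc_type, fields):
    """Prefix field names with the document type; prefix the id *value* instead.

    Hypothetical helper: the separator and exact format are assumptions.
    """
    namespaced = {}
    for key, value in fields.items():
        if key == "id":
            # The id field keeps its name, but its value is made unique per type
            namespaced["id"] = f"{doc_type}__{value}"
        else:
            namespaced[f"{doc_type}__{key}"] = value
    return namespaced


person = namespace_document("Person", {"id": "1", "name": "Claudius"})
pet = namespace_document("Pets", {"id": "1", "name": "Gordon"})
```

Both documents started with the same `id` and a `name` field, yet after namespacing only the `id` key is shared, and its values differ.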

Querying your Solr data

A nice Django QuerySet-like approach (wraps a modified version of SolrQ, which you can also use for more complex queries)
Automatically prefixes all the queries to deal with the name-mangling.
Declaratively define the document you want returned, including a set of transformations for unpacking data, turning dates back into Python datetime objects, or whatever you want.
Lazy query evaluation
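
As a rough illustration of what lazy, chainable evaluation means, here is a minimal plain-Python sketch. It is not PySolaar's implementation: `LazyQuery` and `fake_fetch` are hypothetical names, and the "request" is just a function call that records its arguments.

```python
class LazyQuery:
    """Hypothetical sketch of a lazily evaluated, chainable query."""

    def __init__(self, fetch, filters=None):
        self._fetch = fetch              # callable that actually runs the query
        self._filters = dict(filters or {})
        self._results = None             # nothing fetched yet

    def filter(self, **kwargs):
        # Chaining returns a new query; nothing is executed here
        return LazyQuery(self._fetch, {**self._filters, **kwargs})

    def __iter__(self):
        if self._results is None:        # evaluate once, on first use
            self._results = self._fetch(self._filters)
        return iter(self._results)


calls = []


def fake_fetch(filters):
    calls.append(filters)                # record each real "request"
    return [{"id": "person-4"}]


query = LazyQuery(fake_fetch).filter(name="Claudius").filter(moustache="Yes")
calls_before = len(calls)                # no request has been made yet
results = list(query)                    # iterating triggers the single request
```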

Basics

Creating PySolaar Documents

PySolaar allows documents to be defined using Django-like classes, which represent entity types:

Create an entity type, by subclassing PySolaar.
Define a build_document method:
- This is "how you get the data to index".
- This method defines a single document or set of documents that correspond to a single identifier.
- It should take an identifier as an argument, and return a self.Document object or an iterable of self.Document objects.
- Pass your data as keyword arguments to the self.Document initialiser, or unpack a dict.
Define a build_document_set method that iterates through a series of identifiers and returns a call to self.build_document for each identifier.

from pysolaar import PySolaar

# Create a thing that inherits from PySolaar
class Person(PySolaar):

    # Write a `build_document` method -- this gets the data corresponding
    # to a particular value of `identifier`

    def build_document(self, identifier):

        # Return an instance of self.Document containing the data
        return self.Document(
            id="person-{}".format(identifier), 
            name="Claudius the {}".format(identifier), 
            height=100 * identifier,
            moustache="Yes" if identifier % 2 == 0 else "No"  # It's odd not to have a moustache!
        )

    # Write `build_document_set` that produces an iterator of Person.build_document calls
    def build_document_set(self):
        for identifier in [1, 2, 3, 4, 5]:
            yield self.build_document(identifier)


# Configure PySolaar by setting up the underlying PySolr instance
PySolaar.configure_pysolr("http://your-solr-instance")

# Then run PySolaar.update() to push data to Solr
PySolaar.update()

When PySolaar.update() is called, PySolaar goes through all its subclasses' build_document_set functions, in order to generate the documents, and then pushes them to Solr. Obviously, there's a reasonable amount of magic.
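
One way such subclass discovery can work is through Python's built-in `__subclasses__()`. The sketch below is an assumption about the general technique, not PySolaar's actual code: `Base` and `collect_documents` are hypothetical names, documents are plain dicts, and nothing is sent to Solr.

```python
class Base:
    """Stand-in for a PySolaar-like base class (hypothetical, for illustration)."""

    @classmethod
    def collect_documents(cls):
        # Walk every direct subclass and gather whatever it yields
        docs = []
        for subclass in cls.__subclasses__():
            docs.extend(subclass().build_document_set())
        return docs


class Person(Base):
    def build_document(self, identifier):
        return {"id": f"person-{identifier}"}

    def build_document_set(self):
        for identifier in [1, 2]:
            yield self.build_document(identifier)


class Pets(Base):
    def build_document(self, identifier):
        return {"id": f"pet-{identifier}"}

    def build_document_set(self):
        yield self.build_document(1)


documents = Base.collect_documents()
```

Defining a new subclass is enough for it to be picked up; no explicit registration step is needed.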

Querying the data

PySolaar provides a Django-like interface for querying data. Alternatively, pass a SolrQ object directly to .filter().

from pysolaar import Q
from __above__ import Person

# Get all the persons
persons = Person.all() 

# Filter by anything ...
claudiuses = Person.filter(name="Claudius")

# ... and chain QuerySets as in Django
tall_with_moustache = claudiuses.filter(height__gt=250, moustache="Yes")

# ... or use a Q object
either_tall_or_moustache = claudiuses.filter( Q(height__gt=250) | Q(moustache="Yes") )

# ... and paginate
first_page = tall_with_moustache.paginate(page_size=2, page_number=0)

# Results aren't evaluated until you need them:
tall_with_moustache.count() # -> 1 (of identifiers 1-5, only 4 is both taller than 250 and moustached)

for c in first_page:
    print(c["id"])

# And a few other features — see the Advanced section.

Restrictions when defining documents

Most of the restrictions here stem from the limitations of Solr and the PySolr library.

A single field can contain:

A value (string, int, datetime, etc.)
A list of values (probably a set as well)
A dict, whose values are either more dicts or values or lists of values. (Dicts are collapsed down to single value fields using double underscores, i.e. field={"one": "something", "two": "something else"} becomes field__one="something" and field__two="something else" — to an arbitrary depth!)
NOT a list of dicts. To index a list of associated values (e.g. lists of dicts), instead use Child Documents.
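
The dict-collapsing rule above can be sketched in a few lines of plain Python. `flatten_field` is a hypothetical helper written for illustration, not a function exported by PySolaar; raising `TypeError` for a list of dicts is likewise an assumption about how the restriction might be enforced.

```python
def flatten_field(name, value):
    """Collapse nested dicts into double-underscore field names."""
    if isinstance(value, dict):
        flat = {}
        for key, inner in value.items():
            # Recurse so that nesting works to an arbitrary depth
            flat.update(flatten_field(f"{name}__{key}", inner))
        return flat
    if isinstance(value, list) and any(isinstance(item, dict) for item in value):
        # Mirrors the restriction above: lists of dicts need child documents
        raise TypeError(f"{name!r}: a list of dicts cannot be flattened")
    return {name: value}


flat = flatten_field("field", {"one": "something", "two": {"deep": [1, 2]}})
```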

Advanced features

The `Meta` class (as borrowed from Django)

Each class can define a Meta class, which can be used to declaratively define a number of aspects regarding how data is stored in Solr. (The Meta class is passed around in the background to apply settings where appropriate.)

Using the Meta class, you can:

Independently from the data-definition method (build_document), declare a store_document_fields structure, defining which fields should be pushed to Solr and in which format. This allows the build_document method to be a 'generic' method for getting whatever data is required and allows easy embedding and reuse.
Define a return_document_fields structure to limit the fields that are returned from Solr (so you can index fields purely for querying without having them returned in results).
Or, using the older approach (which will probably be deprecated), independently define lists of fields such as fields_as_child_docs.

Child documents

The Solr "child field" feature is used to allow one document type to be nested inside another.

To associate a nested document with a particular field, first define a fields_as_child_docs list in the Meta class and add the field name. Then set the value of the parent field to Child.items([identifiers]). See the example below.

from pysolaar import PySolaar


class Person(PySolaar):
    class Meta:
        # Define a Meta class with `fields_as_child_docs`
        # in order to declare a field is a child doc
        fields_as_child_docs = ["pets"]

    def build_document(self, identifier):
        return self.Document(
            id="thing-{}".format(identifier), 
            name="Claudius", 
            # Embed another document type by calling Class.items([identifiers])
            pets=Pets.items([1, 2, 3])
        )

class Pets(PySolaar):

    def build_document(self, identifier):
        return self.Document(
            id="pet-{}".format(identifier), 
            name="Gordon"
        )

In the background, PySolaar.items calls build_document with the listed identifiers, and embeds the document as a child (using PySolr's _doc keyword).

Nesting searchable documents works up to one level of embedding. After this, documents can be stored as JSON strings and recovered automatically, but not queried. After three levels of embedding a particular type, this embedding will stop (preventing infinite recursion).

`store_document_fields`

In general, the build_document function should define all the fields required for every use-case. The results of calling this function with a particular identifier are cached, so the data can be re-used if embedded as a child document elsewhere (n.b. the cache is cleared after an update!)
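
The caching behaviour described above can be pictured with a small stand-alone sketch. `CachedBuilder` is a hypothetical class and the dict-based cache is an assumption; it only illustrates "build once per identifier, clear on update", not how PySolaar stores its cache.

```python
class CachedBuilder:
    """Hypothetical stand-in showing per-identifier caching of built documents."""

    def __init__(self):
        self._cache = {}
        self.build_calls = 0

    def _expensive_build(self, identifier):
        self.build_calls += 1  # count real builds so the caching is observable
        return {"id": f"person-{identifier}", "name": "Claudius"}

    def build_document(self, identifier):
        # Only build when this identifier has not been seen since the last update
        if identifier not in self._cache:
            self._cache[identifier] = self._expensive_build(identifier)
        return self._cache[identifier]

    def update(self):
        # ... documents would be pushed to Solr here ...
        self._cache.clear()  # the cache does not survive an update


builder = CachedBuilder()
builder.build_document(1)
builder.build_document(1)  # served from the cache
calls_before_update = builder.build_calls
builder.update()
builder.build_document(1)  # built again after the cache was cleared
calls_after_update = builder.build_calls
```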

The DocumentFields class provides a convenient way to describe how the document should be stored in Solr.

Set the store_document_fields value in your class Meta to a DocumentFields instance where you list the fields you want to include (set them to True).

Also use this structure to declare child fields using ChildDocument (in place of using the fields_as_child_docs, as above) and to control which fields in the ChildDocument are included.

This is useful (as in the example below) as we can have Person documents with their pets' names embedded as searchable child documents, but also have more detailed Pets documents (in turn with certain owner fields).

import datetime
from pysolaar import PySolaar, DocumentFields, ChildDocument

class Person(PySolaar):
    class Meta:
        store_document_fields = (
            DocumentFields(
                name=True,
                school=True,
                work=True,
                has_pets=ChildDocument( # Here we can start selecting fields from the Pets class
                    name=True,
                    weight=True,
                    # date_of_birth=True ... we don't care about, so omit it
                    owner=ChildDocument(
                        name=True, # Here, we embed a Person instance again, this time just selecting the name
                                # n.b. this `owner` field will be converted to JSON for storage (see above)
                    ),
                ),
            )
        )

    def build_document(self, identifier):
        return self.Document(
            id="thing-{}".format(identifier), 
            name="Claudius",
            school="St Somethings",
            work="Bus driver",
            has_pets=Pets.items([1, 2, 3])
        )

class Pets(PySolaar):

    def build_document(self, identifier):
        return self.Document(
            id="pet-{}".format(identifier), 
            name="Gordon",
            weight=123,
            date_of_birth=datetime.datetime(1996, 1, 3),
            # Re-embed the Person as the owner, as it might be useful
            # if we ever want a 'Pets' top-level document.
            owner=Person.items([f"owner_of_{identifier}"])
        )

store_document_fields allows the following structures:

DocumentFields: this is the root wrapper for the whole document
ChildDocument: embeds another document (with the specified fields) as a Solr child document
JsonChildDocument: embeds a child document by converting it to a JSON string. It can be returned and unpacked back to Python, but not queried in Solr (except as a hit-and-miss string-matching exercise...)
SplattedChildDocument: embeds a child document as a list of searchable fields (it works recursively through the child document as a dict, accumulating all the values in a list). Useful for creating a searchable version of a child document, where you don't care about any field in particular, just matching something. Probably don't return this from Solr — unlike JsonChildDocument it cannot be reverted back to anything particularly useful.
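
The difference between the JSON and "splatted" forms can be sketched with the standard library alone. This is an illustration of the two ideas, not PySolaar's encoding: `splat_values` is a hypothetical helper, and the real classes may order or encode values differently.

```python
import json


def splat_values(value):
    """Recursively collect every leaf value of a nested structure into one list."""
    if isinstance(value, dict):
        return [leaf for inner in value.values() for leaf in splat_values(inner)]
    if isinstance(value, (list, tuple, set)):
        return [leaf for item in value for leaf in splat_values(item)]
    return [value]


pet = {"name": "Gordon", "weight": 123, "owner": {"name": "Claudius"}}

as_json = json.dumps(pet)        # JSON-style: one opaque string in the index
restored = json.loads(as_json)   # ...which can be unpacked back to Python
splatted = splat_values(pet)     # splat-style: searchable values, structure lost
```

The JSON string round-trips to the original dict, whereas the splatted list keeps only the values, which is why it is useful for matching but not for returning.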
