New Case Study:See how Anthropic automated 95% of dependency reviews with Socket.Learn More →

jsonlitedb

Package Overview

Dependencies

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

jsonlitedb

SQLite3-backed JSON document database with support for indices and advanced queries

0.1.5
PyPI

Maintainers: 1

JSONLiteDB

SQLite3-backed JSON document database with support for indices and advanced queries.

100% Coverage

Premise and Inspiration

JSONLiteDB leverages SQLite3 and JSON1 to create a fast JSON document store with easy persistence, indexing capability, and extensible use.

JSONLiteDB provides an easy API with no need to load the entire database into memory, nor dump it when inserting! JSONLiteDB SQLite files are easily usable in other tools with no proprietary formats or encoding. JSONLiteDB is a great replacement for reading a JSON or JSONLines file. Entries can be modified in place. Queries can be indexed for greatly improved query speed and optionally to enforce uniqueness.

Similar tools and inspiration:

TinyDB. The API and process of TinyDB heavily inspired JSONLiteDB. But TinyDB reads the entire JSON DB into memory and needs to dump the entire database upon insertion. Hardly efficient or scalable and still queries at O(N).
Dataset is promising but creates new columns for every key and is very "heavy" with its dependencies. As far as I can tell, there is no native way to support multi-column and/or unique indexes. But still, a very promising tool!
KenobiDB. Came out while JSONLiteDB was in development. Similar idea with different design decisions. Does not directly support advanced queries indexes (which can greatly accelerate queries!).
- Please correct me if I am wrong. I new to this tool.
DictTable (also written by me) is nice but entirely in-memory and not always efficient for non-equality queries.

Install

From PyPI:

$ pip install jsonlitedb
$ pip install jsonlitedb --upgrade

Or directly from Github

$ pip install git+https://github.com/Jwink3101/jsonlitedb.git

Basic Usage

With some fake data.

>>> from jsonlitedb import JSONLiteDB

>>> db = JSONLiteDB(":memory:")
>>> # more generally:
>>> # db = JSONLiteDB('my_data.db')

Insert some data. Can use insert() with any number of items or insertmany() with an iterable (insertmany([...]) <--> insert(*[...])).

Can also use a context manager (with db: ...)to batch the insertions (or deletions).

>>> db.insert(
>>>     {"first": "John", "last": "Lennon", "born": 1940, "role": "guitar"},
>>>     {"first": "Paul", "last": "McCartney", "born": 1942, "role": "bass"},
>>>     {"first": "George", "last": "Harrison", "born": 1943, "role": "guitar"},
>>>     {"first": "Ringo", "last": "Starr", "born": 1940, "role": "drums"},
>>>     {"first": "George", "last": "Martin", "born": 1926, "role": "producer"},
>>> )

>>> len(db)

>>> list(db)

[{'first': 'John', 'last': 'Lennon', 'born': 1940, 'role': 'guitar'},
 {'first': 'Paul', 'last': 'McCartney', 'born': 1942, 'role': 'bass'},
 {'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
 {'first': 'Ringo', 'last': 'Starr', 'born': 1940, 'role': 'drums'},
 {'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Simple Queries

Let's do some simple queries. The default query() returns an iterator so we wrap them in a list.

>>> list(db.query(first="George"))

[{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
 {'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

>>> list(db.query(first="George", last="Martin"))

[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Now let's query with a dictionary to match

>>> list(db.query({"first": "George"}))

[{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
 {'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Multiples are always an AND query

>>> list(db.query({"first": "George", "last": "Martin"}))

[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Can do seperate items but it makes no difference.

>>> list(db.query({"first": "George"}, {"last": "Martin"}))

[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Query Objects

Query objects enable more complex combinations and inequalities. Query objects can be from the database (db.Query or db.Q) or created on thier own (Query() or Q()). They are all the same.

>>> list(db.query(db.Q.first == "George"))

[{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
 {'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Note that you need to be careful with parentheses as the operator precedance for the & and | are very high

>>> list(db.query((db.Q.first == "George") & (db.Q.last == "Martin")))

[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Can do inequalities too

>>> list(db.query(db.Q.born < 1930))

[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Queries support: ==, !=, <, <=, >, >= for normal comparisons.

In addition they support

% : LIKE
* : GLOB
@ : REGEXP using Python's regex module

>>> # This will all be the same
>>> db.query(db.Q.role % "prod%").all()  # LIKE
>>> db.query(db.Q.role * "prod*").all()  # GLOB
>>> db.query(db.Q.role @ "prod").all()  # REGEXP -- Python based

[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Speeding up queries

Queries can be greatly accelerated with an index. Note that SQLite is extremely picky about how you write the index! For the most part, if you the same method to query as write the index, you will be fine. (This is more of an issue with nested queries and advanced formulating of the query).

The name of the index is imaterial. It is based on the fields. It will look different

>>> db.create_index("last")
>>> db.indexes

{'ix_items_1bd45eb5': ['$."last"']}

>>> # of course, with four items, this makes little difference
>>> list(db.query(last="Martin"))

[{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

And an index can also be used to enforce uniqueness amongst one or more fields

>>> db.create_index("first", "last", unique=True)
>>> db.indexes

{'ix_items_1bd45eb5': ['$."last"'],
 'ix_items_250e4243_UNIQUE': ['$."first"', '$."last"']}

>>> # db.insert({'first': 'George', 'last': 'Martin', 'type':'FAKE ENTRY'})
>>> # Causes: IntegrityError: UNIQUE constraint failed: index 'ix_items_250e4243_UNIQUE'

See Advanced Usage for more examples including nested queries

>>>

Queries and Paths

Queries are detailed in the db.query() method. All queries and paths can take four basic forms, but query objects are, by far, the most versatile.

Type	Path (e.g. `create_index()`)	Query (e.g. `query()`)	Comments
Plain string	`'itemkey'`	`{'itemkey':'query_val'}`	Limited to a single item
JSON Path string	`'$.itemkey'` `'$.itemkey.subkey'` `'$.itemkey[4]'` `'$.itemkey.subkey[4]'`	`{'$.itemkey':'query_val'}` `{'$.itemkey.subkey':'query_val'}` `{'$.itemkey[4]':'query_val'}` `{'$.itemkey.subkey[4]':'query_val'}`	Be careful about indices on JSON path strings. See more below
Tuples (or lists)	`('itemkey',)` `('itemkey','subkey')` `('itemkey',4)` `('itemkey','subkey',4)`	`{('itemkey',):'query_val'}` `{('itemkey','subkey'):'query_val'}` `{('itemkey',4):'query_val'}` `{('itemkey','subkey',4):'query_val'}`
Query Objects. (Let `db` be your database)	`db.Q.itemkey` `db.Q.itemkey.subkey` `db.Q.itemkey[4]` `db.Q.itemkey.subkey[4]`	`db.Q.itemkey == 'query_val'` `db.Q.itemkey.subkey == 'query_val'` `db.Q.itemkey[4] == 'query_val'` `db.Q.itemkey.subkey[4] == 'query_val'`	See below. Can also do many more types of comparisons beyond equality

Note that JSON Path strings presented here are unquoted, but all other methods will quote them. For example, '$.itemkey.subkey' and ('itemkey','subkey') are functionally identical; the latter becomes '$."itemkey"."subkey"'. While they are functionally the same, an index created on one will not be used on the other.

Query Objects

Query Objects provide a great deal more flexibility than other forms.

They can handle normal equality == but can handle inequalities, including !=, <, <=, >, >=.

db.Q.item < 10  
db.Q.other_item > 'bla'

They can also handle logic. Note that you must be very careful about parentheses.

(db.Q.item < 10) & (db.Q.other_item > 'bla') # AND  
(db.Q.item < 10) | (db.Q.other_item > 'bla') # OR

Note that while something like 10 <= var <= 20 is valid Python, a query must be done like:

(10 <= db.Q.var) & (db.Q.var <= 20 )

And, as noted in "Basic Usage," they can do SQL LIKE comparisons (db.Q.key % "%Val%"), GLOB comparisons (db.Q.key * "file*.txt"), and REGEXP comparisons (db.Q.key @ "\S+?\.[A-Z]").

Form

You can mix and match index or attribute for keys. The following are all identical:

db.Q.itemkey.subkey
db.Q['itemkey'].subkey
db.Q['itemkey','subkey']
db.Q['itemkey']['subkey']
...

Command Line Tools

JSONLiteDB also installs a tool called "jsonlitedb" that makes it easy to read JSONL and JSON files into a database. This is useful for converting existing databases or appending data.

$ jsonlitedb insert mydb.db newfile.jsonl  
$ cat newdata.jsonl | jsonlitedb insert mydb.db

It can also dump a database to JSONL.

$ jsonlitedb dump mydb.db    # stdout  
$ jsonlitedb dump mydb.db --output db.jsonl

Known Limitations

Dictionary keys must be strings without a dot, double quote, square bracket, and may not start with _. (Some of these may work but could have unexpected outcomes.)
There is no distinction made between an entry having a key with a value of None vs. not having the key. However, you can use query_by_path_exists() to query items that have a certain path. There is no way still to mix this with other queries testing existence other than with None.
While it will accept items like strings as a single item, queries on these do not work reliably.

FAQs

Wouldn't it be better to use different SQL columns rather than all as JSON?

Yes and no. The idea is the complete lack of schema needed and as a notable improvement to a JSON file. Plus, if you index the field of interest, you get super-fast queries all the same!

FAQs

What is jsonlitedb?

Is jsonlitedb well maintained?

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install