
Python virtual filesystem for SQLite to read from and write to memory.
While SQLite supports the special filename `:memory:` that allows the creation of empty databases in memory, `sqlite_deserialize` allows the population of an in-memory database from the raw bytes of a serialized database, and `sqlite_serialize` allows the extraction of the raw bytes of an in-memory database, there are limitations:

- `sqlite_deserialize` cannot populate a database from non-contiguous raw bytes.
- `sqlite_serialize` cannot serialize to non-contiguous bytes.

This virtual filesystem overcomes these limitations. Specifically, it allows larger databases to be downloaded and queried without hitting disk, and it allows larger databases to be generated and uploaded without hitting disk.
Based on simonwo's gist and uktrade's sqlite-s3vfs, and inspired by phiresky's sql.js-httpvfs, dacort's Stack Overflow answer and michalc's sqlite-s3-query.
sqlite-memory-vfs can be installed from PyPI using pip:

```shell
pip install sqlite-memory-vfs
```
This will automatically install APSW along with any other dependencies.
This library allows the raw bytes of a SQLite database to be queried without having to save it to disk. This can be done by using the `deserialize_iter` method of `MemoryVFS`, passing it an iterable of `bytes` instances that contain the SQLite database.
```python
from contextlib import closing

import apsw
import httpx
import sqlite_memory_vfs

memory_vfs = sqlite_memory_vfs.MemoryVFS()

# Any iterable of bytes can be used. In this example, they come via HTTP
url = "https://data.api.trade.gov.uk/v1/datasets/uk-trade-quotas/versions/v1.0.366/data?format=sqlite"
with \
        httpx.stream("GET", url) as r, \
        closing(apsw.Connection('quota_balances.sqlite', vfs=memory_vfs.name)) as db:

    memory_vfs.deserialize_iter(db, r.iter_bytes())
    cursor = db.cursor()
    cursor.execute('SELECT * FROM quotas;')
    print(cursor.fetchall())
```
If the `deserialize_iter` step is omitted, an empty database is automatically created in memory.
See the APSW documentation for more usage examples.
The bytes corresponding to each SQLite database in the VFS can be extracted with the `serialize_iter` function, which returns an iterable of `bytes`:

```python
with \
        open('my_db.sqlite', 'wb') as f, \
        closing(apsw.Connection('quota_balances.sqlite', vfs=memory_vfs.name)) as db:

    for chunk in memory_vfs.serialize_iter(db):
        f.write(chunk)
```
It should be safe for any number of readers and writers to attempt to access the database: locking is implemented by the VFS, which blocks access to the database while a write is in-flight.

If a connection gets blocked, it will raise `apsw.BusyError`. This is normal SQLite behaviour. You can request that SQLite retry certain actions automatically for a period of time, to reduce the chance that this error surfaces to your code. This can be done by setting a busy timeout, for example to set a 500 millisecond timeout:

```sql
PRAGMA busy_timeout = 500;
```
Under the hood, writer starvation is avoided by the use of a PENDING lock, much like the default SQLite VFS that writes to disk.
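Issuing the PRAGMA with no value reports the current timeout, so the setting can be checked. A minimal illustration using the standard library `sqlite3` driver (not specific to this package):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA busy_timeout = 500")

# PRAGMA busy_timeout with no value reports the current timeout in ms
print(db.execute("PRAGMA busy_timeout").fetchone())  # (500,)
```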
Comparison with sqlite_deserialize
The main reason for using sqlite-memory-vfs over `sqlite_deserialize` is lower memory usage for larger databases. For example, the following may not even complete, due to not being able to allocate enough contiguous memory for the database:
```python
import resource
from contextlib import closing

import apsw
import httpx

url = "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/v4.0.46/data?format=sqlite"

with closing(apsw.Connection(':memory:')) as db:
    db.deserialize('main', httpx.get(url).read())
    cursor = db.cursor()
    cursor.execute('SELECT * FROM measures;')
    print(cursor.fetchall())

print('Max memory usage:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```
But the following completes, and should report a lower value of maximum memory usage:
```python
import resource
from contextlib import closing

import apsw
import httpx
import sqlite_memory_vfs

url = "https://data.api.trade.gov.uk/v1/datasets/uk-tariff-2021-01-01/versions/v4.0.46/data?format=sqlite"
memory_vfs = sqlite_memory_vfs.MemoryVFS()

with httpx.stream("GET", url) as r:
    memory_vfs.deserialize_iter('tariff.sqlite', r.iter_bytes())

with closing(apsw.Connection('tariff.sqlite', vfs=memory_vfs.name)) as db:
    cursor = db.cursor()
    cursor.execute('SELECT count(*) FROM measures;')
    print(cursor.fetchall())

print('Max memory usage:', resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```
The tests require the dev dependencies to be installed:

```shell
pip install -e ".[dev]"
```

They can then be run with pytest:

```shell
pytest
```