|travisci build status|_ |appveyor build status|_
sqlitefts-python provides bindings for the tokenizers of `SQLite Full-Text search(FTS3/4)`_ and FTS5_. It allows you to write tokenizers in Python.
SQLite has the Full-Text search features FTS3/FTS4 and FTS5, along with some `predefined tokenizers for FTS3/4`_ and `predefined tokenizers for FTS5`_.
It is easy to use and has enough functionality. Python has a built-in SQLite module,
so it is easy to use and deploy, and you don't need anything else to do full-text search.
However, the predefined tokenizers are not enough for some languages, including Japanese, and it is not easy to write your own tokenizer. This module lets you write tokenizers in Python using CFFI_, so you don't need a C compiler to write your tokenizer.
It also has ranking functions based on peewee_, a utility function to add FTS5 auxiliary functions, and an FTS5 auxiliary function implementation.
NOTE: all connections using this module should be explicitly closed. Due to GC behavior, the program can crash if a connection is left open when it terminates.
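One way to follow the note above is to wrap the connection in ``contextlib.closing``, so that ``close()`` is called even if an exception is raised. This is a minimal sketch using only the standard ``sqlite3`` module; the table name ``docs`` is an illustrative assumption, and tokenizer registration is left out.

```python
import sqlite3
from contextlib import closing

# closing() calls conn.close() when the block exits, even on error.
# (Using the connection itself as a context manager only commits or
# rolls back a transaction; it does not close the connection.)
with closing(sqlite3.connect(":memory:")) as conn:
    # A tokenizer would be registered on conn at this point.
    conn.execute("CREATE TABLE docs (body TEXT)")
    conn.execute("INSERT INTO docs VALUES ('hello world')")
    row_count = conn.execute("SELECT count(*) FROM docs").fetchone()[0]

# conn is closed here, before the interpreter shuts down.
print(row_count)
```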
There are differences between FTS3/4 and FTS5, so two different base classes are defined.
FTS3/4::

    import re

    import sqlitefts as fts

    class SimpleTokenizer(fts.Tokenizer):
        _p = re.compile(r'\w+', re.UNICODE)

        def tokenize(self, text):
            for m in self._p.finditer(text):
                s, e = m.span()
                t = text[s:e]
                # offsets are byte positions in the UTF-8 encoded text
                l = len(t.encode('utf-8'))
                p = len(text[:s].encode('utf-8'))
                yield t, p, p + l

    # conn is an already opened SQLite connection
    tk = fts.make_tokenizer_module(SimpleTokenizer())
    fts.register_tokenizer(conn, 'simple_tokenizer', tk)
FTS5::

    import re

    from sqlitefts import fts5

    class SimpleTokenizer(fts5.FTS5Tokenizer):
        _p = re.compile(r'\w+', re.UNICODE)

        def tokenize(self, text, flags=None):
            for m in self._p.finditer(text):
                s, e = m.span()
                t = text[s:e]
                # offsets are byte positions in the UTF-8 encoded text
                l = len(t.encode('utf-8'))
                p = len(text[:s].encode('utf-8'))
                yield t, p, p + l

    # conn is an already opened SQLite connection
    tk = fts5.make_fts5_tokenizer(SimpleTokenizer())
    fts5.register_tokenizer(conn, 'simple_tokenizer', tk)
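Both tokenizers above yield ``(token, start, end)`` with offsets computed from UTF-8 encoded lengths rather than from character positions. The following standalone sketch shows the difference on non-ASCII text. It does not use sqlitefts at all; the helper name ``byte_spans`` is hypothetical and only mirrors the offset arithmetic of the examples.

```python
# -*- coding: utf-8 -*-
import re

_WORD = re.compile(r"\w+", re.UNICODE)

def byte_spans(text):
    """Yield (token, start, end) with offsets counted in UTF-8 bytes."""
    for m in _WORD.finditer(text):
        s, e = m.span()  # character offsets from the regex match
        start = len(text[:s].encode("utf-8"))  # bytes before the token
        end = start + len(text[s:e].encode("utf-8"))  # plus the token's bytes
        yield text[s:e], start, end

# "naïve café" written with escapes so the characters are precomposed
sample = u"na\u00efve caf\u00e9"
spans = list(byte_spans(sample))
# Character spans would be (0, 5) and (6, 10); the byte spans are
# (0, 6) and (7, 12), because each accented letter takes 2 bytes.
print(spans)
```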
Requirements:

* Python 2.7, Python 3.3+, PyPy2.7, or PyPy3.2+
* CFFI_
* FTS3/4 and/or FTS5 enabled SQLite3 or APSW_ (on Windows, you may need to download and replace sqlite3.dll)

Note for APSW users: an APSW Amalgamation build does not expose the SQLite APIs used in this module, so libsqlite3.so/sqlite3.dll is also required even though such a build has no runtime library dependency on SQLite. An APSW local build already depends on the shared library. Details: sqlite3_db_config can be invoked via Connection.config, but it rejects SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER, which is needed to register a new tokenizer. Tested with APSW 3.21.0-r1.
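To see whether the SQLite library linked into your Python has FTS3/4 or FTS5 enabled, you can try to create a virtual table with each module. This is a rough sketch using only the standard ``sqlite3`` module; the function name ``fts_support`` and the probe table names are hypothetical. It only checks that the FTS modules exist, not that custom tokenizer registration will succeed.

```python
import sqlite3
from contextlib import closing

def fts_support():
    """Return a dict mapping each FTS module name to True/False."""
    supported = {}
    with closing(sqlite3.connect(":memory:")) as conn:
        for module in ("fts3", "fts4", "fts5"):
            try:
                conn.execute(
                    "CREATE VIRTUAL TABLE probe_%s USING %s(body)"
                    % (module, module)
                )
                supported[module] = True
            except sqlite3.OperationalError:
                # Typically "no such module: ftsN" when not compiled in
                supported[module] = False
    return supported

print(fts_support())
```

If a module is reported as missing, the requirement above applies: you need an SQLite build with that FTS version enabled.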
This software is released under the MIT License, see LICENSE.
.. _SQLite Full-Text search(FTS3/4): https://www.sqlite.org/fts3.html
.. _FTS5: https://www.sqlite.org/fts5.html
.. _predefined tokenizers for FTS3/4: https://www.sqlite.org/fts3.html#tokenizer
.. _predefined tokenizers for FTS5: https://www.sqlite.org/fts5.html#section_4_3
.. _peewee: https://github.com/coleifer/peewee
.. _CFFI: https://cffi.readthedocs.io/en/latest/
.. _ctypes: https://docs.python.org/library/ctypes.html
.. |travisci build status| image:: https://api.travis-ci.org/hideaki-t/sqlite-fts-python.svg?branch=master
.. _travisci build status: https://travis-ci.org/hideaki-t/sqlite-fts-python
.. |appveyor build status| image:: https://ci.appveyor.com/api/projects/status/github/hideaki-t/sqlite-fts-python?svg=true
.. _appveyor build status: https://ci.appveyor.com/project/hideaki-t/sqlite-fts-python
.. _APSW: https://github.com/rogerbinns/apsw