
.. image:: https://img.shields.io/travis/Kapiche/caterpillar.svg?style=flat-square
    :target: https://travis-ci.org/Kapiche/caterpillar

.. image:: https://img.shields.io/coveralls/Kapiche/caterpillar.svg?style=flat-square
    :target: https://coveralls.io/r/Kapiche/caterpillar

Caterpillar is a pure Python text indexing and analytics library. It is aimed at supporting advanced text and other semi-structured analytics applications that bridge Natural Language Processing (NLP), Information Retrieval and Topic Modelling.
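
To make "text indexing" concrete, the sketch below shows the data structure that full-text indexes are commonly built around: an inverted index mapping each term to the set of documents that contain it, so a query only has to intersect a few sets instead of scanning every document. It is a plain standard-library illustration, not caterpillar's API or implementation; the ``InvertedIndex`` class, its naive lowercase-letters-only tokeniser and its boolean AND ``search`` method are all hypothetical.

.. code:: python

    import re
    from collections import defaultdict


    class InvertedIndex:
        """Hypothetical toy index: maps each term to the ids of documents containing it."""

        def __init__(self):
            self.postings = defaultdict(set)  # term -> {doc_id, ...}

        @staticmethod
        def tokenize(text):
            # Naive tokeniser (an assumption): lowercase, keep runs of letters a-z only
            return re.findall(r"[a-z]+", text.lower())

        def add_document(self, doc_id, text):
            # Record doc_id in the posting set of every term in the text
            for term in self.tokenize(text):
                self.postings[term].add(doc_id)

        def search(self, query):
            # Boolean AND query: intersect the posting sets of all query terms
            terms = self.tokenize(query)
            if not terms:
                return set()
            result = set(self.postings.get(terms[0], set()))
            for term in terms[1:]:
                result &= self.postings.get(term, set())
            return result


    index = InvertedIndex()
    index.add_document(1, "Call me Ishmael.")
    index.add_document(2, "The white whale swam away.")
    index.add_document(3, "Ishmael saw the whale.")
    print(sorted(index.search("whale")))          # [2, 3]
    print(sorted(index.search("Ishmael whale")))  # [3]
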
Some features include:
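
A quick example of using caterpillar::

    import os
    import tempfile

    from caterpillar.processing.index import IndexWriter, IndexConfig
    from caterpillar.processing.schema import TEXT, Schema, NUMERIC
    from caterpillar.storage.sqlite import SqliteStorage

    # Path for the new index, inside a fresh temporary directory
    index_dir = os.path.join(tempfile.mkdtemp(), "examples")

    # Load a sample document (Moby Dick) from the test resources
    with open('caterpillar/test_resources/moby.txt', 'r') as f:
        data = f.read()

    # Open a writer on an SQLite-backed index whose schema has a text field and a numeric field,
    # then add the document, supplying a value for each field
    with IndexWriter(index_dir, IndexConfig(SqliteStorage, Schema(text=TEXT, some_number=NUMERIC))) as writer:
        writer.add_document(text=data, some_number=1)
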
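
Text indexes commonly keep per-document term counts, and much of the retrieval and analytics work described above starts from them. As a generic illustration (not caterpillar's API), the sketch below turns raw counts for a tiny corpus into TF-IDF weights, so that terms appearing in every document score zero and rarer, more distinctive terms score higher. The ``tokenize`` and ``tf_idf`` functions are hypothetical, and raw-count TF with ``idf = log(N / df)`` is just one common weighting choice among several.

.. code:: python

    import math
    import re
    from collections import Counter


    def tokenize(text):
        # Naive tokeniser (an assumption): lowercase words made of letters a-z only
        return re.findall(r"[a-z]+", text.lower())


    def tf_idf(documents):
        """Return one {term: weight} dict per document, using raw TF and idf = log(N / df)."""
        counts = [Counter(tokenize(doc)) for doc in documents]
        n_docs = len(documents)
        # Document frequency: the number of documents each term appears in at least once
        df = Counter(term for c in counts for term in c)
        return [
            {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
            for c in counts
        ]


    docs = ["the whale", "the white whale", "the sea"]
    weights = tf_idf(docs)

With this corpus, "the" occurs in all three documents and gets a weight of 0, while "white" and "sea", each found in only one document, get the highest weight, log(3) ≈ 1.10.
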

.. code::

    pip install caterpillar


The documentation can be found `here <http://caterpillar.readthedocs.org/en/latest/>`_.


Caterpillar is currently in a volatile state, and its API is likely to change substantially over the next few releases.

In particular we plan to:

Caterpillar now targets Python 3 only, and aims to support the two most recent Python releases for new features. Currently this means Python 3.5 and 3.6 only.


* `Kris Rogers <https://github.com/krisrogers/>`_
* `Ryan Stuart <https://github.com/rstuart85/>`_
* `Sam Hames <https://github.com/SamHames/>`_

Contributors: anyone who is willing! In other words, none yet, but we are more than happy to accept contributions.


No code will be merged unless it has 100% test coverage and passes flake8 linting. We use a maximum line length of 120 characters (see the ``[pep8]`` section of ``tox.ini``) and `py.test <http://pytest.org/>`_ for testing. Tests live in a ``test`` sub-folder in each package. Tox is configured to run the test suite and automatically report unit test results, coverage and linting.

.. code::

    # Run the whole test suite:
    tox

    # Run only the linting checks by selecting the flake8 test environment:
    tox -e flake8

    # Pass arguments to py.test through the tox runner (in this case, run only a specific set of tests):
    tox -e py35 -- -k test_index

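
As an illustration of the conventions above, a test module is simply a file named ``test_*.py`` in a package's ``test`` sub-folder, containing ``test_*`` functions with plain ``assert`` statements that py.test collects automatically. The file path and the ``tokenize`` helper below are hypothetical; the helper is defined inline only so the sketch runs on its own, whereas a real test would import the code under test from the caterpillar package.

.. code:: python

    # Hypothetical file: caterpillar/processing/test/test_tokenize.py
    # py.test collects files named test_*.py and runs every function whose name starts with "test_".
    import re


    def tokenize(text):
        # Hypothetical helper under test, defined here so the module is self-contained;
        # a real test would import the code it exercises from the caterpillar package.
        return re.findall(r"[a-z]+", text.lower())


    def test_tokenize_lowercases_and_strips_punctuation():
        # A plain assert is enough: py.test rewrites it so a failure shows the compared values.
        assert tokenize("Call me Ishmael.") == ["call", "me", "ishmael"]


    def test_tokenize_empty_string():
        assert tokenize("") == []

Passing ``-k tokenize`` through tox (for example ``tox -e py35 -- -k tokenize``) would select tests whose names match ``tokenize``, including these two.
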

Caterpillar is copyright © 2013 - 2015 Kapiche Limited. It is licensed under the GNU Affero General Public License.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

The copyright holders grant you an additional permission under Section 7 of the GNU Affero General Public License, version 3, exempting you from the requirement in Section 6 of the GNU General Public License, version 3, to accompany Corresponding Source with Installation Information for the Program or any work based on the Program. You are still required to comply with all other Section 6 requirements to provide Corresponding Source.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.