corpuscula

Toolkit that simplifies corpus processing

  • 1.0.56
  • PyPI

RuMor: Russian Morphology project

Corpuscula: a python NLP library for corpus processing

License: BSD-3

Corpuscula is part of the RuMor project. It contains tools that simplify corpus processing. Highlights:

  • full CoNLL-U support (including CoNLL-U Plus)
  • wrappers for well-known corpora of the Russian language
  • a parser and wrapper for the Russian part of Wikipedia
  • a Corpus Dictionary that can be used for further morphology processing
  • a simple database for storing named entities
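
For readers unfamiliar with CoNLL-U, the following is a minimal sketch of what the format looks like and how it can be read with plain Python. It does not use Corpuscula's own API; the names `FIELDS`, `parse_feats`, `parse_conllu` and `SAMPLE` are hypothetical and exist only for this illustration. It assumes the standard ten tab-separated columns, `#` comment lines, and blank lines between sentences, and it ignores CoNLL-U Plus column declarations.

```python
# Illustrative only: a tiny CoNLL-U reader, not Corpuscula's implementation.
FIELDS = ("ID", "FORM", "LEMMA", "UPOS", "XPOS",
          "FEATS", "HEAD", "DEPREL", "DEPS", "MISC")


def parse_feats(value):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if value == "_":
        return {}
    return dict(pair.split("=", 1) for pair in value.split("|"))


def parse_conllu(text):
    """Return a list of (tokens, metadata) pairs, one pair per sentence."""
    sentences, tokens, meta = [], [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append((tokens, meta))
            tokens, meta = [], {}
        elif line.startswith("#"):
            # Comment lines carry sentence metadata such as 'sent_id = 1'.
            key, sep, value = line[1:].partition("=")
            meta[key.strip()] = value.strip() if sep else None
        else:
            token = dict(zip(FIELDS, line.split("\t")))
            token["FEATS"] = parse_feats(token["FEATS"])
            tokens.append(token)
    if tokens:
        # The last sentence may not be followed by a blank line.
        sentences.append((tokens, meta))
    return sentences


# A made-up one-sentence sample, not taken from any real corpus.
SAMPLE = (
    "# sent_id = 1\n"
    "# text = Мама мыла раму.\n"
    "1\tМама\tмама\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tмыла\tмыть\tVERB\t_\tNumber=Sing|Tense=Past\t0\troot\t_\t_\n"
    "3\tраму\tрама\tNOUN\t_\tCase=Acc|Number=Sing\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

for sent_tokens, sent_meta in parse_conllu(SAMPLE):
    print(sent_meta["text"], [t["FORM"] for t in sent_tokens])
```

A dedicated library is still preferable for real corpora, since this sketch skips validation, multiword token ranges, and empty nodes.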

Installation

pip

Corpuscula supports Python 3.5 or later. To install it via pip, run:

$ pip install corpuscula

If you currently have a previous version of Corpuscula installed, use:

$ pip install corpuscula -U
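
To confirm which version ended up installed, something like the following sketch can help. It relies only on the standard library and assumes Python 3.8 or later (for `importlib.metadata`), which is newer than the minimum version Corpuscula itself requires; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if Corpuscula is installed, otherwise None.
print(installed_version("corpuscula"))
```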

From Source

Alternatively, you can install Corpuscula from the source of its git repository:

$ git clone https://github.com/fostroll/corpuscula.git
$ cd corpuscula
$ pip install -e .

This gives you access to examples and data that are not included in the PyPI package.

Setup

After installation, you need to specify a directory where you prefer to store downloaded corpora:

>>> import corpuscula.corpus_utils as cu
>>> cu.set_root_dir(<path>)  # We will keep corpora here

NB: this creates or updates the config file .rumor in your home directory.

If you don't set the root directory, Corpuscula keeps corpora in the directory where it is installed.
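
One way to prepare such a directory before passing it to `set_root_dir` is sketched below. The helper `prepare_corpora_dir` and the temporary location are assumptions made for illustration; only `corpuscula.corpus_utils.set_root_dir` comes from the instructions above, and its call is left commented out so the snippet runs without Corpuscula installed.

```python
import os
import tempfile


def prepare_corpora_dir(path):
    """Expand '~', create the directory if needed, return its absolute path."""
    full_path = os.path.abspath(os.path.expanduser(path))
    os.makedirs(full_path, exist_ok=True)
    return full_path


# Hypothetical location, chosen only for this example.
root_dir = prepare_corpora_dir(
    os.path.join(tempfile.gettempdir(), "corpuscula_corpora")
)
print(root_dir)

# With Corpuscula installed, the path can then be registered:
#   import corpuscula.corpus_utils as cu
#   cu.set_root_dir(root_dir)
```

In practice a persistent directory is a better choice than a temporary one, so that downloaded corpora survive between sessions.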

Usage

CoNLL-U Support

Management of Corpora

Wrapper for Wikipedia

Corpus Dictionary

Utilities

Items database

Examples

You can find examples in the examples directory of the Corpuscula GitHub repository.

License

Corpuscula is released under the BSD License. See the LICENSE file for more details.
