multi-choices-parser

An efficient C++ incremental parser for multi-choices grammars (generalization of the trie structure) with Python bindings.

0.10.0


Multi-choices Parser

Overview

Multi-choices Parser is an efficient incremental parser for multi-choices grammars, written in C++ with Python bindings (Python 3.8+). These grammars are defined as a concatenation of lists of choices, where each choice is a literal string that may be empty (the grammar form is shown below). The parser is optimized for scenarios where the lists of choices are very large, such as representing entities preceded by a determiner.

Here is the type of grammar handled by this parser:

start: list1 list2 ... listn
list1: choice1_1 | choice1_2 | ... | choice1_k1
list2: choice2_1 | choice2_2 | ... | choice2_k2
...
listn: choicen_1 | choicen_2 | ... | choicen_kn

This parser is a generalization of tries, and more precisely a concatenation of tries. In fact, it is equivalent to a trie when $n=1$.
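To make the grammar form concrete, here is a minimal pure-Python sketch of which strings such a grammar accepts. It is only an illustration of the language being recognized, not the library's algorithm: the function name `matches` is hypothetical, and the approach (tracking the set of text positions reachable after each list) is a naive one that does not use tries.

```python
from typing import Sequence


def matches(lists: Sequence[Sequence[str]], text: str) -> bool:
    """Return True if `text` is one choice from each list, concatenated in order."""
    # Positions in `text` reachable after consuming one choice from each list so far.
    positions = {0}
    for choices in lists:
        positions = {
            pos + len(choice)
            for pos in positions
            for choice in choices
            if text.startswith(choice, pos)  # an empty choice always matches
        }
        if not positions:
            return False  # no choice of this list fits at any reachable position
    # The whole text must be consumed after the last list.
    return len(text) in positions


grammar = [
    ["the", "an", "a", ""],
    ["orange", "apple", "banana"],
]
print(matches(grammar, "anapple"))  # determiner "an" + entity "apple"
print(matches(grammar, "apple"))    # empty determiner + entity "apple"
print(matches(grammar, "an"))       # no entity follows the determiner
```

With a single list, the check reduces to plain membership in that list, which is the $n=1$ trie case mentioned above.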

Installation

pip install multi-choices-parser

Features

Handle large lists of choices efficiently (e.g. millions of choices).
Incremental parsing: Each node and its transitions can be accessed at any moment of the parsing.
Extensive testing
Support for all Python versions >=3.8
Support for Linux, Windows and macOS

Usage

To use the MultiChoicesParser, follow these steps:

Initialize the parser with a list of lists of choices (one inner list per position in the grammar).
Use the step method to feed characters to the parser one at a time.
After feeding the End symbol, check the success flag to determine whether the parsed string is valid.
Reset the parser state using the reset method if needed.

Example


from multi_choices_parser.parser import MultiChoicesParser, DEFAULT_END_SYMB

# Define your list of choices
l = [
    ['the', 'an', "a", ""],
    ['orange', 'apple', 'banana']
]

# Initialize the parser
p = MultiChoicesParser(l)

# Parse a string (don't forget to add the End symbol)
for i, c in enumerate(tuple("anapple") + (DEFAULT_END_SYMB, )):
    print('Step %s' % i)
    print("Authorized characters:", sorted(p.next()))
    print('Adding character:', c)
    p.step(c)
    print("State: Finished=%s, Success=%s" % (p.finished, p.success))
    print()

Example Output

Step 0
Authorized characters: ['a', 'b', 'o', 't']
Adding character: a
State: Finished=False, Success=False

Step 1
Authorized characters: ['a', 'b', 'n', 'o', 'p']
Adding character: n
State: Finished=False, Success=False

Step 2
Authorized characters: ['a', 'b', 'o']
Adding character: a
State: Finished=False, Success=False

Step 3
Authorized characters: ['p']
Adding character: p
State: Finished=False, Success=False

Step 4
Authorized characters: ['p']
Adding character: p
State: Finished=False, Success=False

Step 5
Authorized characters: ['l']
Adding character: l
State: Finished=False, Success=False

Step 6
Authorized characters: ['e']
Adding character: e
State: Finished=False, Success=False

Step 7
Authorized characters: [End]
Adding character: End
State: Finished=True, Success=True
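To help read the output above, here is a naive pure-Python model of the same incremental behaviour. It is a sketch under stated assumptions, not the library's implementation: the class `NaiveMultiChoices` and the `END` sentinel are hypothetical names, the model scans the lists instead of using tries, and its handling of an unauthorized character (finish without success) is an assumption rather than documented behaviour.

```python
END = object()  # hypothetical stand-in for the library's End symbol


class NaiveMultiChoices:
    """Toy model: a state is (list index, prefix typed so far in that list)."""

    def __init__(self, lists):
        self.lists = [list(choices) for choices in lists]
        self.reset()

    def reset(self):
        self.finished = False
        self.success = False
        self.states = self._close({(0, "")})

    def _close(self, states):
        # When a prefix is a complete choice of list i, list i + 1 may start.
        closed, todo = set(states), list(states)
        while todo:
            i, prefix = todo.pop()
            if i < len(self.lists) and prefix in self.lists[i]:
                nxt = (i + 1, "")
                if nxt not in closed:
                    closed.add(nxt)
                    todo.append(nxt)
        return closed

    def next(self):
        """Return the set of symbols that may be fed now."""
        allowed = set()
        if self.finished:
            return allowed
        for i, prefix in self.states:
            if i == len(self.lists):
                allowed.add(END)  # every list has been matched
                continue
            for choice in self.lists[i]:
                if len(choice) > len(prefix) and choice.startswith(prefix):
                    allowed.add(choice[len(prefix)])
        return allowed

    def step(self, symbol):
        if self.finished:
            return
        if symbol is END:
            self.finished = True
            self.success = any(i == len(self.lists) for i, _ in self.states)
            return
        moved = {
            (i, prefix + symbol)
            for i, prefix in self.states
            if i < len(self.lists)
            and any(c.startswith(prefix + symbol) for c in self.lists[i])
        }
        if not moved:
            # Assumption of this model: an unauthorized symbol ends the parse.
            self.finished = True
            return
        self.states = self._close(moved)


p = NaiveMultiChoices([["the", "an", "a", ""], ["orange", "apple", "banana"]])
print(sorted(p.next()))  # candidates before any character is fed
for c in "anapple":
    p.step(c)
p.step(END)
print(p.finished, p.success)
```

The empty determiner is what makes the entity letters available from the very first step: the model starts in both the first and the second list at once.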

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For any queries or bug reports, please open an issue on the GitHub repository ;)
