Automata Tools
Tools to build automata from your custom rule.
This package provides a set of handy tools to programmatically build automata, so you can build NFA、DFA、MinimizedDFA、WFA from any custom rules.
Usage
Install
conda install -c conda-forge automata-tools # not available yet
# or
pip install automata-tools
Import
See example in examples/NFAfromCustomRule.py
from typing import List
from automata_tools import BuildAutomata, Automata
automata: List[Automata] = []
BuildAutomata
characterStruct
Build simple (0)-[a]->(1)
automata
automata.append(BuildAutomata.characterStruct(char))
unionStruct
Build automata that is an "or" of two sub-automata (1)<-[a]-(0)-[b]->(1)
a = automata.pop()
b = automata.pop()
if operator == "|":
automata.append(BuildAutomata.unionStruct(b, a))
concatenationStruct
Build automata that is an "and" of two sub-automata (0)-[a]->(1)-[b]->(2)
a = automata.pop()
b = automata.pop()
automata.append(BuildAutomata.concatenationStruct(b, a))
starStruct
Build automata that looks like the "Kleene closure"
if operator == "*":
a = automata.pop()
automata.append(BuildAutomata.starStruct(a))
skipStruct
Build automata that looks like the "Kleene closure" but without the loop back (1)<-[ε]-(2)
, so it only match the token once at most.
if operator == "?":
a = automata.pop()
automata.append(BuildAutomata.skipStruct(a))
repeatRangeStruct
Build automata that will match the same token for several times (0)-[a]->(1)-[a]->(2)-[a]->(3)
repeatedAutomata = BuildAutomata.repeatStruct(automata, 3)
repeatStruct
Build automata that will match the same token for n to m times
(0)-[a]->(1)-[a]->(4), (0)-[a]->(2)-[a]->(3)-[a]->(4)
repeatedAutomata = BuildAutomata.repeatRangeStruct(automata, 2, 3)
Automata
See example in features/steps/customRule.py
from automata_tools import DFAFromNFA, Automata
from your_implementation import NFAFromRegex, executor
nfa: Automata = NFAFromRegex().buildNFA(rule)
minDFA: Automata = DFAFromNFA(nfa).getMinimizedDFA()
minDFA.setExecuter(executor)
print(minDFA.execute(someText))
where executor
is a function like the one in examples/NFAfromCustomRule.py:
def executor(tokens, startState, finalStates, transitions):
return True
setExecuter
Set an executor to the automata that can freely use state and transition of the automata, and return a boolean value.
from automata_tools import IAutomataExecutor
defaultExecuter: IAutomataExecutor = lambda tokens, startState, finalStates, transitions: True
minDFA.setExecuter(defaultExecuter)
setTokenizer
Set an tokenizer to the automata that can transform string to list of string token, which will be used by the executer.
minDFA.setExecuter(lambda input: input.split(' ')[::-1])
NFAtoDFA
Make automata state transitions not so ambiguous
nfa = NFAFromRegex().buildNFA(rule)
dfa = NFAtoDFA(nfa)
DFAtoMinimizedDFA
Allow you minify Automata state
nfa = NFAFromRegex().buildNFA(rule)
minDFA = DFAtoMinimizedDFA(NFAtoDFA(nfa))
Weighted Finite Automata
WFA, it can execute automata use matrix multiplication, so it can be very fast compare to brute force execution, especially when state space is large.
from automata_tools import WFA, get_word_to_index
_, wordToIndex = get_word_to_index([ruleParser(context.rule), tokenizer(text)])
wfa = WFA(minDFA, wordToIndex, dfa_to_tensor)
wfa.execute(text)
get_word_to_index
Given [['token', 'another'], ['token_in_rule']]
, return something like
{'token': 0, 'another': 1, ...}
So we can translate automata state to a matrix.
WFA
Given an automata, a word index like {'token': 0, 'another': 1, ...}
, and a function that transform automata to tensor (see example at customRuleDFAToTensor), return a WFA instance.
Development
Environment
Create environment from the text file:
conda env create --file automataTools-env.txt
conda activate automataTools
Save env file: conda list --explicit > automataTools-env.txt
Python Path
Create a .env
file with content PYTHONPATH=automataTools
Publish
To pypi
rm -rf ./dist && rm -rf ./build && rm -rf ./automata_tools.egg-info && python3 setup.py sdist bdist_wheel && twine upload dist/*
To Conda
# I'm learning how to do...
Resources
Automata Theory Course Slides
Probably the original reference source