New Case Study:See how Anthropic automated 95% of dependency reviews with Socket.Learn More
Socket
Sign inDemoInstall
Socket

automata-tools

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

automata-tools

Tools to build automata from your custom rule

  • 2.0.1
  • Source
  • PyPI
  • Socket score

Maintainers
1

Automata Tools

Tools to build automata from your custom rule.

This package provides a set of handy tools to programmatically build automata, so you can build NFA、DFA、MinimizedDFA、WFA from any custom rules.

Usage

Install

conda install -c conda-forge automata-tools # not available yet
# or
pip install automata-tools

Import

See example in examples/NFAfromCustomRule.py

from typing import List
from automata_tools import BuildAutomata, Automata

automata: List[Automata] = []

BuildAutomata

characterStruct

Build simple (0)-[a]->(1) automata

automata.append(BuildAutomata.characterStruct(char))
unionStruct

Build automata that is an "or" of two sub-automata (1)<-[a]-(0)-[b]->(1)

# to match "a|b"
a = automata.pop()
b = automata.pop()
if operator == "|":
    automata.append(BuildAutomata.unionStruct(b, a))
concatenationStruct

Build automata that is an "and" of two sub-automata (0)-[a]->(1)-[b]->(2)

# to match "ab"
a = automata.pop()
b = automata.pop()
automata.append(BuildAutomata.concatenationStruct(b, a))
starStruct

Build automata that looks like the "Kleene closure"

# to match "a*"
if operator == "*":
    a = automata.pop()
    automata.append(BuildAutomata.starStruct(a))
skipStruct

Build automata that looks like the "Kleene closure" but without the loop back (1)<-[ε]-(2), so it only match the token once at most.

# to match "a*"
if operator == "?":
    a = automata.pop()
    automata.append(BuildAutomata.skipStruct(a))
repeatRangeStruct

Build automata that will match the same token for several times (0)-[a]->(1)-[a]->(2)-[a]->(3)

# to match "a{3}"
repeatedAutomata = BuildAutomata.repeatStruct(automata, 3)
repeatStruct

Build automata that will match the same token for n to m times

(0)-[a]->(1)-[a]->(4), (0)-[a]->(2)-[a]->(3)-[a]->(4)

# to match "a{2,3}"
repeatedAutomata = BuildAutomata.repeatRangeStruct(automata, 2, 3)

Automata

See example in features/steps/customRule.py

from automata_tools import DFAFromNFA, Automata

from your_implementation import NFAFromRegex, executor

nfa: Automata = NFAFromRegex().buildNFA(rule)
minDFA: Automata = DFAFromNFA(nfa).getMinimizedDFA()
minDFA.setExecuter(executor)

print(minDFA.execute(someText))

where executor is a function like the one in examples/NFAfromCustomRule.py:

def executor(tokens, startState, finalStates, transitions):
    return True
setExecuter

Set an executor to the automata that can freely use state and transition of the automata, and return a boolean value.

from automata_tools import IAutomataExecutor

defaultExecuter: IAutomataExecutor = lambda tokens, startState, finalStates, transitions: True
minDFA.setExecuter(defaultExecuter)
setTokenizer

Set an tokenizer to the automata that can transform string to list of string token, which will be used by the executer.

minDFA.setExecuter(lambda input: input.split(' ')[::-1])

NFAtoDFA

Make automata state transitions not so ambiguous

nfa = NFAFromRegex().buildNFA(rule)
dfa = NFAtoDFA(nfa)

DFAtoMinimizedDFA

Allow you minify Automata state

nfa = NFAFromRegex().buildNFA(rule)
minDFA = DFAtoMinimizedDFA(NFAtoDFA(nfa))

Weighted Finite Automata

WFA, it can execute automata use matrix multiplication, so it can be very fast compare to brute force execution, especially when state space is large.

from automata_tools import WFA, get_word_to_index

_, wordToIndex = get_word_to_index([ruleParser(context.rule), tokenizer(text)])
wfa = WFA(minDFA, wordToIndex, dfa_to_tensor)
wfa.execute(text)
get_word_to_index

Given [['token', 'another'], ['token_in_rule']], return something like

{'token': 0, 'another': 1, ...}

So we can translate automata state to a matrix.

WFA

Given an automata, a word index like {'token': 0, 'another': 1, ...}, and a function that transform automata to tensor (see example at customRuleDFAToTensor), return a WFA instance.

Development

Environment

Create environment from the text file:

conda env create --file automataTools-env.txt
conda activate automataTools

Save env file: conda list --explicit > automataTools-env.txt

Python Path

Create a .env file with content PYTHONPATH=automataTools

Publish

To pypi

rm -rf ./dist && rm -rf ./build && rm -rf ./automata_tools.egg-info && python3 setup.py sdist bdist_wheel && twine upload dist/*

To Conda

# I'm learning how to do...

Resources

Automata Theory Course Slides

Probably the original reference source

Keywords

FAQs


Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc