
Security News
Oracle Drags Its Feet in the JavaScript Trademark Dispute
Oracle seeks to dismiss fraud claims in the JavaScript trademark dispute, delaying the case and avoiding questions about its right to the name.
Recursive descent parsing library for Python based on functional combinators.
The primary focus of funcparserlib
is parsing little languages or external DSLs (domain specific languages).
Parsers made with funcparserlib
are pure-Python LL(*) parsers. It means that it's very easy to write parsers without thinking about lookaheads and other hardcore parsing stuff. However, recursive descent parsing is a rather slow method compared to LL(k) or LR(k) algorithms. Still, parsing with funcparserlib
is at least twice faster than PyParsing, a very popular library for Python.
The source code of funcparserlib
is only 1.2K lines of code, with lots of comments. Its API is fully type hinted. It features the longest parsed prefix error reporting, as well as a tiny lexer generator for token position tracking.
The idea of parser combinators used in funcparserlib
comes from the Introduction to Functional Programming course. We have converted it from ML into Python.
You can install funcparserlib
from PyPI:
$ pip install funcparserlib
There are no dependencies on other libraries.
funcparserlib
There are several examples available in the tests/
directory:
See also the changelog.
Let's consider a little language of numeric expressions with a syntax similar to Python expressions. Here are some expression strings in this language:
0
1 + 2 + 3
-1 + 2 ** 32
3.1415926 * (2 + 7.18281828e-1) * 42
Here is the complete source code of the tokenizer and the parser for this language written using funcparserlib
:
from typing import List, Tuple, Union
from dataclasses import dataclass
from funcparserlib.lexer import make_tokenizer, TokenSpec, Token
from funcparserlib.parser import tok, Parser, many, forward_decl, finished
@dataclass
class BinaryExpr:
op: str
left: "Expr"
right: "Expr"
Expr = Union[BinaryExpr, int, float]
def tokenize(s: str) -> List[Token]:
specs = [
TokenSpec("whitespace", r"\s+"),
TokenSpec("float", r"[+\-]?\d+\.\d*([Ee][+\-]?\d+)*"),
TokenSpec("int", r"[+\-]?\d+"),
TokenSpec("op", r"(\*\*)|[+\-*/()]"),
]
tokenizer = make_tokenizer(specs)
return [t for t in tokenizer(s) if t.type != "whitespace"]
def parse(tokens: List[Token]) -> Expr:
int_num = tok("int") >> int
float_num = tok("float") >> float
number = int_num | float_num
expr: Parser[Token, Expr] = forward_decl()
parenthesized = -op("(") + expr + -op(")")
primary = number | parenthesized
power = primary + many(op("**") + primary) >> to_expr
term = power + many((op("*") | op("/")) + power) >> to_expr
sum = term + many((op("+") | op("-")) + term) >> to_expr
expr.define(sum)
document = expr + -finished
return document.parse(tokens)
def op(name: str) -> Parser[Token, str]:
return tok("op", name)
def to_expr(args: Tuple[Expr, List[Tuple[str, Expr]]]) -> Expr:
first, rest = args
result = first
for op, expr in rest:
result = BinaryExpr(op, result, expr)
return result
Now, consider this numeric expression: 3.1415926 * (2 + 7.18281828e-1) * 42
.
Let's tokenize()
it using the tokenizer we've created with funcparserlib.lexer
:
[
Token('float', '3.1415926'),
Token('op', '*'),
Token('op', '('),
Token('int', '2'),
Token('op', '+'),
Token('float', '7.18281828e-1'),
Token('op', ')'),
Token('op', '*'),
Token('int', '42'),
]
Let's parse()
these tokens into an expression tree using our parser created with funcparserlib.parser
:
BinaryExpr(
op='*',
left=BinaryExpr(
op='*',
left=3.1415926,
right=BinaryExpr(op='+', left=2, right=0.718281828),
),
right=42,
)
Learn how to write this parser using funcparserlib
in the Getting Started guide!
Some open-source projects that use funcparserlib
as an explicit dependency:
>= 1.0.0a0
, Python 3.7+*
. Python 3 in Docker==0.3.6
, Python 2 and 3>= 1.0.0a0
, Python 3.7+Read the Getting Started guide to start learning funcparserlib
.
FAQs
Recursive descent parsing library based on functional combinators
We found that funcparserlib demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Oracle seeks to dismiss fraud claims in the JavaScript trademark dispute, delaying the case and avoiding questions about its right to the name.
Security News
The Linux Foundation is warning open source developers that compliance with global sanctions is mandatory, highlighting legal risks and restrictions on contributions.
Security News
Maven Central now validates Sigstore signatures, making it easier for developers to verify the provenance of Java packages.