compynator
Advanced tools
@@ -19,3 +19,3 @@ # Copyright 2019 Google LLC | ||
| __version__ = "0.1.0" | ||
| __version__ = "0.1.1" | ||
@@ -22,0 +22,0 @@ from .core import ( |
+8
-1
| Metadata-Version: 2.1 | ||
| Name: compynator | ||
| Version: 0.1.0 | ||
| Version: 0.1.1 | ||
| Summary: A pure Python implementation of parser combinators with asymptotically best performance and support for context-sensitive or ambiguous grammars. | ||
@@ -15,2 +15,3 @@ Home-page: https://github.com/TheTerminalTemplar/compynator | ||
| Classifier: Programming Language :: Python :: 3.8 | ||
| Project-URL: Documentation, https://compynator.readthedocs.io | ||
| Project-URL: Repository, https://github.com/TheTerminalTemplar/compynator | ||
@@ -33,2 +34,8 @@ Description-Content-Type: text/x-rst | ||
| |Read the Docs| | ||
| .. |Read the Docs| image:: https://readthedocs.org/projects/compynator/badge/ | ||
| :target: https://compynator.readthedocs.io/ | ||
| Introduction | ||
@@ -35,0 +42,0 @@ ============ |
+2
-4
| [tool.poetry] | ||
| name = "compynator" | ||
| version = "0.1.0" | ||
| version = "0.1.1" | ||
| description = "A pure Python implementation of parser combinators with asymptotically best performance and support for context-sensitive or ambiguous grammars." | ||
@@ -11,2 +11,3 @@ license = "Apache-2.0" | ||
| keywords = ["parser", "functional", "parser combinator", "python"] | ||
| documentation = "https://compynator.readthedocs.io" | ||
@@ -18,5 +19,2 @@ [tool.poetry.dependencies] | ||
| pytest = "^6.1.2" | ||
| Sphinx = "^3.3.0" | ||
| sphinx-rtd-theme = "^0.5.0" | ||
| sphinx-tabs = "^1.3.0" | ||
@@ -23,0 +21,0 @@ [build-system] |
+6
-0
@@ -15,2 +15,8 @@ ========== | ||
| |Read the Docs| | ||
| .. |Read the Docs| image:: https://readthedocs.org/projects/compynator/badge/ | ||
| :target: https://compynator.readthedocs.io/ | ||
| Introduction | ||
@@ -17,0 +23,0 @@ ============ |
+2
-2
@@ -12,5 +12,5 @@ # -*- coding: utf-8 -*- | ||
| 'name': 'compynator', | ||
| 'version': '0.1.0', | ||
| 'version': '0.1.1', | ||
| 'description': 'A pure Python implementation of parser combinators with asymptotically best performance and support for context-sensitive or ambiguous grammars.', | ||
| 'long_description': '==========\nCompynator\n==========\n\n|Tests|\n\n.. |Tests| image:: https://github.com/TheTerminalTemplar/compynator/workflows/Tests/badge.svg\n :target: https://github.com/TheTerminalTemplar/compynator/actions?workflow=Tests\n\n|PyPi|\n\n.. |PyPi| image:: https://img.shields.io/pypi/v/compynator.svg\n :target: https://pypi.org/project/compynator/\n\nIntroduction\n============\n\nCompynator is a tiny (~400 SLOCs), pure Python implementation of `parser\ncombinators <https://en.wikipedia.org/wiki/Parser_combinator>`_. With this\nlibrary, one can build up a complex parser from primitive parsers such as "get\none token" (the ``One`` parser), or compose parsers together such as "if this\nparser fails, try that parser" (the ``Or`` combinator), etc. The mental model is\na binary tree of execution nodes through which a sequence of input tokens flows.\n\nThe implementation in this package supports optional memoization and curtailment\nas described in [Frost2007]_. This allows a memoized parser to achieve\nasymptotically best performance, and support ambiguous grammars.\n\nCompynator is not an official Google product.\n\nExamples\n--------\n\nAn example that solves\nhttps://leetcode.com/problems/parsing-a-boolean-expression:\n\n.. code-block:: python\n\n t = Terminal(\'t\').value(True)\n f = Terminal(\'f\').value(False)\n\n e = Forward()\n n = Terminal(\'!(\').then(e).value(lambda x: not x).skip(\')\')\n\n a_empty = Succeed(True)\n a_e_tail = Forward()\n a_e_tail.is_(\n Terminal(\',\').then(e).then(\n a_e_tail, lambda x, y: x and y) | a_empty)\n a = Terminal(\'&(\').then(e).then(\n a_e_tail, lambda x, y: x and y).skip(\')\')\n\n o_empty = Succeed(False)\n o_e_tail = Forward()\n o_e_tail.is_(\n Terminal(\',\').then(e).then(\n o_e_tail, lambda x, y: x or y) | o_empty)\n o = Terminal(\'|(\').then(e).then(\n o_e_tail, lambda x, y: x or y).skip(\')\')\n\n e.is_(t | f | n | a | o)\n\n self.assertEqual(e(\'!(f)\'), {Result(True, \'\')})\n self.assertEqual(e(\'|(f,t)\'), {Result(True, \'\')})\n self.assertEqual(e(\'&(t,f)\'), {Result(False, \'\')})\n self.assertEqual(e(\'|(&(t,f,t),!(t))\'), {Result(False, \'\')})\n\nSee `test_benchmark.py <tests/test_benchmark.py>`_ for a literal translation of\nthe ABNF rules for URI parsing as given in [RFC3986]_ into a structured tree::\n\n Uri\n Scheme http\n HierPart\n Authority\n UserInfo None\n Host www.ics.uci.edu\n Port None\n Path /pub/ietf/uri/\n Query query\n Fragment fragment\n\nThere are more examples in the `tests <tests>`_ directory.\n\nTranslating Augmented Backus-Naur Form (ABNF)\n---------------------------------------------\n\nCare should be taken to ensure that ``repeat`` takes all possible results\ninstead of greedily parsing for the most. For example, the ABNF ``*3"a" "aa"``\ncannot be simply translated as ``Terminal(\'a\').repeat(0, 3) + \'aa\'``. This\ntranslation will fail to parse ``aaaa`` because it greedily matches the first\n3 ``a``\'s, then fails to find the remaining 2 ``a``\'s. The correct translation\nshould be ``Terminal(\'a\').repeat(0, 3, take_all=True) + \'aa\'``.\n\n=================== ====================================\n ABNF Compynator\n=================== ====================================\nTerminal ``Terminal``\nRule ``Parser``\nConcatenation ``p1 + p2``\nAlternative ``p1 | p2``\nSequence group Normal use of parentheses\nVariable repetition ``p.repeat(lower, upper, take_all)``\nSpecific repetition ``p.repeat(x, x)``\nOptional ``p.repeat(0, 1, take_all)``\n=================== ====================================\n\nTranslating Parsing Expression Grammar (PEG)\n--------------------------------------------\n\nUnlike ABNF, PEG operators are always greedy. When translating PEG, we do not\nneed to worry about backtracking the repetitions with ``take_all``.\n\n============== ===============================\n PEG Compynator\n============== ===============================\nTerminal ``Terminal``\nNonterminal ``Parser``\nEpsilon ``Empty``\n-------------- -------------------------------\nSequence ``p1 + p2``\nOrdered choice ``p1 | p2``\nZero or more ``p.repeat()``\nOne or more ``p.repeat(1)``\nOptional ``p.repeat(0, 1)``\nAnd predicate ``Lookahead(p)``\nNot predicate ``Lookahead(p, take_if=False)``\n============== ===============================\n\nCombinator vs Generator\n=======================\n\nAdvantages\n----------\n\nAdvantages of parser combinators versus parser generators are:\n\n#. Readability. A grammar can be expressed in a very similar form as its BNF.\n The code can be considered an *executable specification* of the grammar.\n#. Simple setup. The code is the grammar. There is no need to run a generator to\n regenerate code when the grammar changes.\n#. Understandability. Each parser is generally short and simple that its\n correctness can be easily verified. There is no need to look into generated\n code, or the code of the parser generator.\n#. Parser combinators support context-sensitive grammars. For example, to parse\n an XML body, assuming ``start`` parses a start tag, ``body`` parses the body,\n and ``end`` parses a specified end tag:\n\n .. code-block:: python\n\n xml_tag = start.then(lambda tag_name: body.skip(end(tag_name)))\n\n#. Combination of lexing and parsing. Most parser generators perform their\n lexing and parsing phases separately. Parser combinators combine these phases\n together. Hence they are not limited to string inputs. The example (in\n `test_core.py <tests/test_core.py>`_) below takes a tokenized sequence.\n\n .. code-block:: python\n\n NUM, OP, TERMINAL = 0, 1, 2\n tokens = [(NUM, 2), (OP, operator.add), (NUM, 10),\n (OP, operator.mul), (NUM, 4)]\n num = One.where(lambda c: c[0] == NUM)\n op = One.where(lambda c: c[0] == OP).value(lambda c: c[1])\n mult_div = op.where(lambda c: c in (operator.mul, operator.truediv))\n add_sub = op.where(lambda c: c in (operator.add, operator.sub))\n left_paren = One.where(lambda c: c[0] == TERMINAL and c[1] == \'(\')\n right_paren = One.where(lambda c: c[0] == TERMINAL and c[1] == \')\')\n expr = Forward()\n factor = (\n num.value(lambda t: t[1]) |\n left_paren.then(expr).skip(right_paren)\n )\n def do_op(left, op, right):\n return op(left, right)\n term = Forward()\n term.is_((\n Collect(term, mult_div, factor).value(lambda v: do_op(*v)) ^\n factor\n ).memoize())\n expr.is_((\n Collect(expr, add_sub, term).value(lambda v: do_op(*v)) ^\n term\n ).memoize())\n calc = expr.filter(lambda r: not r.remain)\n self.assertEqual(\n set(expr(tokens)),\n {\n Result(value=42, remain=[]),\n Result(value=12, remain=tokens[3:]),\n Result(value=2, remain=tokens[1:]),\n })\n self.assertEqual(calc(tokens), Succeed(42)([]))\n\nDisadvantages\n-------------\n\nDisadvantages of parser combinators are:\n\n#. Familiarity. Most textbooks write about parser generators and traditional\n parsing techniques such as LL, LR, etc. Parser combinators are more common\n in functional and logic programming communities, as popularized by\n [Wadler1985]_ and [Hutton1992]_.\n#. Coupling of code and grammar. The downside of simple setup is a tight\n coupling of code and grammar, which might make it difficult to understand.\n#. As it is implemented here, performance might be impacted due to composition\n overhead. See `test_benchmark.py <tests/test_benchmark.py>`_ for details. On\n the same machine, the result for URI parsing could be ~70 times slower::\n\n t.test_parse_uri() 903.5961110000001 usec per run\n t.test_urlparse() 13.704007000000074 usec per run\n\n#. All the advantages and disadvantages of scannerless parsing apply too.\n\nLimitations\n===========\n\nCurrently, this library does not implement:\n\n#. Source context such as line and column number of the token.\n#. "Greedy" matching in the same sense as in regular expression (i.e. longest\n match). The greedy operation in this library is in the "greedy algorithm"\n sense, i.e. the first rule that matches will be taken.\n#. Space treatments. Spaces have to be explicitly taken care of in grammars.\n\nReferences\n==========\n\n.. [Wadler1985] Wadler, Philip. (1985). "How to replace failure by a list of\n successes". Proc. conference on functional programming and computer\n architecture. Springer–Verlag.\n\n.. [Hutton1992] Hutton, Graham. (1992). "Higher-order functions for parsing".\n Journal of functional programming, 2(3), 323–343.\n\n.. [Frost2007] Frost R.A., Hafiz R., Callaghan P. (2007) "Parser Combinators for\n Ambiguous Left-Recursive Grammars". In: Hudak P., Warren D.S. (eds)\n "Practical Aspects of Declarative Languages". PADL 2008. Lecture Notes in\n Computer Science, vol 4902. Springer, Berlin, Heidelberg\n\n.. [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource\n Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.\n', | ||
| 'long_description': '==========\nCompynator\n==========\n\n|Tests|\n\n.. |Tests| image:: https://github.com/TheTerminalTemplar/compynator/workflows/Tests/badge.svg\n :target: https://github.com/TheTerminalTemplar/compynator/actions?workflow=Tests\n\n|PyPi|\n\n.. |PyPi| image:: https://img.shields.io/pypi/v/compynator.svg\n :target: https://pypi.org/project/compynator/\n\n|Read the Docs|\n\n.. |Read the Docs| image:: https://readthedocs.org/projects/compynator/badge/\n :target: https://compynator.readthedocs.io/\n\n\nIntroduction\n============\n\nCompynator is a tiny (~400 SLOCs), pure Python implementation of `parser\ncombinators <https://en.wikipedia.org/wiki/Parser_combinator>`_. With this\nlibrary, one can build up a complex parser from primitive parsers such as "get\none token" (the ``One`` parser), or compose parsers together such as "if this\nparser fails, try that parser" (the ``Or`` combinator), etc. The mental model is\na binary tree of execution nodes through which a sequence of input tokens flows.\n\nThe implementation in this package supports optional memoization and curtailment\nas described in [Frost2007]_. This allows a memoized parser to achieve\nasymptotically best performance, and support ambiguous grammars.\n\nCompynator is not an official Google product.\n\nExamples\n--------\n\nAn example that solves\nhttps://leetcode.com/problems/parsing-a-boolean-expression:\n\n.. code-block:: python\n\n t = Terminal(\'t\').value(True)\n f = Terminal(\'f\').value(False)\n\n e = Forward()\n n = Terminal(\'!(\').then(e).value(lambda x: not x).skip(\')\')\n\n a_empty = Succeed(True)\n a_e_tail = Forward()\n a_e_tail.is_(\n Terminal(\',\').then(e).then(\n a_e_tail, lambda x, y: x and y) | a_empty)\n a = Terminal(\'&(\').then(e).then(\n a_e_tail, lambda x, y: x and y).skip(\')\')\n\n o_empty = Succeed(False)\n o_e_tail = Forward()\n o_e_tail.is_(\n Terminal(\',\').then(e).then(\n o_e_tail, lambda x, y: x or y) | o_empty)\n o = Terminal(\'|(\').then(e).then(\n o_e_tail, lambda x, y: x or y).skip(\')\')\n\n e.is_(t | f | n | a | o)\n\n self.assertEqual(e(\'!(f)\'), {Result(True, \'\')})\n self.assertEqual(e(\'|(f,t)\'), {Result(True, \'\')})\n self.assertEqual(e(\'&(t,f)\'), {Result(False, \'\')})\n self.assertEqual(e(\'|(&(t,f,t),!(t))\'), {Result(False, \'\')})\n\nSee `test_benchmark.py <tests/test_benchmark.py>`_ for a literal translation of\nthe ABNF rules for URI parsing as given in [RFC3986]_ into a structured tree::\n\n Uri\n Scheme http\n HierPart\n Authority\n UserInfo None\n Host www.ics.uci.edu\n Port None\n Path /pub/ietf/uri/\n Query query\n Fragment fragment\n\nThere are more examples in the `tests <tests>`_ directory.\n\nTranslating Augmented Backus-Naur Form (ABNF)\n---------------------------------------------\n\nCare should be taken to ensure that ``repeat`` takes all possible results\ninstead of greedily parsing for the most. For example, the ABNF ``*3"a" "aa"``\ncannot be simply translated as ``Terminal(\'a\').repeat(0, 3) + \'aa\'``. This\ntranslation will fail to parse ``aaaa`` because it greedily matches the first\n3 ``a``\'s, then fails to find the remaining 2 ``a``\'s. The correct translation\nshould be ``Terminal(\'a\').repeat(0, 3, take_all=True) + \'aa\'``.\n\n=================== ====================================\n ABNF Compynator\n=================== ====================================\nTerminal ``Terminal``\nRule ``Parser``\nConcatenation ``p1 + p2``\nAlternative ``p1 | p2``\nSequence group Normal use of parentheses\nVariable repetition ``p.repeat(lower, upper, take_all)``\nSpecific repetition ``p.repeat(x, x)``\nOptional ``p.repeat(0, 1, take_all)``\n=================== ====================================\n\nTranslating Parsing Expression Grammar (PEG)\n--------------------------------------------\n\nUnlike ABNF, PEG operators are always greedy. When translating PEG, we do not\nneed to worry about backtracking the repetitions with ``take_all``.\n\n============== ===============================\n PEG Compynator\n============== ===============================\nTerminal ``Terminal``\nNonterminal ``Parser``\nEpsilon ``Empty``\n-------------- -------------------------------\nSequence ``p1 + p2``\nOrdered choice ``p1 | p2``\nZero or more ``p.repeat()``\nOne or more ``p.repeat(1)``\nOptional ``p.repeat(0, 1)``\nAnd predicate ``Lookahead(p)``\nNot predicate ``Lookahead(p, take_if=False)``\n============== ===============================\n\nCombinator vs Generator\n=======================\n\nAdvantages\n----------\n\nAdvantages of parser combinators versus parser generators are:\n\n#. Readability. A grammar can be expressed in a very similar form as its BNF.\n The code can be considered an *executable specification* of the grammar.\n#. Simple setup. The code is the grammar. There is no need to run a generator to\n regenerate code when the grammar changes.\n#. Understandability. Each parser is generally short and simple that its\n correctness can be easily verified. There is no need to look into generated\n code, or the code of the parser generator.\n#. Parser combinators support context-sensitive grammars. For example, to parse\n an XML body, assuming ``start`` parses a start tag, ``body`` parses the body,\n and ``end`` parses a specified end tag:\n\n .. code-block:: python\n\n xml_tag = start.then(lambda tag_name: body.skip(end(tag_name)))\n\n#. Combination of lexing and parsing. Most parser generators perform their\n lexing and parsing phases separately. Parser combinators combine these phases\n together. Hence they are not limited to string inputs. The example (in\n `test_core.py <tests/test_core.py>`_) below takes a tokenized sequence.\n\n .. code-block:: python\n\n NUM, OP, TERMINAL = 0, 1, 2\n tokens = [(NUM, 2), (OP, operator.add), (NUM, 10),\n (OP, operator.mul), (NUM, 4)]\n num = One.where(lambda c: c[0] == NUM)\n op = One.where(lambda c: c[0] == OP).value(lambda c: c[1])\n mult_div = op.where(lambda c: c in (operator.mul, operator.truediv))\n add_sub = op.where(lambda c: c in (operator.add, operator.sub))\n left_paren = One.where(lambda c: c[0] == TERMINAL and c[1] == \'(\')\n right_paren = One.where(lambda c: c[0] == TERMINAL and c[1] == \')\')\n expr = Forward()\n factor = (\n num.value(lambda t: t[1]) |\n left_paren.then(expr).skip(right_paren)\n )\n def do_op(left, op, right):\n return op(left, right)\n term = Forward()\n term.is_((\n Collect(term, mult_div, factor).value(lambda v: do_op(*v)) ^\n factor\n ).memoize())\n expr.is_((\n Collect(expr, add_sub, term).value(lambda v: do_op(*v)) ^\n term\n ).memoize())\n calc = expr.filter(lambda r: not r.remain)\n self.assertEqual(\n set(expr(tokens)),\n {\n Result(value=42, remain=[]),\n Result(value=12, remain=tokens[3:]),\n Result(value=2, remain=tokens[1:]),\n })\n self.assertEqual(calc(tokens), Succeed(42)([]))\n\nDisadvantages\n-------------\n\nDisadvantages of parser combinators are:\n\n#. Familiarity. Most textbooks write about parser generators and traditional\n parsing techniques such as LL, LR, etc. Parser combinators are more common\n in functional and logic programming communities, as popularized by\n [Wadler1985]_ and [Hutton1992]_.\n#. Coupling of code and grammar. The downside of simple setup is a tight\n coupling of code and grammar, which might make it difficult to understand.\n#. As it is implemented here, performance might be impacted due to composition\n overhead. See `test_benchmark.py <tests/test_benchmark.py>`_ for details. On\n the same machine, the result for URI parsing could be ~70 times slower::\n\n t.test_parse_uri() 903.5961110000001 usec per run\n t.test_urlparse() 13.704007000000074 usec per run\n\n#. All the advantages and disadvantages of scannerless parsing apply too.\n\nLimitations\n===========\n\nCurrently, this library does not implement:\n\n#. Source context such as line and column number of the token.\n#. "Greedy" matching in the same sense as in regular expression (i.e. longest\n match). The greedy operation in this library is in the "greedy algorithm"\n sense, i.e. the first rule that matches will be taken.\n#. Space treatments. Spaces have to be explicitly taken care of in grammars.\n\nReferences\n==========\n\n.. [Wadler1985] Wadler, Philip. (1985). "How to replace failure by a list of\n successes". Proc. conference on functional programming and computer\n architecture. Springer–Verlag.\n\n.. [Hutton1992] Hutton, Graham. (1992). "Higher-order functions for parsing".\n Journal of functional programming, 2(3), 323–343.\n\n.. [Frost2007] Frost R.A., Hafiz R., Callaghan P. (2007) "Parser Combinators for\n Ambiguous Left-Recursive Grammars". In: Hudak P., Warren D.S. (eds)\n "Practical Aspects of Declarative Languages". PADL 2008. Lecture Notes in\n Computer Science, vol 4902. Springer, Berlin, Heidelberg\n\n.. [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource\n Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.\n', | ||
| 'author': 'google', | ||
@@ -17,0 +17,0 @@ 'author_email': None, |
Alert delta unavailable
Currently unable to show alert delta for PyPI packages.
75378
0.64%841
0.12%