Instead of latexcodec, I encourage you to consider pylatexenc, which is far superior: https://github.com/phfaist/pylatexenc

:Documentation: http://latexcodec.readthedocs.org/
:Development: http://github.com/mcmtroffaes/latexcodec/

.. |ci| image:: https://github.com/mcmtroffaes/latexcodec/actions/workflows/python-package.yml/badge.svg
    :target: https://github.com/mcmtroffaes/latexcodec/actions/workflows/python-package.yml
    :alt: ci

.. |codecov| image:: https://codecov.io/gh/mcmtroffaes/latexcodec/branch/develop/graph/badge.svg
    :target: https://codecov.io/gh/mcmtroffaes/latexcodec
    :alt: codecov

The codec provides a convenient way of going between text written in LaTeX and unicode. Since it is not a LaTeX compiler, it is more appropriate for short chunks of text, such as a paragraph or the values of a BibTeX entry, and it is not appropriate for a full LaTeX document. In particular, its behavior on the LaTeX commands that do not simply select characters is intended to allow the unicode representation to be understandable by a human reader, but is not canonical and may require hand tuning to produce the desired effect.

The encoder makes a best effort to replace unicode characters outside the range used as LaTeX input (ASCII by default) with a LaTeX command that selects the character. More precisely, each such code point is replaced by a LaTeX command that selects a glyph reasonably representing it. Unicode characters with a special meaning in LaTeX are also replaced by their LaTeX equivalents. For example,

====================== ===================
original text          encoded LaTeX
====================== ===================
``¥``                  ``\yen``
``ü``                  ``\"u``
``\N{NO-BREAK SPACE}`` ``~``
``~``                  ``\textasciitilde``
``%``                  ``\%``
``#``                  ``\#``
``\textbf{x}``         ``\textbf{x}``
====================== ===================

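
The encoding rules in the table can be illustrated with a toy encoder. This is a simplified sketch, not latexcodec's implementation: ``ENCODE_TABLE``, ``CONTROL_WORD`` and ``toy_latex_encode`` are hypothetical names, the table holds only the characters from the example above, and characters missing from it simply raise ``ValueError``. The package itself is used through Python's codec machinery (for example ``"ü".encode("latex")`` after ``import latexcodec``), as described in its documentation.

```python
import re

# Hypothetical toy encoder: a simplified illustration, NOT latexcodec's
# implementation. The table holds only the characters from the table above.
ENCODE_TABLE = {
    "\N{YEN SIGN}": r"\yen",
    "\N{LATIN SMALL LETTER U WITH DIAERESIS}": r'\"u',
    "\N{NO-BREAK SPACE}": "~",
    "~": r"\textasciitilde",  # ASCII characters special to LaTeX are escaped too
    "%": r"\%",
    "#": r"\#",
}

# A control word is a backslash followed by letters, e.g. \yen.
CONTROL_WORD = re.compile(r"\\[A-Za-z]+")


def toy_latex_encode(text: str) -> str:
    pieces: list[str] = []
    for ch in text:
        if ch in ENCODE_TABLE:
            piece = ENCODE_TABLE[ch]
        elif ch.isascii():
            # Other ASCII (including backslashes and braces) passes through,
            # so LaTeX already present in the input, like \textbf{x}, survives.
            piece = ch
        else:
            raise ValueError(f"no LaTeX translation for {ch!r}")
        # TeX reads letters after a control word as part of its name, so
        # separate the two with a space (\yen a rather than \yena).
        if pieces and CONTROL_WORD.fullmatch(pieces[-1]) and piece[0].isalpha():
            pieces.append(" ")
        pieces.append(piece)
    return "".join(pieces)


print(toy_latex_encode("50% of ¥100 for Müller"))  # prints: 50\% of \yen100 for M\"uller
```

The extra space after a control word only matters when a letter follows; a digit, as in ``\yen100``, already ends the command name.
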

The decoder makes a best effort to replace LaTeX commands that select characters with the unicode characters they select. For example,

===================== ======================
original LaTeX        decoded unicode
===================== ======================
``\yen``              ``¥``
``\"u``               ``ü``
``~``                 ``\N{NO-BREAK SPACE}``
``\textasciitilde``   ``~``
``\%``                ``%``
``\#``                ``#``
``\textbf{x}``        ``\textbf {x}``
``#``                 ``#``
===================== ======================

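
The decoding direction can be sketched the same way: known commands are looked up, unknown ones are returned unchanged apart from the spacing after control words, which is normalized to one space as in the ``\textbf {x}`` row. This is a hypothetical toy (``DECODE_TABLE``, ``TOKEN`` and ``toy_latex_decode`` are illustrative names, not latexcodec's API); among other simplifications it only understands accents written without braces, such as ``\"u``.

```python
import re

# Hypothetical toy decoder: a simplified illustration, NOT latexcodec's
# implementation. The table holds only the commands from the table above.
DECODE_TABLE = {
    r"\yen": "\N{YEN SIGN}",
    r'\"u': "\N{LATIN SMALL LETTER U WITH DIAERESIS}",
    "~": "\N{NO-BREAK SPACE}",
    r"\textasciitilde": "~",
    r"\%": "%",
    r"\#": "#",
}

# Tokens of interest: a control word plus the spaces TeX ignores after it,
# an umlaut accent applied to a letter, any other control symbol, or a tie.
TOKEN = re.compile(r'(?P<word>\\[A-Za-z]+) *|\\"[A-Za-z]|\\.|~')


def toy_latex_decode(latex: str) -> str:
    def replace(match: re.Match) -> str:
        word = match.group("word")
        if word is None:
            # Accent, control symbol or tie: translate it if known.
            token = match.group(0)
            return DECODE_TABLE.get(token, token)
        if word in DECODE_TABLE:
            return DECODE_TABLE[word]
        # Unknown control word: pass it through, canonicalizing the spacing
        # after it to one space (none at the very end of the input).
        return word + " " if match.end() < len(latex) else word

    return TOKEN.sub(replace, latex)


print(toy_latex_decode(r'M\"uller paid \yen 5 \emph{now}'))  # prints: Müller paid ¥5 \emph {now}
```
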
In addition, comments are dropped (including the final newline that marks the end of a comment), paragraphs are canonicalized into double newlines, and other newlines are left as is. Spacing after LaTeX commands is also canonicalized.

For example,

::

   hi % bye
   there\par world
   \textbf {awesome}

is decoded as

::

   hi there

   world
   \textbf {awesome}

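
Two of the rules at work in this example, dropping a comment together with the newline that ends it and turning ``\par`` into a double newline, can be sketched with two regular expressions. This is a hypothetical illustration (``toy_strip_comments_and_pars`` is not a latexcodec function, and latexcodec uses a proper lexer instead); it does not collapse runs of blank lines and ignores corner cases such as ``\\%``, a line break followed by a comment. Applied to the input of the example, it returns the decoded text.

```python
import re

# Hypothetical sketch of two decoding rules; NOT latexcodec's lexer.
# A comment runs from an unescaped % to the end of the line and takes
# its terminating newline with it.
COMMENT = re.compile(r"(?<!\\)%[^\n]*\n?")
# \par as a whole control word (not \parbox), plus surrounding spaces.
PAR = re.compile(r" *\\par(?![A-Za-z]) *")


def toy_strip_comments_and_pars(latex: str) -> str:
    without_comments = COMMENT.sub("", latex)
    # Canonicalize every paragraph break into a double newline.
    return PAR.sub("\n\n", without_comments)


print(toy_strip_comments_and_pars("hi % bye\nthere\\par world\n\\textbf {awesome}"))
```
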

When decoding, LaTeX commands that do not directly select characters (for example, macros and formatting commands) are passed through unchanged. The same happens to LaTeX commands that select characters but are not yet recognized by the codec. Either case can produce a hybrid unicode string in which some characters stand for themselves and others are parts of unexpanded commands. Consequently, the decoded text may still contain backslashes, each marking the start of a control sequence that was not translated.

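
Because untranslated commands keep their leading backslash, a caller could scan decoded text for leftovers, for instance to spot commands that need hand tuning or a new translation. The helper below is hypothetical (``leftover_commands`` is not part of latexcodec) and simply collects control words and control symbols in order of first appearance; a backslash that was meant as literal text would be reported as well.

```python
import re

# Hypothetical helper, not part of latexcodec: find control sequences
# that survived decoding. Matches a backslash followed by either a run of
# letters (control word) or one other non-space character (control symbol).
LEFTOVER_COMMAND = re.compile(r"\\(?:[A-Za-z]+|[^A-Za-z\s])")


def leftover_commands(decoded: str) -> list[str]:
    # A dict keeps insertion order, so this deduplicates while preserving
    # the order of first appearance.
    seen: dict[str, None] = {}
    for match in LEFTOVER_COMMAND.finditer(decoded):
        seen.setdefault(match.group(0), None)
    return list(seen)


print(leftover_commands("Müller ¥5 \\textbf {x} and \\emph {y} \\textbf {z}"))
# prints: ['\\textbf', '\\emph']
```
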
Given the numerous and changing packages providing such LaTeX commands, the codec will never be complete, and new translations of unrecognized unicode or unrecognized LaTeX symbols are always welcome.

A lexer and codec to work with LaTeX code in Python.