pandoc_refeq_mathml

Rubygems (bundler) package. Version: 0.2. Maintainers: 1.

= PandocRefeqMathml - ad hoc tool to modify pandoc-converted MathML from LaTeX

== Summary

This Ruby command-line tool modifies a MathML file converted with pandoc from LaTeX.

Whereas pandoc is a great text-ish file converter, there are a few caveats, at the time of writing, in converting a LaTeX file to MathML.

A major caveat is that the generated MathML displays neither the equation numbers that LaTeX auto-generates by default for the equation and eqnarray environments, nor their (LaTeX) labels. All the (LaTeX) +\ref+ references remain as raw labels, which read as coded messages to readers.

Another caveat is the alignments of equations in the eqnarray environment.

This tool is a somewhat ad hoc (dirty) hack to correct these points in some basic situations. "Basic" here means just the standard LaTeX commands, not commands specific to external packages.

The full package of this module is found in {PandocRefeqMathml Ruby Gems page}[http://rubygems.org/gems/pandoc_refeq_mathml] (with document created from source annotation with yard) and in {Github}[https://github.com/masasakano/pandoc_refeq_mathml]

== Background and constraints

Pandoc-converted MathML.html from LaTeX lacks equation numbers that are present in the original LaTeX. The {pandoc-crossref}[https://github.com/lierdakil/pandoc-crossref] offers a way to tackle the problem; however its fix is far from perfect with three or four major caveats.

  1. A single number is assigned to the whole set of equations in an +eqnarray+ environment, which is inconsistent with LaTeX.
  2. The LaTeX +\nonumber+ is not taken into account.
  3. Referencing text to an equation displays the original LaTeX label, as opposed to the equation number, which makes no sense to readers.
  4. Because of points (1) and (2), the given equation numbers usually do not agree at all with the original document compiled by LaTeX.

In LaTeX, you may reference equations 1 and 3 in a single +eqnarray+ environment separately. However, because of point (1), that is not possible in pandoc-generated MathML. Besides, references in the MathML are not displayed as equation numbers in the first place (point 3).

This tool (command-line command) offers a way to fix these problems, albeit in a crude way. The command adds equation numbers that are guessed from the text in the annotation fields in +<math>+ and from the LaTeX aux file (which is automatically generated as a byproduct when you compile a LaTeX document). Not all the numbers are recovered, only those that are referenced somewhere in the MathML file.

(Note that in principle, it should not be too difficult to modify the program so that all the labelled equations in LaTeX are labelled again in MathML. Nevertheless, it would be tricky to label equations that are not explicitly labelled in LaTeX because implicit numbering information is not available in the LaTeX aux file.)

The algorithm assumes the following. The LaTeX aux file is in the standard format. The MathML has a link tag +<a>+ with the attribute "data-reference-type=ref" and an href pointing to the exact reference label in LaTeX (the label should have no duplicates in the MathML), and each math tag has the +'annotation[ encoding="application/x-tex"]'+ tag containing the original LaTeX code. The LaTeX code must use either the standard "equation" or "eqnarray" structure with the standard "label" tag with simple content (if the label contains anything other than the label string and preceding or trailing white spaces, such as a comment, this algorithm would likely fail). If equations in an eqnarray environment have complicated nested structures like a matrix, I do not know how the algorithm of this routine handles them. Also, the LaTeX section numbering must consist only of Arabic numbers, full-stops, and maybe capital letters (for Appendix).
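As an illustration of the aux-file assumption, here is a minimal sketch of collecting label/number pairs. It assumes the standard +\newlabel{label}{{number}{page}...}+ line format; the helper name +labels_from_aux+ and the sample data are hypothetical and are not taken from the gem's source.

```ruby
# Hypothetical helper (not the gem's API): map each LaTeX label to its number.
# Numbers may only contain Arabic digits, full-stops, and capital letters,
# following the constraint stated above.
NEWLABEL_RE = /\\newlabel\{([^{}]+)\}\{\{([A-Z0-9.]+)\}/

def labels_from_aux(aux_text)
  aux_text.each_line.with_object({}) do |line, table|
    m = NEWLABEL_RE.match(line)
    table[m[1]] = m[2] if m
  end
end

# Made-up aux content for illustration only.
SAMPLE_AUX = <<~'AUX'
  \relax
  \newlabel{sec_intro}{{1}{1}}
  \newlabel{my_xe}{{1}{1}}
  \newlabel{eq_trivial}{{2}{2}}
  \newlabel{eq_app}{{A.3}{7}}
AUX

p labels_from_aux(SAMPLE_AUX)
```

Note that the table mixes section labels and equation labels: nothing in a +\newlabel+ line of this form says which kind of object a label belongs to.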

Essentially, LaTeX has a huge amount of freedom and so I am afraid it would be a somewhat futile effort to deal with every possibility...

=== Output MathML by pandoc-2.19 converted from LaTeX

Ordinary LaTeX inline maths expressions (e.g., +$5^2$+) are expressed as follows:

  rendered text: 5π
  annotation (application/x-tex): 5\pi

LaTeX's +\begin{equation}+ is as follows (n.b., the +<p>+ tag may not be closed immediately after the equation markup; other ordinary sentences may follow):

  rendered text: x±ϵ
  annotation (application/x-tex): x \pm \epsilon \label{my_xe}

LaTeX's +\begin{eqnarray}+ is as follows:

  rendered text: 1+x = 1−x = 2/(1x)
  annotation (application/x-tex): \begin{aligned} 1+x & = & 1-x \nonumber\\ & = & \frac{2}{1x} \label{eq_trivial} \end{aligned}

They are referred to from other text as follows:

Eq.[eq_trivial] was easy...
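For the +eqnarray+ case, the row that carries a given label can in principle be located from the annotation text alone. The following is a rough sketch under the assumption that rows are separated by a plain +\\+ and that there are no nested environments; the helper name +row_index_for_label+ is hypothetical and this is not necessarily how the gem does it.

```ruby
# Hypothetical helper: 0-based index of the row carrying +label+, or nil.
# Splitting on "\\" is naive and would break on nested structures
# such as matrices.
def row_index_for_label(tex, label)
  body = tex.sub(/\A\s*\\begin\{aligned\}/, '').sub(/\\end\{aligned\}\s*\z/, '')
  rows = body.split(/\\\\/)
  # Allow white spaces (and nothing else) around the label string.
  pattern = /\\label\{\s*#{Regexp.escape(label)}\s*\}/
  rows.index { |row| row.match?(pattern) }
end

# The annotation text of the eqnarray example.
ANNOTATION = <<~'TEX'.strip
  \begin{aligned} 1+x & = & 1-x \nonumber\\ & = & \frac{2}{1x} \label{eq_trivial} \end{aligned}
TEX

p row_index_for_label(ANNOTATION, 'eq_trivial')
```

With the row index known, the equation number from the aux file can be attached to that row only, which is what distinguishes this approach from assigning one number to the whole environment.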

=== Algorithm

For fixing the alignments to follow the standard eqnarray alignments (right, centre, and left in this order), the program searches for ++ and rewrites the columnalign attributes in the ++ tags.
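A minimal sketch of the alignment fix for one table row follows. It assumes that the +columnalign+ attribute sits on +<mtd>+ cells inside one +<mtr>+ row, and it uses a regular expression only to stay dependency-free; the gem itself depends on Nokogiri, and the helper name +fix_row_alignment+ is hypothetical.

```ruby
# Standard eqnarray column alignments, in order.
EQNARRAY_ALIGNS = %w[right center left].freeze

# Hypothetical helper: rewrite columnalign on each <mtd> of one <mtr> row.
# Other attributes of the cell are preserved. Nested tables are not handled.
def fix_row_alignment(mtr_html)
  column = -1
  mtr_html.gsub(/<mtd\b([^>]*)>/) do
    column += 1
    other_attrs = Regexp.last_match(1).sub(/\s*columnalign="[^"]*"/, '')
    %(<mtd columnalign="#{EQNARRAY_ALIGNS[column % 3]}"#{other_attrs}>)
  end
end

# Made-up row for illustration.
SAMPLE_ROW = '<mtr><mtd columnalign="right">a</mtd>' \
             '<mtd columnalign="left">b</mtd><mtd>c</mtd></mtr>'

puts fix_row_alignment(SAMPLE_ROW)
```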

For fixing the equation numbers and links, the program

  • first reads a LaTeX aux file and lists all the labels for equations and their numbers.
  • Then, it picks up an internally-pointing HTML anchor,
  • matches it with the list generated from the LaTeX aux file and identifies the equation number,
  • searches labels in +<annotation>+ tags for the string identical to the HTML/MathML anchor,
  • identifies the exact equation corresponding to the label (if in the eqnarray environment),
  • inserts the identified equation number next to the MathML equation,
  • and finally modifies the plain text for the HTML anchor.
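The first three steps and the last one can be sketched as follows. This is an illustration only: it assumes anchors of the form +<a href="#label" data-reference-type="ref">[label]</a>+ (the "#" prefix is an assumption), takes a ready-made label-to-number table, and uses a regular expression instead of an HTML parser. The name +renumber_anchors+ is hypothetical.

```ruby
# Matches an anchor that pandoc marks as a reference; captures the opening
# tag, the anchor text, and the closing tag.
ANCHOR_RE = %r{(<a\b[^>]*\bdata-reference-type="ref"[^>]*>)(.*?)(</a>)}m

# Hypothetical helper: replace the text of each reference anchor with the
# number found in +numbers+. Returns the new HTML and the labels for which
# no number was found (their anchors are left untouched).
def renumber_anchors(html, numbers)
  unmatched = []
  fixed = html.gsub(ANCHOR_RE) do
    open_tag, text, close_tag = Regexp.last_match.captures
    label = open_tag[/\bhref="#([^"]+)"/, 1]
    number = numbers[label]
    unmatched << label unless number
    "#{open_tag}#{number || text}#{close_tag}"
  end
  [fixed, unmatched]
end

# Made-up input for illustration.
SAMPLE_HTML = '<p>Eq.<a href="#eq_trivial" data-reference-type="ref">[eq_trivial]</a> ' \
              'was easy; see <a href="#sec_intro" data-reference-type="ref">[sec_intro]</a>.</p>'

fixed_html, unmatched_labels = renumber_anchors(SAMPLE_HTML, { 'eq_trivial' => '2' })
puts fixed_html
warn "No number found for: #{unmatched_labels.join(', ')}" unless unmatched_labels.empty?
```

Returning the unmatched labels separately keeps the rewriting step free of side effects and leaves it to the caller to decide where warnings go.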

Each inserted equation number is placed next to the corresponding equation, inside the +<math>+ tags. In +<mtable>+ (for the LaTeX +eqnarray+ environment), it is inserted as a new +<mtd>+ cell. In both cases, the text is right-aligned with some padding to the left. However, the position is relative to either the equation or the set of equations that contains the relevant equation (for +eqnarray+), unlike the original LaTeX, where equation numbers in parentheses are by default always placed at the right edge of the page.

== How to use the command

Once you have installed it according to the standard RubyGems procedure (see section Install), the main Ruby executable (command) pandoc_refeq_mathml should be in your command-search path.

It reads a MathML file from either the first command-line argument or STDIN, together with a LaTeX aux file specified on the command line, and then outputs the modified (corrected) MathML to STDOUT.

Any warnings are printed to either STDERR or a log file specified as a command-line option.

Failures in matching a label from an HTML tag with any of the MathML equations are printed as warnings (to STDERR by default). Although a warning may genuinely mean that the label does not exist in the original LaTeX source, it is far more likely that the label belongs to a section (or a table or figure), because the algorithm cannot tell the type (section, table, figure, equation, or something else) of each label's origin.

=== Help doc

The help doc for the command-line interface is displayed with +-h+ (or +--help+) option:

  % pandoc_refeq_mathml -h
  Usage: pandoc_refeq_mathml [options] [--] [MathML.html] > STDOUT
         pandoc_refeq_mathml [options] [--] < STDIN > STDOUT

  Description (Version=0.1): This fixes issues, label-references of equations and eqnarray alignments, of pandoc-converted MathML from LaTeX.

  Specific options:
      -a, --aux [FILENAME]    (mandatory) LaTeX aux filename
          --log [FILENAME]    Log filename (Default: STDERR). /dev/null to disable it.
          --[no-]fixalign     Fix eqnarray-alignment problems? (Def: true)
      -v, --[no-]verbose      Run verbosely (Def: true)

  Common options:
      -h, --help              Show this message
          --version           Show version
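For readers who want to see how such an option set maps onto Ruby's standard +OptionParser+, here is a hedged sketch. The option names and defaults are copied from the help text; the function name +parse_cli+, the returned hash layout, and the error raised for a missing aux file are assumptions, not the gem's actual code.

```ruby
require 'optparse'

# Hypothetical re-creation of the command-line interface described above.
# Returns the option hash and the remaining (non-option) arguments.
def parse_cli(argv)
  opts = { aux: nil, log: nil, fixalign: true, verbose: true }
  parser = OptionParser.new do |op|
    op.banner = 'Usage: pandoc_refeq_mathml [options] [--] [MathML.html] > STDOUT'
    op.on('-a', '--aux [FILENAME]', '(mandatory) LaTeX aux filename') { |v| opts[:aux] = v }
    op.on('--log [FILENAME]', 'Log filename (Default: STDERR)') { |v| opts[:log] = v }
    op.on('--[no-]fixalign', 'Fix eqnarray-alignment problems? (Def: true)') { |v| opts[:fixalign] = v }
    op.on('-v', '--[no-]verbose', 'Run verbosely (Def: true)') { |v| opts[:verbose] = v }
  end
  rest = parser.parse(argv) # non-destructive; returns the leftover arguments
  # The help text marks --aux as mandatory, so reject its absence here.
  raise OptionParser::MissingArgument, '--aux' unless opts[:aux]
  [opts, rest]
end

options, files = parse_cli(%w[--aux=mydoc.aux --no-fixalign mydoc.html])
p options
p files
```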

=== Examples

% pandoc_refeq_mathml --aux=mydoc.aux --log=error.log mydoc.html > revised1.html
% head -n 90 mydoc.html | pandoc_refeq_mathml --aux=mydoc.aux --no-fixalign > revised2.html

Also, in the +test/data/+ directory, there is a sample LaTeX file. You can run +make+ in the directory to generate and correct an HTML/MathML file. Read the comments in the +Makefile+ to see the options, such as the LaTeX executable in your environment.

== Install

The standard RubyGems install procedure suffices:

% gem install pandoc_refeq_mathml

which should also install the dependent {Nokogiri gem}[https://rubygems.org/gems/nokogiri/].

Alternatively, it is possible to download the library file lib/pandoc_refeq_mathml.rb somewhere in your local directory, set the environment variable RUBYLIB to also point to the directory for the library, and execute

% ruby bin/pandoc_refeq_mathml

where ruby is optional. Note that {Nokogiri gem}[https://rubygems.org/gems/nokogiri/] must be available in your RUBY library path.

In the developer's environment {diff-lcs gem}[https://rubygems.org/gems/diff-lcs] is also required.

This tool requires {Ruby}[http://www.ruby-lang.org] Version 2.0 or above.

== Developer's note

The source code is maintained also in {Github}[https://github.com/masasakano/pandoc_refeq_mathml] with no intuitive interface for annotations.

=== Tests

The Ruby scripts under the directory test/ are the tests. You can run them from the top directory as ruby test/test_****.rb, or simply run make test or rake test.

== Known bugs and ToDo items

  • pandoc-generated HTMLs do not contain Table/Figure numbers in their ++, even though each anchored text refers to the corresponding number, such as, +see Table "2"+, where "2" is the anchor.
  • In fact, pandoc-generated HTMLs do not generate ++ tags, let alone ++ for the LaTeX figure environments that contain more than one figure (with +\includegraphics+)...

== Copyright

Author:: Masa Sakano < info a_t wisebabel dot com >
Versions:: The versions of this package follow Semantic Versioning (2.0.0) http://semver.org/
License:: MIT
Warranty:: No warranty.


Package last updated on 27 Aug 2022
