
arxiv_latex_cleaner
This tool allows you to easily clean the LaTeX code of your paper to submit to
arXiv. From a folder containing all your code, e.g. /path/to/latex/, it
creates a new folder /path/to/latex_arXiv/, that is ready to ZIP and upload to
arXiv.
arxiv_latex_cleaner /path/to/latex --resize_images --im_size 500 --images_allowlist='{"images/im.png":2000}'
Or simply from a config file:
arxiv_latex_cleaner /path/to/latex --config cleaner_config.yaml
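To make the interplay of `--im_size` and `--images_allowlist` in the call above more concrete, here is a minimal Python sketch. It is not the tool's own code: the helper names `target_size` and `scaled_dimensions` are hypothetical, and the choice never to upscale small images is an assumption made for illustration.

```python
import json


def target_size(image_path, im_size, images_allowlist):
    """Return the per-image size if allowlisted, else the global im_size."""
    return images_allowlist.get(image_path, im_size)


def scaled_dimensions(width, height, longest_side):
    """Scale (width, height) so that the longest side equals longest_side.

    Assumption: images already smaller than the target are left untouched.
    """
    longest = max(width, height)
    if longest <= longest_side:
        return width, height
    scale = longest_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# The allowlist is passed on the command line as a JSON dictionary.
allowlist = json.loads('{"images/im.png": 2000}')
size_for_im = target_size("images/im.png", 500, allowlist)        # allowlisted
size_for_other = target_size("images/other.png", 500, allowlist)  # global size
```

With these values, `images/im.png` keeps a 2000-pixel budget while every other image falls back to the global 500 pixels on its longest side.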
pip install arxiv-latex-cleaner
Note: arxiv_latex_cleaner is only compatible with Python >=3.9.
If using macOS, you can install using Homebrew:
brew install arxiv_latex_cleaner
Alternatively, you can download the source code:
git clone https://github.com/google-research/arxiv-latex-cleaner
cd arxiv-latex-cleaner/
python -m arxiv_latex_cleaner --help
And install as a command-line program directly from the source code:
python setup.py install
Main features:
- Removes all auxiliary files (.aux, .log, .out, etc.).
- Removes all comments from your code, including \begin{comment}\end{comment}, \iffalse\fi, and \if0\fi environments.
- Optionally removes user-defined commands listed in commands_to_delete (such as \todo{} that you redefine as the empty string at the end).
- Optionally applies custom regex replacement rules defined in a cleaner_config.yaml file.

There is a 50MB limit on arXiv submissions, so to make it fit:
- Removes all unused .tex files (those that are not in the root and not included in any other .tex file).
- Removes all unused images (those that are not included in any used .tex file).
- Optionally resizes all images to im_size pixels, to reduce the size of the submission. You can allowlist some images to skip the global size using images_allowlist.
- Optionally compresses .pdf files using ghostscript (Linux and Mac only). You can allowlist some PDFs to skip the global size using images_allowlist.

To prevent the upload of tikzpicture source code or raw simulation data, this feature:
- Replaces \begin{tikzpicture} ... \end{tikzpicture} with the respective \includegraphics{EXTERNAL_TIKZ_FOLDER/picture_name.pdf}.
- Requires externally compiled TikZ pictures as .pdf files in folder EXTERNAL_TIKZ_FOLDER. See section 52 (Externalization Library) in the PGF/TikZ manual on TikZ picture externalization.
- Only replaces environments preceded by a \tikzsetnextfilename{picture_name} command (as in \tikzsetnextfilename{picture_name}\begin{tikzpicture} ... \end{tikzpicture}) where the externalized picture_name.pdf filename matches picture_name.

Sometimes it is useful to work with a set of custom LaTeX commands when writing a paper. To get rid of them upon arXiv submission, one can simply revert them to plain LaTeX with a regular expression insertion.
{
"pattern" : '(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*{\s*(?P<second>.*?)\s*}\s*{\s*(?P<third>.*?)\s*}',
"insertion" : '\parbox[c]{{ {second} \linewidth}} {{ \includegraphics[width= {third} \linewidth]{{figures/{first} }} }}',
"description" : "Replace figcomp"
}
The pattern above will find all \figcomp{path}{w1}{w2} commands and replace
them with
\parbox[c]{w1\linewidth}{\includegraphics[width=w2\linewidth]{figures/path}}.
Note that the insertion template is filled with the
named group captures
from the pattern. Note also that the replacement is processed before all
\includegraphics commands are processed and corresponding file paths are
copied, making sure all figure files are copied to the cleaned version. See also
cleaner_config.yaml for details on how to specify the
patterns.
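The following Python sketch illustrates how such a rule can be applied. It assumes the insertion template is filled `str.format`-style from the named groups, which is consistent with the doubled braces in the example but is not taken from the tool's source. The helper name `apply_rule` is hypothetical, and the insertion string is a whitespace-free variant of the one shown above.

```python
import re

# Pattern from the example above: three brace-delimited arguments of \figcomp.
PATTERN = (r'(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*'
           r'{\s*(?P<second>.*?)\s*}\s*{\s*(?P<third>.*?)\s*}')

# Doubled braces are literal braces; single-brace fields are named groups.
INSERTION = (r'\parbox[c]{{{second}\linewidth}}'
             r'{{\includegraphics[width={third}\linewidth]{{figures/{first}}}}}')


def apply_rule(text, pattern=PATTERN, insertion=INSERTION):
    """Replace every match of pattern with the filled insertion template."""
    def fill(match):
        # Fill the template with the named group captures of this match.
        return insertion.format(**match.groupdict())
    # A function replacement avoids re.sub interpreting the backslashes.
    return re.sub(pattern, fill, text)


cleaned = apply_rule(r'\figcomp{plot.png}{0.5}{0.9}')
```

Here `cleaned` holds `\parbox[c]{0.5\linewidth}{\includegraphics[width=0.9\linewidth]{figures/plot.png}}`, so the image path is visible to any later step that looks for `\includegraphics` commands.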
usage: arxiv_latex_cleaner@v1.0.8 [-h] [--resize_images] [--im_size IM_SIZE]
[--compress_pdf]
[--pdf_im_resolution PDF_IM_RESOLUTION]
[--images_allowlist IMAGES_ALLOWLIST]
[--keep_bib]
[--commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]]
[--commands_only_to_delete COMMANDS_ONLY_TO_DELETE [COMMANDS_ONLY_TO_DELETE ...]]
[--environments_to_delete ENVIRONMENTS_TO_DELETE [ENVIRONMENTS_TO_DELETE ...]]
[--if_exceptions IF_EXCEPTIONS [IF_EXCEPTIONS ...]]
[--use_external_tikz USE_EXTERNAL_TIKZ]
[--svg_inkscape [SVG_INKSCAPE]]
[--config CONFIG] [--verbose]
input_folder
Clean the LaTeX code of your paper to submit to arXiv. Check the README for
more information on the use.
positional arguments:
input_folder Input folder containing the LaTeX code.
optional arguments:
-h, --help show this help message and exit
--resize_images Resize images.
--im_size IM_SIZE Size of the output images (in pixels, longest side).
Fine tune this to get as close to 10MB as possible.
--compress_pdf Compress PDF images using ghostscript (Linux and Mac
only).
--pdf_im_resolution PDF_IM_RESOLUTION
Resolution (in dpi) to which the tool resamples the
PDF images.
--images_allowlist IMAGES_ALLOWLIST
Images (and PDFs) that won't be resized to the default
resolution, but to the one provided here. Value is in pixels
for images and in dpi for PDFs, as in --im_size and
--pdf_im_resolution, respectively. Format is a
dictionary as: '{"path/to/im.jpg": 1000}'
--keep_bib Avoid deleting the *.bib files.
--commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]
LaTeX commands that will be deleted. Useful for e.g.
user-defined \todo commands. For example, to delete
all occurrences of \todo1{} and \todo2{}, run the tool
with `--commands_to_delete todo1 todo2`. Please note
that the positional argument `input_folder` cannot
come immediately after `commands_to_delete`, as the
parser does not have any way to know if it's another
command to delete.
--commands_only_to_delete COMMANDS_ONLY_TO_DELETE [COMMANDS_ONLY_TO_DELETE ...]
LaTeX commands that will be deleted but the text
wrapped in the commands will be retained. Useful for
commands that change text formats and colors, which
you may want to remove while keeping the text within. Usage
is exactly the same as for commands_to_delete. Note that if
a command is listed under both this option and
commands_to_delete, the default action is to retain
the wrapped text.
--environments_to_delete ENVIRONMENTS_TO_DELETE [ENVIRONMENTS_TO_DELETE ...]
LaTeX environments that will be deleted. Useful for e.g.
user-defined comment environments. For example, to
delete all occurrences of \begin{note} ... \end{note},
run the tool with `--environments_to_delete note`.
Please note that the positional argument `input_folder`
cannot come immediately after
`environments_to_delete`, as the parser does not have
any way to know if it's another environment to delete.
--if_exceptions IF_EXCEPTIONS [IF_EXCEPTIONS ...]
Constant TeX primitive conditionals (\iffalse, \iftrue,
etc.) are simplified, i.e., true branches are kept, false
branches deleted. To parse the conditional constructs
correctly, all commands starting with `\if` are assumed to
be TeX primitive conditionals (e.g., declared by
\newif\ifvar). Some known exceptions to this rule are
already included (e.g., \iff, \ifthenelse, etc.), but you
can add custom exceptions using `--if_exceptions iffalt`.
--use_external_tikz USE_EXTERNAL_TIKZ
Folder (relative to input folder) containing
externalized tikz figures in PDF format.
--svg_inkscape [SVG_INKSCAPE]
Include PDF files generated by Inkscape via the
`\includesvg` command from the `svg` package. This is
done by replacing the `\includesvg` calls with
`\includeinkscape` calls pointing to the generated
`.pdf_tex` files. By default, these files and the
generated PDFs are located under `./svg-inkscape`
(relative to the input folder), but a different path
(relative to the input folder) can be provided in case a
different `inkscapepath` was set when loading the `svg`
package.
--config CONFIG Read settings from `.yaml` config file. If command
line arguments are provided additionally, the config
file parameters are updated with the command line
parameters.
--verbose Enable detailed output.
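The difference between `--commands_to_delete` and `--commands_only_to_delete` described in the help text above can be sketched in Python as follows. This is an illustration only, not the tool's implementation: the helper names `_matching_brace` and `strip_command` are hypothetical, and the sketch assumes balanced, unescaped braces and a single braced argument per command.

```python
import re


def _matching_brace(text, open_idx):
    """Return the index of the '}' closing the '{' at open_idx, or -1."""
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == '{':
            depth += 1
        elif text[i] == '}':
            depth -= 1
            if depth == 0:
                return i
    return -1


def strip_command(text, name, keep_text):
    """Remove each use of the command; keep its braced text if keep_text."""
    head = re.compile(r'\\' + re.escape(name) + r'\{')
    out, pos = [], 0
    while True:
        m = head.search(text, pos)
        if m is None:
            out.append(text[pos:])
            return ''.join(out)
        close = _matching_brace(text, m.end() - 1)
        if close == -1:  # unbalanced braces: leave the rest untouched
            out.append(text[pos:])
            return ''.join(out)
        out.append(text[pos:m.start()])
        if keep_text:
            out.append(text[m.end():close])  # text wrapped by the command
        pos = close + 1


source = r'Keep \red{this \emph{text}} but drop \todo{fix me}.'
deleted = strip_command(source, 'todo', keep_text=False)
unwrapped = strip_command(source, 'red', keep_text=True)
```

`deleted` loses the `\todo{...}` call together with its argument, whereas `unwrapped` drops only the `\red` wrapper and keeps `this \emph{text}`.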
python -m unittest arxiv_latex_cleaner.tests.arxiv_latex_cleaner_test
This is not an officially supported Google product.