unoserver
Using LibreOffice as a server for converting documents.
Overview
Using LibreOffice to convert documents is easy, you can use a command like this to
convert a file to PDF, for example::
$ libreoffice --headless --convert-to pdf ~/Documents/MyDocument.odf
However, that will load LibreOffice into memory, convert a file and then exit LibreOffice,
which means that the next time you convert a document LibreOffice needs to be loaded into
memory again.
To avoid that, LibreOffice has a listener mode, where it can listen for commands via a port,
and load and convert documents without exiting and reloading the software. This lowers the
CPU load when converting many documents with somewhere between 50% and 75%, meaning you can
convert somewhere between two and four times as many documents in the same time using a listener.
Unoserver contains three commands to help you do this, unoserver
which starts a listener on the
specified IP interface and port, and unoconverter
which will connect to a listener and ask it
to convert a document, as well as unocompare
which will connect to a listener and ask it
to compare two documents and convert the result document.
Installation
NB! Windows and Mac support is as of yet untested.
Unoserver needs to be installed by and run with the same Python installation that LibreOffice uses.
On Unix this usually means you can just install it with::
$ sudo pip install unoserver
If you have multiple versions of LibreOffice installed, you need to install it for each one.
Usually each LibreOffice install will have it's own python
executable and you need to run
pip
with that executable::
$ sudo /full/path/to/python -m pip install unoserver
To find all Python installations that have the relevant LibreOffice libraries installed,
you can run a script called find_uno.py
::
wget -O find_uno.py https://gist.githubusercontent.com/regebro/036da022dc7d5241a0ee97efdf1458eb/raw/find_uno.py
python3 find_uno.py
This should give an output similar to this::
Trying python found at /usr/bin/python3... Success!
Trying python found at /opt/libreoffice7.1/program/python... Success!
Found 2 Pythons with Libreoffice libraries:
/usr/bin/python3
/opt/libreoffice7.1/program/python
The /usr/bin/python3
binary will be the system Python used for versions of
Libreoffice installed by the system package manager. The Pythons installed
under /opt/
will be Python versions that come with official LibreOffice
distributions.
To install on such distributions, do the following::
$ wget https://bootstrap.pypa.io/get-pip.py
$ sudo /path/to/python get-pip.py
$ sudo /path/to/python -m pip install unoserver
You can also install it in a virtualenv, if you are using the system Python
for that virtualenv, and specify the --system-site-packages
parameter::
$ virtualenv --python=/usr/bin/python3 --system-site-packages virtenv
$ virtenv/bin/pip install unoserver
Windows and Mac installs aren't officially supported yet, but on Windows the
paths to the LibreOffice Python executable are usually in locations such as
C:\\Program Files (x86)\\LibreOffice\\python.exe
. On Mac it can be for
example /Applications/LibreOffice.app/Contents/python
or
/Applications/LibreOffice.app/Contents/Resources/python
.
Usage
Installing unoserver installs three scripts, unoserver
, unoconverter
and unocompare
.
All can also be run as modules with python3 -m unoserver.server
, python3 -m unoserver.converter
and python3 -m unoserver.comparer
with the same arguments as the main scripts.
Unoserver
.. code::
unoserver [-h] [--interface INTERFACE] [--uno-interface UNO_INTERFACE] [--port PORT] [--uno-port UNO_PORT]
[--daemon] [--executable EXECUTABLE] [--user-installation USER_INSTALLATION]
[--libreoffice-pid-file LIBREOFFICE_PID_FILE]
* `--interface`: The interface used by the XMLRPC server, defaults to "127.0.0.1"
* `--port`: The port used by the XMLRPC server, defaults to "2003"
* `--uno-interface`: The interface used by the LibreOffice server, defaults to "127.0.0.1"
* `--uno-port`: The port used by the LibreOffice server, defaults to "2002"
* `--daemon`: Deamonize the server
* `--executable`: The path to the LibreOffice executable
* `--user-installation`: The path to the LibreOffice user profile, defaults to a dynamically created temporary directory
* `--libreoffice-pid-file`: If set, unoserver will write the Libreoffice PID to this file.
If started in daemon mode, the file will not be deleted when unoserver exits.
Unoconvert
.. code::
unoconvert [-h] [--convert-to CONVERT_TO] [--filter FILTER] [--filter-options FILTER_OPTIONS]
[--update-index] [--dont-update-index] [--host HOST] [--port PORT]
[--host-location {auto,remote,local}] infile outfile
infile
: The path to the file to be converted (use - for stdin)outfile
: The path to the converted file (use - for stdout)--convert-to
: The file type/extension of the output file (ex pdf). Required when using stdout--filter
: The export filter to use when converting. It is selected automatically if not specified.--filter-option
: Pass an option for the export filter, in name=value format. Use true/false for boolean values. Can be repeated for multiple options.--filter-options
: Alias for --filter-option
.--host
: The host used by the server, defaults to "127.0.0.1"--port
: The port used by the server, defaults to "2002"--host-location
: The host location determines the handling of files. If you run the client on the
same machine as the server, it can be set to local, and the files are sent as paths. If they are
different machines, it is remote and the files are sent as binary data. Default is auto, and it will
send the file as a path if the host is 127.0.0.1 or localhost, and binary data for other hosts.
Example for setting PNG width/height::
unoconvert infile.odt outfile.png --filter-options PixelWidth=640 --filter-options PixelHeight=480
Unocompare
.. code::
unocompare [-h] [--file-type FILE_TYPE] [--host HOST] [--port PORT] [--host-location {auto,remote,local}]
oldfile newfile outfile
* `oldfile`: The path to the older file to be compared with the original one (use - for stdin)
* `newfile`: The path to the newer file to be compared with the modified one (use - for stdin)
* `outfile`: The path to the result of the comparison and converted file (use - for stdout)
* `--file-type`: The file type/extension of the result output file (ex pdf). Required when using stdout
* `--host`: The host used by the server, defaults to "127.0.0.1"
* `--port`: The port used by the server, defaults to "2002"
* `--host-location`: The host location determines the handling of files. If you run the client on the
same machine as the server, it can be set to local, and the files are sent as paths. If they are
different machines, it is remote and the files are sent as binary data. Default is auto, and it will
send the file as a path if the host is 127.0.0.1 or localhost, and binary data for other hosts.
Development and Testing
-----------------------
1. Clone the repo from `https://github.com/unoconv/unoserver`.
2. Setup a virtualenv::
$ virtualenv --system-site-packages ve
$ ve/bin/pip install -e .[devenv]
3. Run tests::
$ ve/bin/pytest tests
4. Run `flake8` linting:
$ ve/bin/flake8 src tests
Comparison with `unoconv`
-------------------------
Unoserver started as a rewrite, and hopefully a replacement to `unoconv`, a module with support
for using LibreOffice as a listener to convert documents.
Differences for the user
-
Easier install for system versions of LibreOffice. On Linux, the packaged versions of LibreOffice
typically uses the system Python, making it easy to install unoserver
with a simple
sudo pip install unoserver
command.
-
Separate commands for server and client. The client no longer tries to start a listener and then
close it after conversion if it can't find a listener. Instead the new unoconverter
client
requires the unoserver
to be started. This makes it less practical for one-off converts,
but as mentioned that can easily be done with LibreOffice itself.
-
The unoserver
listener does not prevent you from using LibreOffice as a normal user, while the
unoconv
listener would block you from starting LibreOffice to open a document normally.
-
You should be able to on a multi-core machine run several unoservers
with different ports.
There is however no support for any form of load balancing in unoserver
, you would have to
implement that yourself in your usage of unoconverter
.
-
Only LibreOffice is officially supported. Other variations are untested.
Differences for the maintainer
* It's a complete and clean rewrite, supporting only Python 3, with easier to understand and
therefore easier to maintain code, hopefully meaning more people can contribute.
* It doesn't rely on internal mappings of file types and export filters, but asks LibreOffice
for this information, which will increase compatibility with different LibreOffice versions,
and also lowers maintenance.
Contributors
------------
* Lennart Regebro, regebro@gmail.com
* Stephan Richter, srichter@shoobx.com
* Bruno Simão, https://github.com/ankology
* Åsmund Stavdahl, https://github.com/asmundstavdahl
* Balázs Varga, https://github.com/bvarga91
* Alessandro Filippini, https://github.com/AlePini
* Dmitry Shachnev, https://github.com/mitya57
* Socheat Sok, https://github.com/socheatsok78
* BohwaZ, https://github.com/bohwaz
2.1 (unreleased)
----------------
- Nothing changed yet.
2.1b1 (2024-01-12)
------------------
- Add a --input-filter argument to specify a different file type than the
one LibreOffice will guess.
- For consistency renamed --filter to --output-filter, but the --filter
will remain for backwards compatibility.
- If you specify a non-existent filter, the list of filters is now alphabetical.
- You can now use both the LibreOffice name, but also internal shorter names
and sometimes even file suffices to specify the filter.
2.0.1 (2024-01-12)
------------------
- Specifying `--host-location=remote` didn't work for the outfile if you
used port forwarding from localhost.
- Always default the uno interface to 127.0.0.1, no matter what the XMLRPC
interface is.
2.0 (2023-10-19)
----------------
- Made the --daemon parameter work again
- Added a --filter-option alias for --filter-options
2.0b1 (2023-08-18)
------------------
- A large refactoring with an XML-RPC server and a new client using that XML-RPC
server for communicating. This means the client can now be lightweight, and
no longer needs the Uno library, or even LibreOffice installed. Instead the
new `unoserver.client.UnoClient()` can be used as a library from Python.
- A cleanup and refactor of the commands, with new, more gooder parameter names.
1.6 (2023-08-18)
----------------
- Added some deprecation warnings for command arguments as they will change in 2.0.
1.5 (2023-08-11)
----------------
- Added support for passing in filter options with the --filter-options parameter.
- Add `--user-installation` flag to `unoserver` for custom user installations.
- Add a `--libreoffice-pid-file` argument for `unoserver` to save the LibreOffice PID.
1.4 (2023-04-28)
----------------
- Added new feature: comparing documents and export the result to any format.
- You can run the new module as scripts, and also with ``python3 -m unoserver.comparer`` just
like the ``python3 -m unoserver.server`` and ``python3 -m unoserver.converter``.
- Porting feature from previous release: refresh of index in the Table of Contents
1.3 (2023-02-03)
----------------
- Now works on Windows (although it's not officially supported).
- Added --filter argument to unoconverter to allow explicit selection of which
export filter to use for conversion.
1.2 (2022-03-17)
----------------
- Move logging configuration from import time to the main() functions.
- Improved the handling of KeyboardInterrupt
- Added the deprecated but still necessary com.sun.star.text.WebDocument
for HTML docs.
1.1 (2021-10-14)
----------------
- Fixed a bug: If you specified an unknown file extension while piping the
result to stdout, you would get a type error instead of the correct error.
- Added an extra check that libreoffice is quite dead when exiting,
I experienced a few cases where soffice.bin was using 100% load in the
background after unoserver exited. I hope this takes care of that.
- Added ``if __name__ == "main":`` blocks so you can run the modules
as scripts, and also with ``python3 -m unoserver.server`` and
``python3 -m unoserver.converter``.
1.0.1 (2021-09-20)
------------------
- Fixed a bug that meant `unoserver` did not behave well with Supervisord's restart command.
1.0 (2021-08-10)
----------------
- A few small spelling and grammar changes.
1.0b3 (2021-07-01)
------------------
- Make sure `interface` and `port` options are honored.
- Added an --executable option to the server to pick a specific libreoffice installation.
- Changed the infile and outfile options to be positional.
- Added support for using stdin and stdout.
- Added a --convert-to argument to specify the resulting filetype.
1.0b2 (2021-06-24)
------------------
- A bug prevented converting to or from files in the local directory.
1.0b1 (2021-06-24)
------------------
- First beta release
0.0.1 (2021-06-16)
------------------
- First alpha release