Research
Security News
Malicious npm Package Targets Solana Developers and Hijacks Funds
A malicious npm package targets Solana developers, rerouting funds in 2% of transactions to a hardcoded address.
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).
ReadBeyond <http://www.readbeyond.it/>
__Alberto Pettarin <http://www.albertopettarin.it/>
__Home <http://www.readbeyond.it/aeneas/>
__ -
GitHub <https://github.com/readbeyond/aeneas/>
__ -
PyPI <https://pypi.python.org/pypi/aeneas/>
__ -
Docs <http://www.readbeyond.it/aeneas/docs/>
__ -
Tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>
__
Benchmark <https://readbeyond.github.io/aeneas-benchmark/>
__ -
Mailing List <https://groups.google.com/d/forum/aeneas-forced-alignment>
__ -
Web App <http://aeneasweb.org>
__aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.
For example, given this text file <https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.xhtml>
__
and this audio file <https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.mp3>
__,
aeneas determines, for each fragment, the corresponding time
interval in the audio file:
::
1 => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]
That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease, => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680]
Thy self thy foe, to thy sweet self too cruel: => [00:00:25.680, 00:00:31.240]
Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.400]
And only herald to the gaudy spring, => [00:00:34.400, 00:00:36.920]
Within thine own bud buriest thy content, => [00:00:36.920, 00:00:40.640]
And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.640]
Pity the world, or else this glutton be, => [00:00:43.640, 00:00:48.080]
To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53.240]
.. figure:: wiki/align.png :alt: Waveform with aligned labels, detail
Waveform with aligned labels, detail
This synchronization map can be output to file in several formats, depending on its application:
System Requirements
1. a reasonably recent machine (recommended 4 GB RAM, 2 GHz 64bit CPU)
2. `Python <https://python.org/>`__ 2.7 (Linux, OS X, Windows) or 3.5 or
later (Linux, OS X)
3. `FFmpeg <https://www.ffmpeg.org/>`__
4. `eSpeak <http://espeak.sourceforge.net/>`__
5. Python packages ``BeautifulSoup4``, ``lxml``, and ``numpy``
6. Python headers to compile the Python C/C++ extensions (optional but
strongly recommended)
7. A shell supporting UTF-8 (optional but strongly recommended)
Supported Platforms
aeneas has been developed and tested on Debian 64bit, with
Python 2.7 and Python 3.5, which are the only supported
platforms at the moment. Nevertheless, aeneas has been confirmed
to work on other Linux distributions, Mac OS X, and Windows. See the
PLATFORMS file <https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md>
__
for details.
If installing aeneas natively on your OS proves difficult, you are
strongly encouraged to use
aeneas-vagrant <https://github.com/readbeyond/aeneas-vagrant>
, which
provides aeneas inside a virtualized Debian image running under
VirtualBox <https://www.virtualbox.org/>
and
Vagrant <http://www.vagrantup.com/>
__, which can be installed on any
modern OS (Linux, Mac OS X, Windows).
Installation
All-in-one installers are available for Mac OS X and Windows, and a Bash
script for deb-based Linux distributions (Debian, Ubuntu) is provided in
this repository. It is also possible to download a VirtualBox+Vagrant
virtual machine. Please see the `INSTALL
file <https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md>`__
for detailed, step-by-step installation procedures for different
operating systems.
The generic OS-independent procedure is simple:
1. **Install** `Python <https://python.org/>`__ (2.7.x preferred),
`FFmpeg <https://www.ffmpeg.org/>`__, and
`eSpeak <http://espeak.sourceforge.net/>`__
2. Make sure the following **executables** can be called from your
**shell**: ``espeak``, ``ffmpeg``, ``ffprobe``, ``pip``, and
``python``
3. First install ``numpy`` with ``pip`` and then ``aeneas`` (this order
is important):
.. code:: bash
pip install numpy
pip install aeneas
4. To **check** whether you installed **aeneas** correctly, run:
``bash python -m aeneas.diagnostics``
Usage
-----
1. Run without arguments to get the **usage message**:
.. code:: bash
python -m aeneas.tools.execute_task
python -m aeneas.tools.execute_job
You can also get a list of **live examples** that you can immediately
run on your machine thanks to the included files:
.. code:: bash
python -m aeneas.tools.execute_task --examples
python -m aeneas.tools.execute_task --examples-all
2. To **compute a synchronization map** ``map.json`` for a pair
(``audio.mp3``, ``text.txt`` in
`plain <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN>`__
text format), you can run:
.. code:: bash
python -m aeneas.tools.execute_task \
audio.mp3 \
text.txt \
"task_language=eng|os_task_file_format=json|is_text_type=plain" \
map.json
(The command has been split into lines with ``\`` for visual clarity; in
production you can have the entire command on a single line and/or you
can use shell variables.)
To **compute a synchronization map** ``map.smil`` for a pair
(``audio.mp3``,
`page.xhtml <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED>`__
containing fragments marked by ``id`` attributes like ``f001``), you can
run:
::
```bash
python -m aeneas.tools.execute_task \
audio.mp3 \
page.xhtml \
"task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
map.smil
```
As you can see, the third argument (the *configuration string*)
specifies the parameters controlling the I/O formats and the processing
options for the task. Consult the
`documentation <http://www.readbeyond.it/aeneas/docs/>`__ for details.
3. If you have several tasks to process, you can create a **job
container** to batch process them:
.. code:: bash
python -m aeneas.tools.execute_job job.zip output_directory
File ``job.zip`` should contain a ``config.txt`` or ``config.xml``
configuration file, providing **aeneas** with all the information needed
to parse the input assets and format the output sync map files. Consult
the `documentation <http://www.readbeyond.it/aeneas/docs/>`__ for
details.
The `documentation <http://www.readbeyond.it/aeneas/docs/>`__ contains a
highly suggested
`tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>`__
which explains how to use the built-in command line tools.
Documentation and Support
-------------------------
- Documentation: http://www.readbeyond.it/aeneas/docs/
- Command line tools tutorial:
http://www.readbeyond.it/aeneas/docs/clitutorial.html
- Library tutorial:
http://www.readbeyond.it/aeneas/docs/libtutorial.html
- Old, verbose tutorial: `A Practical Introduction To The aeneas
Package <http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html>`__
- Mailing list:
https://groups.google.com/d/forum/aeneas-forced-alignment
- Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html
- High level description of how aeneas works:
`HOWITWORKS <https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md>`__
- Development history:
`HISTORY <https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md>`__
- Testing:
`TESTING <https://github.com/readbeyond/aeneas/blob/master/wiki/TESTING.md>`__
- Benchmark suite: https://readbeyond.github.io/aeneas-benchmark/
Supported Features
------------------
- Input text files in ``parsed``, ``plain``, ``subtitles``, or
``unparsed`` (XML) format
- Multilevel input text files in ``mplain`` and ``munparsed`` (XML)
format
- Text extraction from XML (e.g., XHTML) files using ``id`` and
``class`` attributes
- Arbitrary text fragment granularity (single word, subphrase, phrase,
paragraph, etc.)
- Input audio file formats: all those readable by ``ffmpeg``
- Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB,
TEXTGRID, TSV, TTML, TXT, VTT, XML
- Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN,
DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA,
JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA,
SWE, TUR, UKR
- MFCC and DTW computed via Python C extensions to reduce the
processing time
- Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak
(default), eSpeak-ng, Festival, MacOS (via say), Nuance TTS API
- Default TTS (eSpeak) called via a Python C extension for fast audio
synthesis
- Possibility of running a custom, user-provided TTS engine Python
wrapper (e.g., included example for speect)
- Batch processing of multiple audio/text pairs
- Download audio from a YouTube video
- In multilevel mode, recursive alignment from paragraph to sentence to
word level
- In multilevel mode, MFCC resolution, MFCC masking, DTW margin, and
TTS engine can be specified for each level independently
- Robust against misspelled/mispronounced words, local rearrangements
of words, background noise/sporadic spikes
- Adjustable splitting times, including a max character/second
constraint for CC applications
- Automated detection of audio head/tail
- Output an HTML file for fine tuning the sync map manually
(``finetuneas`` project)
- Execution parameters tunable at runtime
- Code suitable for Web app deployment (e.g., on-demand cloud computing
instances)
- Extensive test suite including 1,200+ unit/integration/performance
tests, that run and must pass before each release
Limitations and Missing Features
--------------------------------
- Audio should match the text: large portions of spurious text or audio
might produce a wrong sync map
- Audio is assumed to be spoken: not suitable for song captioning, YMMV
for CC applications
- No protection against memory swapping: be sure your amount of RAM is
adequate for the maximum duration of a single audio file (e.g., 4 GB
RAM => max 2h audio; 16 GB RAM => max 10h audio)
- `Open issues <https://github.com/readbeyond/aeneas/issues>`__
A Note on Word-Level Alignment
A significant number of users runs aeneas to align audio and text at
word-level (i.e., each fragment is a word). Although aeneas was not
designed with word-level alignment in mind and the results might be
inferior to ASR-based forced aligners <https://github.com/pettarin/forced-alignment-tools>
__ for
languages with good ASR models, aeneas offers some options to
improve the quality of the alignment at word-level:
If you use the aeneas.tools.execute_task
command line tool, you can
add --presets-word
switch to enable MFCC nonspeech masking, for
example:
.. code:: bash
$ python -m aeneas.tools.execute_task --example-words --presets-word
$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
If you use aeneas as a library, just set the appropriate
RuntimeConfiguration
parameters. Please see the command line tutorial <http://www.readbeyond.it/aeneas/docs/clitutorial.html>
__ for
details.
aeneas is released under the terms of the GNU Affero General Public
License Version 3. See the LICENSE file <https://github.com/readbeyond/aeneas/blob/master/LICENSE>
__ for
details.
Licenses for third party code and files included in aeneas can be
found in the
licenses <https://github.com/readbeyond/aeneas/blob/master/licenses/README.md>
__
directory.
No copy rights were harmed in the making of this project.
Sponsors
- **July 2015**: `Michele
Gianella <https://plus.google.com/+michelegianella/about>`__
generously supported the development of the boundary adjustment code
(v1.0.4)
- **August 2015**: `Michele
Gianella <https://plus.google.com/+michelegianella/about>`__
partially sponsored the port of the MFCC/DTW code to C (v1.1.0)
- **September 2015**: friends in West Africa partially sponsored the
development of the head/tail detection code (v1.2.0)
- **October 2015**: an anonymous donation sponsored the development of
the "YouTube downloader" option (v1.3.0)
- **April 2016**: the Fruch Foundation kindly sponsored the development
and documentation of v1.5.0
- **December 2016**: the `Centro Internazionale Del Libro Parlato
"Adriano Sernagiotto" <http://www.libroparlato.org/>`__ (Feltre,
Italy) partially sponsored the development of the v1.7 series
Supporting
Would you like supporting the development of aeneas?
I accept sponsorships to
Feel free to get in touch <mailto:aeneas@readbeyond.it>
__.
Contributing
If you think you found a bug or you have a feature request, please use
the `GitHub issue
tracker <https://github.com/readbeyond/aeneas/issues>`__ to submit it.
If you want to ask a question about using **aeneas**, your best option
consists in sending an email to the `mailing
list <https://groups.google.com/d/forum/aeneas-forced-alignment>`__.
Finally, code contributions are welcome! Please refer to the `Code
Contribution
Guide <https://github.com/readbeyond/aeneas/blob/master/wiki/CONTRIBUTING.md>`__
for details about the branch policies and the code style to follow.
Acknowledgments
---------------
Many thanks to **Nicola Montecchio**, who suggested using MFCCs and DTW,
and co-developed the first experimental code for aligning audio and
text.
**Paolo Bertasi**, who developed the APIs and Web application for
ReadBeyond Sync, helped shaping the structure of this package for its
asynchronous usage.
**Chris Hubbard** prepared the files for packaging aeneas as a
Debian/Ubuntu ``.deb``.
**Daniel Bair** prepared the ``brew`` formula for installing **aeneas**
and its dependencies on Mac OS X.
**Daniel Bair**, **Chris Hubbard**, and **Richard Margetts** packaged
the installers for Mac OS X and Windows.
**Firat Ozdemir** contributed the ``finetuneas`` HTML/JS code for fine
tuning sync maps in the browser.
**Willem van der Walt** contributed the code snippet to output a sync
map in TextGrid format.
**Chris Vaughn** contributed the MacOS TTS wrapper.
All the mighty `GitHub
contributors <https://github.com/readbeyond/aeneas/graphs/contributors>`__,
and the members of the `Google
Group <https://groups.google.com/d/forum/aeneas-forced-alignment>`__.
FAQs
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
We found that aeneas demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
Security News
A malicious npm package targets Solana developers, rerouting funds in 2% of transactions to a hardcoded address.
Security News
Research
Socket researchers have discovered malicious npm packages targeting crypto developers, stealing credentials and wallet data using spyware delivered through typosquats of popular cryptographic libraries.
Security News
Socket's package search now displays weekly downloads for npm packages, helping developers quickly assess popularity and make more informed decisions.