Rustfst

Rust

Python

Rust implementation of Weighted Finite States Transducers.
Rustfst is a library for constructing, combining, optimizing, and searching weighted
finite-state transducers (FSTs). Weighted finite-state transducers are automata where
each transition has an input label, an output label, and a weight.
The more familiar finite-state acceptor is represented as a transducer
with each transition's input and output label equal. Finite-state acceptors
are used to represent sets of strings (specifically, regular or rational sets);
finite-state transducers are used to represent binary relations between pairs of
strings (specifically, rational transductions). The weights can be used to represent
the cost of taking a particular transition.
FSTs have key applications in speech recognition and synthesis, machine translation,
optical character recognition, pattern matching, string processing, machine learning,
information extraction and retrieval among others. Often a weighted transducer is used to
represent a probabilistic model (e.g., an n-gram model, pronunciation model). FSTs can be
optimized by determinization and minimization, models can be applied to hypothesis sets
(also represented as automata) or cascaded by finite-state composition, and the best
results can be selected by shortest-path algorithms.

Overview
For a basic example see the section below.
Some simple and commonly encountered types of FSTs can be easily
created with the macro [fst
] or the functions
acceptor
and
transducer
.
For more complex cases you will likely start with the
VectorFst
type, which will be imported
in the [prelude
] along with most everything else you need.
VectorFst<TropicalWeight>
corresponds
directly to the OpenFST StdVectorFst
, and can be used to load
its files using read
or
read_text
.
Because "iteration" over an FST can mean many different things,
there are a variety of different iterators. To iterate over state
IDs you may use
states_iter
, while to
iterate over transitions out of a state, you may use
get_trs
. Since it is common to
iterate over both, this can be done using
fst_iter
or
fst_into_iter
. It
is also very common to iterate over paths accepted by an FST,
which can be done with
paths_iter
, and as a convenience
for generating text,
string_paths_iter
.
Alternately, in the case of a linear FST, you may retrieve the
only possible path with
decode_linear_fst
.
Note that iterating over paths is not the same thing as finding
the shortest path or paths, which is done with
shortest_path
(for a single path)
or
shortest_path_with_config
(for N-shortest paths).
For the complete list of algorithms, see the [algorithms
] module.
You may now be wondering, especially if you have previously used
such linguist-friendly tools as
pyfoma, "what if I just want
to transduce some text???" The unfriendly answer is that
rustfst is a somewhat lower-level library, designed for
implementing things like speech recognizers. The somewhat more
helpful answer is that you would do this by constructing an
acceptor
for your input, which you will
compose
with a
transducer
, then
project
the result to itsoutput, and finally
iterate over the paths in
the resulting FST.
References
Implementation heavily inspired from Mehryar Mohri's, Cyril Allauzen's and Michael Riley's work :
The API closely resembles that of OpenFST, with some
simplifications and changes to make it more idiomatic in Rust, notably
the use of Tr
instead of Arc
. See Differences fromOpenFST for more information.
Example
use anyhow::Result;
use rustfst::prelude::*;
use rustfst::algorithms::determinize::{DeterminizeType, determinize};
use rustfst::algorithms::rm_epsilon::rm_epsilon;
fn main() -> Result<()> {
let mut fst = VectorFst::<TropicalWeight>::new();
let s0 = fst.add_state();
let s1 = fst.add_state();
let s2 = fst.add_state();
fst.set_start(s0)?;
fst.add_tr(s0, Tr::new(3, 5, 10.0, s1))?;
fst.add_tr(s0, Tr::new(5, 7, 18.0, s2))?;
fst.set_final(s1, 31.0)?;
fst.set_final(s2, 45.0)?;
for p in fst.paths_iter() {
println!("{:?}", p);
}
connect(&mut fst)?;
minimize(&mut fst)?;
project(&mut fst, ProjectType::ProjectInput);
rm_epsilon(&mut fst)?;
fst = determinize(&fst)?;
Ok(())
}
Differences from OpenFST
Here is a non-exhaustive list of ways in which Rustfst's API
differs from OpenFST:
- The default epsilon symbol is
<eps>
and not <epsilon>
. - Functions and methods follow Rust naming conventions,
e.g.
add_state
rather than AddState
, but are otherwise mostly
equivalent, except that: - Transitions are called
Tr
and not Arc
, because Arc
has a
rather different and well-established meaning in Rust, and rustfst
uses it (std::sync::Arc
, that is) to reference-count symbol
tables. All associated functions also use tr
. - Final states are not indicated by a final weight of
zero
. You
can test for finality using is_final
, and
final_weight
returns an [Option
]. This
requires some care when converting OpenFST code. - Transitions can be accessed directly as a slice rather than requiring
an iterator.
- Semiring operations are expressed as plain old methods rather
than strange C++ things. So write
w1.plus(w2)
rather than
Plus(w1, w2)
, for instance. - Weights have in-place operations for ⊕
(
plus_assign
) and ⊗
(times_assign
). - Most of the type aliases (which would be trait aliases in Rust) such
as
StdArc
, StdFst
, and so forth, are missing, but type inference
allows us to avoid explicit type arguments in most cases, such as
when calling [Tr::new
], for instance. - State IDs are unsigned, with [
NO_STATE_ID
] used for a missing value.
They are also 32 bits by default (presumably, 4 billion states
is enough for most applications). This means you must take care to
cast them to [usize
] when using them as indices, and vice-versa,
preferably checking for overflows - Symbol IDs are also unsigned and 32-bits, with [
NO_LABEL
] used
for a missing value. - Floating-point weights are not generic, so are always single-precision.
Benchmark with OpenFST
I did a benchmark some time ago on almost every linear fst algorithm and compared the results with OpenFst
. You can find the results here :
Spoiler alert: Rustfst
is faster on all those algorithms 😅
Documentation
The documentation of the last released version is available here :
https://docs.rs/rustfst
Release process
- Use the script
update_version.sh
to update the version of every package. - Push
- Push a new tag with the prefix
rustfst-v
Example :
./update_version.sh 0.9.1-alpha.6
git commit -am "Release 0.9.1-alpha.6"
git push
git tag -a rustfst-v0.9.1-alpha.6 -m "Release rustfst 0.9.1-alpha.6"
git push --tags
Optionally, if this is a major release, create a GitHub release in the UI.
Projects contained in this repository
This repository contains two main projects:
rustfst
is the Rust re-implementation.
- Crate available on crates.io here
- Documentation available on docs.rs here
rustfst-python
is the python binding of rustfst
.
- Package available on Pypi here
- Documentation available on Github Pages here
License
Licensed under either of
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.