
An automatic lyrics transcription (ALT) evaluation toolkit, released with the Jam-ALT benchmark.
The package implements metrics designed to work well with lyrics formatted according to music industry standards (see the Jam-ALT annotation guide). These cover word error rate (WER) as well as formatting- and punctuation-related metrics.
Under the hood, the text is pre-processed using the sacremoses tokenizer and punctuation normalizer.
Note that apostrophes and single quotes are never treated as quotation marks, but as part of a word, marking an elision or a contraction.
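To illustrate the apostrophe rule, here is a minimal sketch of a tokenizer that keeps apostrophes attached to words while splitting off other punctuation. It is not the sacremoses-based tokenizer the package uses; the pattern `WORD_RE` and the function `simple_tokenize` are hypothetical names introduced only for this example.

```python
import re

# Hypothetical pattern, for illustration only: a token is either a run of
# word characters and apostrophes (straight ' or typographic ’), or a single
# punctuation character. Apostrophes therefore never become separate tokens
# when they are attached to a word.
WORD_RE = re.compile(r"[\w'’]+|[^\w\s]")


def simple_tokenize(line: str) -> list:
    """Split a lyric line into word and punctuation tokens."""
    return WORD_RE.findall(line)


# The elision "believin'" and the contraction "Don't" each stay one token,
# while the comma is split off.
tokens = simple_tokenize("Don't stop believin', hold on")
print(tokens)
```

With this rule, a transcript that writes "believin" without the apostrophe differs from the reference by exactly one word, rather than by a word plus a stray quotation mark.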
Install the package with pip install alt-eval.
To compute the metrics:
from alt_eval import compute_metrics
compute_metrics(references, hypotheses)
where references and hypotheses are lists of strings. To specify the language (English by default), use the languages parameter, passing either a single language code, or a list of language codes corresponding to individual examples.
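The two accepted forms of the languages parameter can be pictured as follows. This is only a sketch of the idea of broadcasting a single code to every example; `expand_languages` is a hypothetical helper and says nothing about how alt-eval handles the parameter internally.

```python
from typing import List, Sequence, Union


def expand_languages(languages: Union[str, Sequence[str]], n_examples: int) -> List[str]:
    """Return one language code per example (hypothetical helper)."""
    if isinstance(languages, str):
        # A single code applies to every example.
        return [languages] * n_examples
    languages = list(languages)
    if len(languages) != n_examples:
        raise ValueError(
            f"expected {n_examples} language codes, got {len(languages)}"
        )
    return languages


print(expand_languages("es", 3))
print(expand_languages(["en", "fr", "de"], 3))
```

In practice this means a monolingual test set can pass one code, while a mixed set such as Jam-ALT passes one code per example.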
For Jam-ALT, use:
from datasets import load_dataset
dataset = load_dataset("audioshake/jam-alt")["test"]
compute_metrics(dataset["text"], transcriptions, languages=dataset["language"])
If you are only interested in WER, you may skip formatting- and punctuation-related metrics by passing include_other=False.
Use visualize_errors=True to also get a list of HTML snippets that can be used to visualize the errors in each transcript.
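To give a rough idea of what an HTML error visualization can look like, the sketch below marks word-level differences between a reference and a transcript using the standard library's difflib. The markup (`<del>` for words missing from the transcript, `<ins>` for extra words) and the function `diff_to_html` are assumptions made for this example; the snippets produced by the package may look quite different.

```python
import difflib
import html


def diff_to_html(reference: str, hypothesis: str) -> str:
    """Render a word-level diff as an HTML snippet (illustrative only)."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    parts = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            parts.append(html.escape(" ".join(ref_words[i1:i2])))
            continue
        if i1 < i2:
            # Reference words that the transcript dropped or replaced.
            parts.append("<del>" + html.escape(" ".join(ref_words[i1:i2])) + "</del>")
        if j1 < j2:
            # Transcript words that are not in the reference.
            parts.append("<ins>" + html.escape(" ".join(hyp_words[j1:j2])) + "</ins>")
    return " ".join(parts)


print(diff_to_html("hold on to the feeling", "hold on the feelin"))
```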
The package implements language-specific tokenization via sacremoses, enhanced with custom rules. Support is well tested for English, Spanish, German, and French.
For writing systems that do not use spaces to separate words (Chinese, Japanese, Thai, Lao, Burmese, …), each character is treated as a separate word, following Radford et al. (2022). This makes the WER equivalent to the character error rate (CER).
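The equivalence between WER and CER under character-level tokenization can be checked with a small sketch. The functions below (`edit_distance`, `error_rate`, `char_tokens`) are hypothetical and compute a plain Levenshtein-based error rate; they are not the package's implementation and skip its pre-processing.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance divided by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def char_tokens(text):
    """Treat every non-space character as its own 'word'."""
    return [ch for ch in text if not ch.isspace()]


# One substituted character out of three reference characters.
rate = error_rate(char_tokens("我爱你"), char_tokens("我爱他"))
print(rate)
```

Because each token is a single character, the word-level error rate computed here is by construction the character error rate.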
See the test cases for examples of how different languages are tokenized. Contributions adding support for additional languages are welcome.
The Jam-ALT annotation guide forbids certain end-of-line punctuation and requires the first letter of each line to be uppercase.
For transcription systems that do not respect these rules, the results on Jam-ALT can be improved by normalizing the transcripts using the normalize_lyrics() function, which fixes these specific issues.
Note, however, that this relies on the line break predictions being correct. Moreover, other datasets may follow different rules.
For these reasons, this normalization is not applied as a fixed pre-processing step in compute_metrics(); it is left as an optional step for the caller.
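The kind of normalization described above can be sketched as follows. This is not the package's normalize_lyrics() function: `normalize_lyrics_sketch` is a hypothetical stand-in, and the set of forbidden end-of-line characters (commas and periods) is an assumption made for the example, so consult the Jam-ALT annotation guide for the actual rules.

```python
def normalize_line_sketch(line: str, forbidden: str = ",.") -> str:
    """Strip assumed-forbidden trailing punctuation and capitalize the line."""
    line = line.strip()
    # Remove trailing characters from the assumed forbidden set.
    line = line.rstrip(forbidden).rstrip()
    # Uppercase only the first character; the rest of the line is unchanged.
    return line[:1].upper() + line[1:]


def normalize_lyrics_sketch(text: str) -> str:
    """Apply the line-level rule to every line, keeping blank lines."""
    return "\n".join(normalize_line_sketch(line) for line in text.split("\n"))


raw = "walking down the road,\nthe sun is going down.\n\nwhere are you now?"
print(normalize_lyrics_sketch(raw))
```

Since the rule is applied per line, a transcript with wrongly placed line breaks would get capital letters and stripped punctuation in the wrong places, which is why such a step is better left to the caller.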