
Security News
GitHub Actions Pricing Whiplash: Self-Hosted Actions Billing Change Postponed
GitHub postponed a new billing model for self-hosted Actions after developer pushback, but moved forward with hosted runner price cuts on January 1.
talon
Advanced tools
Mailgun library to extract message quotations and signatures.
If you ever tried to parse message quotations or signatures you know that absence of any formatting standards in this area could make this task a nightmare. Hopefully this library will make your life much easier. The name of the project is inspired by TALON - multipurpose robot designed to perform missions ranging from reconnaissance to combat and operate in a number of hostile environments. That’s what a good quotations and signature parser should be like :smile:
Here’s how you initialize the library and extract a reply from a text message:
.. code:: python
import talon
from talon import quotations
talon.init()
text = """Reply
-----Original Message-----
Quote"""
reply = quotations.extract_from(text, 'text/plain')
reply = quotations.extract_from_plain(text)
# reply == "Reply"
To extract a reply from html:
.. code:: python
html = """Reply
<blockquote>
<div>
On 11-Apr-2011, at 6:54 PM, Bob <bob@example.com> wrote:
</div>
<div>
Quote
</div>
</blockquote>"""
reply = quotations.extract_from(html, 'text/html')
reply = quotations.extract_from_html(html)
# reply == "<html><body><p>Reply</p></body></html>"
Often the best way is the easiest one. Here’s how you can extract signature from email message without any machine learning fancy stuff:
.. code:: python
from talon.signature.bruteforce import extract_signature
message = """Wow. Awesome!
--
Bob Smith"""
text, signature = extract_signature(message)
# text == "Wow. Awesome!"
# signature == "--\nBob Smith"
Quick and works like a charm 90% of the time. For other 10% you can use the power of machine learning algorithms:
.. code:: python
import talon
# don't forget to init the library first
# it loads machine learning classifiers
talon.init()
from talon import signature
message = """Thanks Sasha, I can't go any higher and is why I limited it to the
homepage.
John Doe
via mobile"""
text, signature = signature.extract(message, sender='john.doe@example.com')
# text == "Thanks Sasha, I can't go any higher and is why I limited it to the\nhomepage."
# signature == "John Doe\nvia mobile"
For machine learning talon currently uses the scikit-learn_ library to build SVM
classifiers. The core of machine learning algorithm lays in
talon.signature.learning package. It defines a set of features to
apply to a message (featurespace.py), how data sets are built
(dataset.py), classifier’s interface (classifier.py).
Currently the data used for training is taken from our personal email
conversations and from ENRON_ dataset. As a result of applying our set
of features to the dataset we provide files classifier and
train.data that don’t have any personal information but could be
used to load trained classifier. Those files should be regenerated every
time the feature/data set is changed.
To regenerate the model files, you can run
.. code:: sh
python train.py
or
.. code:: python
from talon.signature import EXTRACTOR_FILENAME, EXTRACTOR_DATA
from talon.signature.learning.classifier import train, init
train(init(), EXTRACTOR_DATA, EXTRACTOR_FILENAME)
Recently we started a forge_ project to create an open-source, annotated dataset of raw emails. In the project we
used a subset of ENRON_ data, cleansed of private, health and financial information by EDRM_. At the moment over 190
emails are annotated. Any contribution and collaboration on the project are welcome. Once the dataset is ready we plan to
start using it for talon.
.. _scikit-learn: http://scikit-learn.org .. _ENRON: https://www.cs.cmu.edu/~enron/ .. _EDRM: http://www.edrm.net/resources/data-sets/edrm-enron-email-data-set .. _forge: https://github.com/mailgun/forge
The library is inspired by the following research papers and projects:
FAQs
Mailgun library to extract message quotations and signatures.
We found that talon demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Security News
GitHub postponed a new billing model for self-hosted Actions after developer pushback, but moved forward with hosted runner price cuts on January 1.

Research
Destructive malware is rising across open source registries, using delays and kill switches to wipe code, break builds, and disrupt CI/CD.

Security News
Socket CTO Ahmad Nassri shares practical AI coding techniques, tools, and team workflows, plus what still feels noisy and why shipping remains human-led.