
sonnia
SoNNia is a Python 3 software package developed to infer selection pressures on features of amino acid CDR3 sequences. It takes as input TCR CDR3 amino acid sequences with V and J genes. Its output is sequence-level selection factors, which indicate how much more or less represented a sequence is in the selected pool than in the pre-selected pool. These factors can in turn be used to calculate the probability of observing any sequence after selection and to sample from the selected repertoire.
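As an illustration of how selection factors can be used to sample from the selected repertoire, here is a minimal rejection-sampling sketch. It is a generic technique and not necessarily how SoNNia implements sampling. The function name `sample_selected`, the toy sequences, and their selection factors are all hypothetical.

```python
import random

def sample_selected(pre_selected, selection_factor, n, rng=None, max_tries=1_000_000):
    """Draw n sequences from the selected pool by rejection sampling.

    pre_selected: list of sequences drawn from the pre-selection model.
    selection_factor: function mapping a sequence to its factor Q >= 0.
    """
    rng = rng or random.Random()
    # Largest Q in the candidate pool, used to scale acceptance probabilities.
    q_max = max(selection_factor(s) for s in pre_selected)
    if q_max <= 0:
        raise ValueError("all selection factors are zero")
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        candidate = rng.choice(pre_selected)
        # Accept the candidate with probability Q / Q_max.
        if rng.random() < selection_factor(candidate) / q_max:
            accepted.append(candidate)
    return accepted

# Toy pool with made-up selection factors (assumptions for illustration only).
toy_q = {"CASSLGF": 2.0, "CASSPGF": 0.5, "CAWSVGF": 0.0}
pool = list(toy_q)
sample = sample_selected(pool, toy_q.get, n=200, rng=random.Random(0))
```

Sequences with a larger selection factor appear more often in `sample`, and a sequence with a factor of zero never appears.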
SoNNia is a Python software package that extends the functionality of SONIA. It expands the range of selection models that can be inferred, including both non-linear single-chain models and linear/non-linear paired-chain models. SoNNia takes as input CDR3 amino acid sequences, with or without V and J gene assignments. Its output is selection factors that can be used to calculate the probability of observing any sequence after selection. The inference is based on maximizing the likelihood of observing a selected data sample given a representative pre-selected sample. This method was first used in Elhanati et al. (2014) to study thymic selection. Generally, the pre-selected sample can be generated internally using the OLGA software, but SoNNia also allows it to be supplied externally, in the same way the data sample is provided.
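The likelihood being maximized can be illustrated with a toy linear model. The sketch below assumes a SONIA-style form in which the selection factor is Q = exp(-E) / Z, E is a sum of per-feature energies, and Z is the average of exp(-E) over the pre-selected sample. The feature definition, the parameter dictionary `theta`, and the toy sequences are hypothetical and are not taken from the package.

```python
import math

def features(seq):
    """Hypothetical features: (position, amino acid) pairs."""
    return [(i, aa) for i, aa in enumerate(seq)]

def energy(seq, theta):
    """Sum of per-feature energies; features missing from theta contribute 0."""
    return sum(theta.get(f, 0.0) for f in features(seq))

def log_likelihood(data, pre_selected, theta):
    """Mean log Q over data, with Q = exp(-E) / Z and Z averaged over pre_selected."""
    z = sum(math.exp(-energy(s, theta)) for s in pre_selected) / len(pre_selected)
    mean_neg_energy = sum(-energy(s, theta) for s in data) / len(data)
    return mean_neg_energy - math.log(z)

# Toy samples: 'A' at position 1 is enriched in the data relative to the pre-selected pool.
pre_selected = ["CAF", "CSF", "CAW", "CSW"]
data = ["CAF", "CAF", "CAW", "CSF"]

theta_flat = {}                                # no selection: Q = 1 for every sequence
theta_fit = {(1, "A"): -0.5, (1, "S"): 0.5}    # lower energy (higher Q) for 'A' at position 1

ll_flat = log_likelihood(data, pre_selected, theta_flat)
ll_fit = log_likelihood(data, pre_selected, theta_fit)
```

Here `ll_fit` is larger than `ll_flat`, because parameters that favour the enriched feature explain the data sample better than a model with no selection. An actual inference would search for the parameters that maximize this quantity.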
The package provides both command line tools and a Python API for easy integration into analysis pipelines. It ships with several pre-trained models for common use cases like human and mouse TCR and BCR chains. Custom models can also be trained on new datasets.

Extensive documentation can be found here.
SoNNia is a Python package. It is available on PyPI and can be installed with pip:
pip install sonnia
For Mac users on newer Metal devices, make sure to install the additional dependencies (i.e. tensorflow-metal) so that TensorFlow works with the GPU. A CPU version is also available and, given the small size of the models, it should be sufficient for most use cases. SoNNia is also available on GitHub. The command line entry points can be installed by using the setup.py script:
pip install .
Isacchini G, Walczak AM, Mora T, Nourmohammad A, Deep generative selection models of T and B cell receptor repertoires with soNNia, (2021) PNAS, https://www.pnas.org/content/118/14/e2023141118.short
Using neural networks on small datasets risks overfitting due to the large number of parameters that need to be learned. The neural network may learn noise patterns in the training data rather than true underlying relationships. This is why we recommend using the simpler linear SONIA model for datasets with fewer than 100,000 clones, as it has fewer parameters and is less prone to overfitting. The non-linear SoNNia models are better suited for larger datasets where there is enough data to reliably learn complex patterns.
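The rule of thumb above can be written as a small helper. This is only a sketch: the function name `recommend_model` is hypothetical, and it assumes that each distinct entry in the input counts as one clone.

```python
def recommend_model(clones, threshold=100_000):
    """Suggest a model family from the number of distinct clones.

    clones: iterable of hashable clone records, e.g. (cdr3, v_gene, j_gene) tuples.
    threshold: dataset size below which the linear model is suggested.
    """
    n_unique = len(set(clones))
    return "linear" if n_unique < threshold else "non-linear"

# A tiny dataset falls well below the threshold.
small = [("CASSLGF", "TRBV1", "TRBJ1"), ("CASSPGF", "TRBV2", "TRBJ2")]
choice = recommend_model(small)
```

The threshold is a parameter so that it can be adjusted; the value of 100,000 is the one recommended in the text.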
This code is quite flexible; however, it does demand a very consistent definition of CDR3 (junction) sequences.
CHECK THE DEFINITION OF THE CDR3 REGION OF THE SEQUENCES YOU INPUT. This will likely be the most common problem you encounter.
The default models/genomic data are set up to define the CDR3 region (i.e. the junction) from the conserved cysteine C (INCLUSIVE) in the V region to the conserved F or W (INCLUSIVE) in the J.
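A quick sanity check on input sequences can catch mismatched CDR3 definitions early. The sketch below is a heuristic based on the definition above and is not the package's own validation; the function name `looks_like_cdr3` is hypothetical.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def looks_like_cdr3(seq):
    """Return True if seq starts with C, ends with F or W, and uses only standard amino acids."""
    return (
        len(seq) >= 2
        and seq[0] == "C"
        and seq[-1] in ("F", "W")
        and set(seq) <= AMINO_ACIDS
    )

sequences = ["CASSLGQGAYEQYF", "ASSLGQGAYEQY", "CASSLGQGAYEQYW"]
# Collect sequences that do not match the expected junction definition.
suspicious = [s for s in sequences if not looks_like_cdr3(s)]
```

In this toy input, the second sequence is flagged because the conserved C and the trailing F have been trimmed, which is a common difference between CDR3 conventions.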
There are three command line console scripts (the scripts can still be called as executables if SoNNia is not installed):
sonnia evaluate
sonnia generate
sonnia infer
For any of them you can execute with the -h or --help flags to get the options.
We offer a quick demonstration of the console scripts. This will show how to generate and evaluate sequences and infer a selection model using the default generation model for human TCR beta chains that ships with the SONIA software. In order to run the commands below you need to download the examples folder.
$ sonnia infer --model humanTRB -i examples/data_seqs.csv.gz
$ sonnia generate --model examples/sonnia_model --post -n 100
$ sonnia evaluate --model examples/sonnia_model -i examples/data_seqs.csv.gz --ppost

| Model Type | Description | Chain Type |
|---|---|---|
| humanTRA | Human T cell alpha | VJ |
| humanTRB | Human T cell beta | VDJ |
| humanIGH | Human B cell heavy | VDJ |
| humanIGK | Human B cell kappa | VJ |
| humanIGL | Human B cell lambda | VJ |
| mouseTRB | Mouse T cell beta | VDJ |
| mouseTRA | Mouse T cell alpha | VJ |
| mouseIGH | Mouse B cell heavy | VDJ |
| mouseIGK | Mouse B cell kappa | VJ |
| mouseIGL | Mouse B cell lambda | VJ |
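When calling the console scripts from a pipeline, it can help to build the command from the model names in the table above. The sketch below only assembles the argument list for `sonnia infer`, using the flags shown in the demonstration; the helper name `infer_command` and the restriction to the listed default names are assumptions of this example.

```python
import shlex

# Default model names and chain types, copied from the table above.
DEFAULT_MODELS = {
    "humanTRA": "VJ", "humanTRB": "VDJ", "humanIGH": "VDJ",
    "humanIGK": "VJ", "humanIGL": "VJ",
    "mouseTRB": "VDJ", "mouseTRA": "VJ", "mouseIGH": "VDJ",
    "mouseIGK": "VJ", "mouseIGL": "VJ",
}

def infer_command(model, infile):
    """Build the argument list for `sonnia infer` with a default model name."""
    if model not in DEFAULT_MODELS:
        raise ValueError(f"unknown default model: {model!r}")
    return ["sonnia", "infer", "--model", model, "-i", infile]

cmd = infer_command("humanTRB", "examples/data_seqs.csv.gz")
command_line = shlex.join(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once SoNNia is installed, and `command_line` holds the equivalent shell command for logging.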
To incorporate the core algorithm into an analysis pipeline (or to write your own script wrappers), all that is needed is to import the modules. Each module defines classes on which only a few methods need to be called.
The modules are:
| Module name | Classes |
|---|---|
| sonia_paired.py | SoniaPaired |
| sonnia_paired.py | SoNNiaPaired |
| sonnia.py | SoNNia |
| sonia.py | Sonia |
| utils.py | N/A (contains util functions) |
| classifiers.py | Linear, SoniaRatio |
The classes SoniaPaired, SoNNiaPaired, and SoNNia behave similarly to the ones defined in the SONIA package. As an example, the basic import and initialization of the single-chain SoniaLeftposRightpos model, followed by the corresponding SoNNia classes, is:
# linear sonia model
from sonia.sonia_leftpos_rightpos import SoniaLeftposRightpos
qm=SoniaLeftposRightpos()
# deep sonia model
from sonnia.sonnia import SoNNia
qm=SoNNia()
# linear paired-chain model (i.e. alpha-beta for TCRs)
from sonnia.sonia_paired import SoniaPaired
qm=SoniaPaired()
# deep paired-chain model (i.e. alpha-beta for TCRs)
from sonnia.sonnia_paired import SoNNiaPaired
qm=SoNNiaPaired()
# linear single-chain model (sonia equivalent)
from sonnia.sonia import Sonia
qm=Sonia()
In the examples folder there is a Python notebook (or alternatively the example_pipeline script) which shows the main properties of the software.
The fig2_paper folder contains all scripts and explanations needed to reproduce figure 2 of the soNNia paper.
The fig4_paper folder contains all scripts and explanations needed to reproduce figure 4 of the soNNia paper.
Any issues or questions should be addressed to us.
Free use of soNNia is granted under the terms of the GNU General Public License version 3 (GPLv3).