Socket
Book a DemoInstallSign in
Socket

sonnia

Package Overview
Dependencies
Maintainers
1
Versions
24
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

sonnia

SoNNia is a Python 3 software developed to infer selection pressures on features of amino acid CDR3 sequences. SoNNia takes as input TCR CDR3 amino acid sequences with V and J genes. Its output is sequence-level selection factors which indicate how more or less represented this sequence would be in the selected pool as compared to the pre-selected pool. These in turn could be used to calculate the probability of observing any sequence after selection and sample from the selected repertoire.

pipPyPI
Version
0.4.0
Maintainers
1

SoNNia

SoNNia is a Python software package that extends the functionality of SONIA. It expands the range of selection models that can be inferred, including both non-linear single-chain models and linear/non-linear paired-chain models. SoNNia takes as input CDR3 amino acid sequences, with (or without) V and J genes assignment. Its output is selection factors that can be used to calculate the probability of observing any sequence after selection. The inference is based on maximizing the likelihood of observing a selected data sample given a representative pre-selected sample. This method was first used in Elhanati et al (2014) to study thymic selection. Generally, the pre-selected sample can be generated internally using the OLGA software, but SoNNia also allows it to be supplied externally, in the same way the data sample is provided.

The package provides both command line tools and a Python API for easy integration into analysis pipelines. It ships with several pre-trained models for common use cases like human and mouse TCR and BCR chains. Custom models can also be trained on new datasets.

image

Documentation

Extensive documentation can be found here.

Installation

SoNNia is a python software. It is available on PyPI and can be downloaded and installed through pip:

pip install sonnia

For mac user on new metal devices, make sure to install additional dependencies (i.e. tensorflow-metal) to make tensorflow work with the GPU. CPU version is also available and, given the small size of the models, it should be sufficient for most use cases. SoNNia is also available on GitHub. The command line entry points can be installed by using the setup.py script:

pip install .

References

Isacchini G, Walczak AM, Mora T, Nourmohammad A, Deep generative selection models of T and B cell receptor repertoires with soNNia, (2021) PNAS, https://www.pnas.org/content/118/14/e2023141118.short

Important Notes

Dataset size requirements

  • For datasets < 100,000 clones: Use linear SONIA model
  • For larger datasets: Can use non-linear SoNNia models

Using neural networks on small datasets risks overfitting due to the large number of parameters that need to be learned. The neural network may learn noise patterns in the training data rather than true underlying relationships. This is why we recommend using the simpler linear SONIA model for datasets with fewer than 100,000 clones, as it has fewer parameters and is less prone to overfitting. The non-linear SoNNia models are better suited for larger datasets where there is enough data to reliably learn complex patterns.

CDR3 sequence definition

This code is quite flexible, however it does demand a very consistent definition of CDR3 (junction) sequences.

CHECK THE DEFINITION OF THE CDR3 REGION OF THE SEQUENCES YOU INPUT. This will likely be the most often problem that occurs.

The default models/genomic data are set up to define the CDR3 region (i.e. the junction) from the conserved cysteine C (INCLUSIVE) in the V region to the conserved F or W (INCLUSIVE) in the J.

Command line console scripts

There are three command line console scripts (the scripts can still be called as executables if SoNNia is not installed):

  • sonnia evaluate
  • evaluates Ppost, Pgen or selection factors of sequences according to a generative V(D)J model and selection model.
  • sonnia generate
  • generates CDR3 sequences, before (like olga) or after selection
  • sonnia infer
  • infers a selection model with respect to a generative V(D)J model

For any of them you can execute with the -h or --help flags to get the options.

Demonstration of the console scripts

We offer a quick demonstration of the console scripts. This will show how to generate and evaluate sequences and infer a selection model using the default generation model for human TCR beta chains that ships with the SONIA software. In order to run the commands below you need to download the examples folder.

  • $ sonnia infer --model humanTRB -i examples/data_seqs.csv.gz
  • This reads in the full file example_seqs.csv.gz, infers a selection model and saves to the folder sel_model
  • $ sonnia generate --model examples/sonnia_model --post -n 100
  • Generate 100 human TRB CDR3 (junction) sequences from the post-selection repertoire and print to stdout along with the V and J genes used to generate them.
  • $ sonnia evaluate --model examples/sonnia_model -i examples/data_seqs.csv.gz --ppost
  • This computes Ppost,Pgen and Q of the first 100 seqs in the data_seqs file.

Single Chain Model Types Table

Model TypeDescriptionChain Type
humanTRAHuman T cell alphaVJ
humanTRBHuman T cell betaVDJ
humanIGHHuman B cell heavyVDJ
humanIGKHuman B cell kappaVJ
humanIGLHuman B cell lambdaVJ
mouseTRBMouse T cell betaVDJ
mouseTRAMouse T cell alphaVJ
mouseIGHMouse B cell heavyVDJ
mouseIGKMouse B cell kappaVJ
mouseIGLMouse B cell lambdaVJ

Python API

In order to incorporate the core algorithm into an analysis pipeline (or to write your own script wrappers) all that is needed is to import the modules. Each module defines some classes that only a few methods get called on.

The modules are:

Module nameClasses
sonia_paired.pySoniaPaired
sonnia_paired.pySoNNiaPaired
sonnia.pySoNNia
sonia.pySonia
utils.pyN/A (contains util functions)
classifiers.pyLinear, SoniaRatio

The classes SoniaPaired, SoNNiaPaired, and SoNNia have similar behaviour to the ones defined in the SONIA package. As an example, the basic import and initialization of the single-chain SoniaLeftposRightpos model

# linear sonia model
from sonia.sonia_leftpos_rightpos import SoniaLeftposRightpos
qm=SoniaLeftposRightpos()

# deep sonia model
from sonnia.sonnia import SoNNia
qm=SoNNia()

# linear paired-chain model (i.e. alpha-beta for TCRs)
from sonnia.sonia_paired import SoniaPaired
qm=SoniaPaired()

# deep paired-chain model (i.e. alpha-beta for TCRs)
from sonnia.sonnia_paired import SoNNiaPaired
qm=SoNNiaPaired()

# linear single-chain model (sonia equivalent)
from sonnia.sonia import Sonia
qm=Sonia()

In the examples folder there is a python notebook (or alternatively the example_pipeline script) which shows the main properties of the software, including:

  • Loading and preprocessing sequence data
  • Training selection models on single-chain data
  • Evaluating selection probabilities
  • Generating synthetic sequences
  • Visualizing selection factors
  • Comparing different models

The fig2_paper folder contains all scripts and explanations needed to reproduce figure 2 of the soNNia paper, which demonstrates:

  • Training deep and linear models on TCR data
  • Comparing selection patterns between models
  • Analyzing position-specific amino acid selection
  • Evaluating model performance and generalization

The fig4_paper folder contains all scripts and explanations needed to reproduce figure 4 of the soNNia paper, which demonstrates:

  • infer model on different datasets
  • compare selection patterns between models
  • plot distance matrix between datasets

Contact

Any issues or questions should be addressed to us.

License

Free use of soNNia is granted under the terms of the GNU General Public License version 3 (GPLv3).

FAQs

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts