sonnia

SoNNia is a Python 3 package for inferring selection pressures on features of amino acid CDR3 sequences. It takes as input TCR CDR3 amino acid sequences with their V and J genes. Its output is a sequence-level selection factor, which indicates how much more or less represented a sequence is in the selected pool than in the pre-selection pool. These factors can in turn be used to calculate the probability of observing any sequence after selection and to sample from the selected repertoire.

Version on PyPI: 0.3.1

SoNNia is a Python package that extends the functionality of the SONIA package by expanding the choice of selection models that can be inferred. It includes non-linear single-chain models and both linear and non-linear paired-chain models. The pre-processing pipeline used in the corresponding paper is included as a separate class. A likelihood ratio classifier and a linear logistic classifier for functional annotation are also included and can be applied directly to T- and B-cell receptor repertoire datasets.

Documentation

Extensive documentation can be found here.

Version

Latest released version: 0.3.0

Installation

SoNNia is a Python package. It is available on PyPI and can be downloaded and installed with pip:

pip install sonnia

SoNNia is also available on GitHub. From the root of the source directory, the command line entry points can be installed through the setup.py script by running:

pip install .

Sometimes pip fails to install the dependencies correctly. If you get an error, first try installing the dependencies separately:

pip install tensorflow
pip install matplotlib
pip install olga
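Before retrying the install, it can help to see which of these three dependencies are actually missing. The following is a minimal sketch that uses only the standard library; the helper name missing_dependencies is hypothetical and is not part of SoNNia.

```python
import importlib.util

# Import names of the dependencies listed above.
DEPENDENCIES = ("tensorflow", "matplotlib", "olga")


def missing_dependencies(names=DEPENDENCIES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_dependencies():
        print(f"missing: {name} (try: pip install {name})")
```

The check only looks up whether each package can be located. It does not import it, so it will not catch a package that is present but broken.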

References

Isacchini G, Walczak AM, Mora T, Nourmohammad A, Deep generative selection models of T and B cell receptor repertoires with soNNia, (2021) PNAS, https://www.pnas.org/content/118/14/e2023141118.short

SoNNia modules in a Python script

To incorporate the core algorithm into an analysis pipeline (or to write your own script wrappers), all that is needed is to import the modules. Each module defines classes on which only a few methods need to be called.

The modules are:

Module name        Classes
sonia_paired.py    SoniaPaired
sonnia_paired.py   SoNNiaPaired
sonnia.py          SoNNia
sonia.py           Sonia
utils.py           N/A (contains util functions)
processing.py      Processing
classifiers.py     Linear, SoniaRatio

The classes SoniaPaired, SoNNiaPaired, and SoNNia behave similarly to the ones defined in the SONIA package. As an example, this is the basic import and initialization of the single-chain SoniaLeftposRightpos model in SONIA:

from sonia.sonia_leftpos_rightpos import SoniaLeftposRightpos
qm=SoniaLeftposRightpos()

In SoNNia, the deep single-chain version is:

from sonnia.sonnia import SoNNia
qm=SoNNia()

The linear paired-chain (i.e. alpha-beta for TCRs) version is:

from sonnia.sonia_paired import SoniaPaired
qm=SoniaPaired()

The deep paired-chain (i.e. alpha-beta for TCRs) version is:

from sonnia.sonnia_paired import SoNNiaPaired
qm=SoNNiaPaired()

SoNNia keeps all the functionality of SONIA. For example, you can infer a linear SONIA model with SoNNia by defining the model as follows:

from sonnia.sonia import Sonia
qm=Sonia()

The examples folder contains a Python notebook (or alternatively the example_pipeline script) that shows the main features of the software. The fig2_paper folder contains all scripts and explanations needed to reproduce figure 2 of the soNNia paper (TODO: this needs to be updated to the new model behaviour).
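When a pipeline needs to switch between the four models, the module paths and class names listed in this README can be kept in one place. This is only a sketch: the MODEL_CLASSES mapping and the load_model_class helper are hypothetical names, and the only facts taken from this README are the module paths and class names themselves.

```python
import importlib

# (module path, class name) pairs as listed in this README.
MODEL_CLASSES = {
    ("linear", "single"): ("sonnia.sonia", "Sonia"),
    ("deep", "single"): ("sonnia.sonnia", "SoNNia"),
    ("linear", "paired"): ("sonnia.sonia_paired", "SoniaPaired"),
    ("deep", "paired"): ("sonnia.sonnia_paired", "SoNNiaPaired"),
}


def load_model_class(kind, chains):
    """Import and return the model class; requires sonnia to be installed."""
    module_path, class_name = MODEL_CLASSES[(kind, chains)]
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Usage (needs sonnia installed):
# qm = load_model_class("deep", "single")()
```

The import happens inside the function, so the mapping itself can be inspected even on a machine where SoNNia is not installed.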

Command line console scripts

There are three command line console scripts (the scripts can still be called as executables if SoNNia is not installed):

  1. sonnia-evaluate
     • evaluates Ppost, Pgen or selection factors of sequences according to a generative V(D)J model and selection model.
  2. sonnia-generate
     • generates CDR3 sequences, before (like olga) or after selection
  3. sonnia-infer
     • infers a selection model with respect to a generative V(D)J model

Run any of them with the -h or --help flag to list its options.
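The three quantities that sonnia-evaluate reports are linked. In the SONIA framework that SoNNia extends, the post-selection probability is the generation probability reweighted by the selection factor, Ppost = Q × Pgen. The sketch below only illustrates that arithmetic; the function name and the numbers are made up for illustration and do not come from any SoNNia output.

```python
import math


def ppost_from_q_and_pgen(q, pgen):
    """Post-selection probability as selection factor times generation probability."""
    return q * pgen


# Illustrative numbers only.
pgen = 1e-9
enriched = ppost_from_q_and_pgen(2.0, pgen)   # Q > 1: more represented after selection
depleted = ppost_from_q_and_pgen(0.5, pgen)   # Q < 1: less represented after selection
print(enriched, depleted)
```

A selection factor above 1 means the sequence is more represented in the selected pool than in the pre-selection pool, and a factor below 1 means it is less represented.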

We offer a quick demonstration of the console scripts. This will show how to generate and evaluate sequences and infer a selection model using the default generation model for human TCR beta chains that ships with the SONIA software. In order to run the commands below you need to download the examples folder.

  1. $ sonnia-infer --humanTRB -i examples/data_seqs.txt -d ';' -m 10000
     • This reads in the file data_seqs.txt, infers a selection model and saves it to the folder sel_model.
  2. $ sonnia-generate --set_custom_model_VDJ examples/sonnia_model --post -n 100
     • This generates 100 human TRB CDR3 sequences from the post-selection repertoire and prints them to stdout along with the V and J genes used to generate them.
  3. $ sonnia-evaluate --set_custom_model_VDJ examples/sonnia_model -i examples/data_seqs.txt --ppost -m 100 -d ';'
     • This computes Ppost, Pgen and Q of the first 100 sequences in the data_seqs file.
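The same three steps can also be driven from Python, for example inside a larger pipeline. The sketch below only assembles the argument lists from the commands shown above and hands them to subprocess.run. The helper names are hypothetical, and actually running the commands assumes that SoNNia is installed and that the examples folder is in the working directory.

```python
import subprocess


def infer_command(infile="examples/data_seqs.txt", max_seqs=10000):
    """Argument list for the sonnia-infer call shown above."""
    return ["sonnia-infer", "--humanTRB", "-i", infile, "-d", ";", "-m", str(max_seqs)]


def generate_command(model_dir="examples/sonnia_model", n=100):
    """Argument list for the sonnia-generate call shown above."""
    return ["sonnia-generate", "--set_custom_model_VDJ", model_dir, "--post", "-n", str(n)]


def evaluate_command(model_dir="examples/sonnia_model",
                     infile="examples/data_seqs.txt", max_seqs=100):
    """Argument list for the sonnia-evaluate call shown above."""
    return ["sonnia-evaluate", "--set_custom_model_VDJ", model_dir,
            "-i", infile, "--ppost", "-m", str(max_seqs), "-d", ";"]


def run(command):
    """Run one console script and raise CalledProcessError if it fails."""
    return subprocess.run(command, check=True)


# Usage (needs SoNNia installed and the examples folder present):
# run(infer_command())
# run(generate_command())
# run(evaluate_command())
```

Passing the arguments as a list avoids shell quoting, so the ';' delimiter does not need the quotes that the shell commands require.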

Notes about CDR3 sequence definition and Dataset size

This code is quite flexible; however, it does demand a very consistent definition of CDR3 sequences.

CHECK THE DEFINITION OF THE CDR3 REGION OF THE SEQUENCES YOU INPUT. This will likely be the most common problem you encounter.

The default models/genomic data are set up to define the CDR3 region from the conserved cysteine C (INCLUSIVE) in the V region to the conserved F or W (INCLUSIVE) in the J. This corresponds to positions X and X according to IMGT.
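A quick sanity check on the input can catch sequences that were trimmed under a different convention. The following is a minimal sketch, assuming amino acid CDR3 strings: it only tests the first and last residue against the definition above, and the function names and example sequences are illustrative, not part of SoNNia.

```python
def looks_like_cdr3(seq):
    """True if the sequence starts with C and ends with F or W (both inclusive)."""
    seq = seq.strip().upper()
    return len(seq) >= 2 and seq[0] == "C" and seq[-1] in ("F", "W")


def flag_suspicious(seqs):
    """Return the sequences that do not follow the convention."""
    return [s for s in seqs if not looks_like_cdr3(s)]


# Illustrative sequences: the second one has both conserved residues trimmed off.
examples = ["CASSLGQGYEQYF", "ASSLGQGYEQY"]
print(flag_suspicious(examples))
```

Passing this check does not prove that the boundaries are the conserved residues. It only flags sequences that certainly do not match the expected definition.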

Neural network models are prone to overfitting in the low-data regime. While appropriate regularization can reduce the risk of overfitting, it is recommended to use the linear SONIA model for datasets with fewer than 100 000 receptor sequences.
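This recommendation is easy to encode as a guard at the start of a pipeline. The sketch below is an assumption-laden illustration: the threshold comes from the paragraph above, while the function name and the idea of returning a boolean are hypothetical choices.

```python
# Threshold taken from the recommendation above.
LINEAR_MODEL_THRESHOLD = 100_000


def prefer_linear_model(n_sequences):
    """True when the dataset is small enough that the linear SONIA model is recommended."""
    return n_sequences < LINEAR_MODEL_THRESHOLD


for n in (5_000, 250_000):
    choice = "linear SONIA model" if prefer_linear_model(n) else "deep model is an option"
    print(n, choice)
```

For datasets at or above the threshold the README gives no explicit recommendation, so the sketch only reports that the deep model becomes an option.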

Contact

Any issues or questions should be addressed to us.

License

Free use of soNNia is granted under the terms of the GNU General Public License version 3 (GPLv3).
