Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

varcode

Package Overview
Dependencies
Maintainers
5
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

varcode

Variant annotation in Python

  • 1.2.1
  • PyPI
  • Socket score

Maintainers
5

Tests Coverage Status PyPI PyPI downloads

Varcode

Varcode is a library for working with genomic variant data in Python and predicting the impact of those variants on protein sequences.

Installation

You can install varcode using pip:

pip install varcode

You can install required reference genome data through PyEnsembl as follows:

# Downloads and installs the Ensembl releases (75 and 76)
pyensembl install --release 75 76

Example

import varcode

# Load TCGA MAF containing variants from their
variants = varcode.load_maf("tcga-ovarian-cancer-variants.maf")

print(variants)
### <VariantCollection from 'tcga-ovarian-cancer-variants.maf' with 6428 elements>
###  -- Variant(contig=1, start=69538, ref=G, alt=A, genome=GRCh37)
###  -- Variant(contig=1, start=881892, ref=T, alt=G, genome=GRCh37)
###  -- Variant(contig=1, start=3389714, ref=G, alt=A, genome=GRCh37)
###  -- Variant(contig=1, start=3624325, ref=G, alt=T, genome=GRCh37)
###  ...

# you can index into a VariantCollection and get back a Variant object
variant = variants[0]

# groupby_gene_name returns a dictionary whose keys are gene names
# and whose values are themselves VariantCollections
gene_groups = variants.groupby_gene_name()

# get variants which affect the TP53 gene
TP53_variants = gene_groups["TP53"]

# predict protein coding effect of every TP53 variant on
# each transcript of the TP53 gene
TP53_effects = TP53_variants.effects()

print(TP53_effects)
### <EffectCollection with 789 elements>
### -- PrematureStop(variant=chr17 g.7574003G>A, transcript_name=TP53-001, transcript_id=ENST00000269305, effect_description=p.R342*)
### -- ThreePrimeUTR(variant=chr17 g.7574003G>A, transcript_name=TP53-005, transcript_id=ENST00000420246)
### -- PrematureStop(variant=chr17 g.7574003G>A, transcript_name=TP53-002, transcript_id=ENST00000445888, effect_description=p.R342*)
### -- FrameShift(variant=chr17 g.7574030_7574030delG, transcript_name=TP53-001, transcript_id=ENST00000269305, effect_description=p.R333fs)
### ...

premature_stop_effect = TP53_effects[0]

print(str(premature_stop_effect.mutant_protein_sequence))
### 'MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMF'

print(premature_stop_effect.aa_mutation_start_offset)
### 341

print(premature_stop_effect.transcript)
### Transcript(id=ENST00000269305, name=TP53-001, gene_name=TP53, biotype=protein_coding, location=17:7571720-7590856)

print(premature_stop_effect.gene.name)
### 'TP53'

If you are looking for a quick start guide, you can check out this iPython book that demonstrates simple use cases of Varcode

Effect Types

Effect typeDescription
AlternateStartCodonReplace annotated start codon with alternative start codon (e.g. "ATG>CAG").
ComplexSubstitutionInsertion and deletion of multiple amino acids.
DeletionCoding mutation which causes deletion of amino acid(s).
ExonLossDeletion of entire exon, significantly disrupts protein.
ExonicSpliceSiteMutation at the beginning or end of an exon, may affect splicing.
FivePrimeUTRVariant affects 5' untranslated region before start codon.
FrameShiftTruncationA frameshift which leads immediately to a stop codon (no novel amino acids created).
FrameShiftOut-of-frame insertion or deletion of nucleotides, causes novel protein sequence and often premature stop codon.
IncompleteTranscriptCan't determine effect since transcript annotation is incomplete (often missing either the start or stop codon).
InsertionCoding mutation which causes insertion of amino acid(s).
IntergenicOccurs outside of any annotated gene.
IntragenicWithin the annotated boundaries of a gene but not in a region that's transcribed into pre-mRNA.
IntronicSpliceSiteMutation near the beginning or end of an intron but less likely to affect splicing than donor/acceptor mutations.
IntronicVariant occurs between exons and is unlikely to affect splicing.
NoncodingTranscriptTranscript doesn't code for a protein.
PrematureStopInsertion of stop codon, truncates protein.
SilentMutation in coding sequence which does not change the amino acid sequence of the translated protein.
SpliceAcceptorMutation in the last two nucleotides of an intron, likely to affect splicing.
SpliceDonorMutation in the first two nucleotides of an intron, likely to affect splicing.
StartLossMutation causes loss of start codon, likely result is that an alternate start codon will be used down-stream (possibly in a different frame).
StopLossLoss of stop codon, causes extension of protein by translation of nucleotides from 3' UTR.
SubstitutionCoding mutation which causes simple substitution of one amino acid for another.
ThreePrimeUTRVariant affects 3' untranslated region after stop codon of mRNA.

Coordinate System

Varcode currently uses a "base counted, one start" genomic coordinate system, to match the Ensembl annotation database. We are planning to switch over to "space counted, zero start" (interbase) coordinates, since that system allows for more uniform logic (no special cases for insertions). To learn more about genomic coordinate systems, read this blog post.

FAQs


Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc