
Research
SANDWORM_MODE: Shai-Hulud-Style npm Worm Hijacks CI Workflows and Poisons AI Toolchains
An emerging npm supply chain attack that infects repos, steals CI secrets, and targets developer AI toolchains for further compromise.
genmod
Advanced tools
GENMOD is a simple to use command line tool for annotating and analyzing genomic variations in the VCF file format. GENMOD can annotate genetic patterns of inheritance in vcf:s with single or multiple families of arbitrary size.
The tools in the genmod suite are:
GENMOD
pip install genmod
or
git clone https://github.com/Clinical-Genomics/genmod.git
cd genmod
uv install genmod
uv run genmod
This is an overview, for more in depth documentation see documentation
The following command should work when installed successfully. The files are distributed with the package.
$ cat examples/test_vcf.vcf
##fileformat=VCFv4.1
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##contig=<ID=1,length=249250621,assembly=b37>
##reference=file:///humgen/gsa-hpprojects/GATK/bundle/current/b37/human_g1k_v37.fasta
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT father mother proband father_2 mother_2 proband_2
1 879537 . T C 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 0/1:10,10:60 1/1:10,10:60 0/0:10,10:60 0/1:10,10:60 1/1:10,10:60
1 879541 . G A 100 PASS MQ=1 GT:AD:GQ ./. 0/1:10,10:60 1/1:10,10:60 ./. 0/1:10,10:60 0/1:10,10:60
1 879595 . C T 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 0/0:10,10:60 1/1:10,10:60 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60
1 879676 . G A 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 1/1:10,10:60 1/1:10,10:60 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60
1 879911 . G A 100 PASS MQ=1 GT:AD:GQ 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60
1 880012 . A G 100 PASS MQ=1 GT:AD:GQ 0/0:10,10:60 0/1:10,10:60 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60 0/1:10,10:60
1 880086 . T C 100 PASS MQ=1 GT:AD:GQ 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60
1 880199 . G A 100 PASS MQ=1 GT:AD:GQ 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60
1 880217 . T G 100 PASS MQ=1 GT:AD:GQ 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60
10 76154051 . A G 100 PASS MQ=1 GT:AD:GQ 0/0:10,10:60 0/1:10,10:60 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60 0/1:10,10:60
10 76154073 . T G 100 PASS MQ=1 GT:AD:GQ 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60
10 76154074 . C G 100 PASS MQ=1 GT:AD:GQ ./. 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60
10 76154076 . G C 100 PASS MQ=1 GT:AD:GQ ./. 0/0:10,10:60 0/1:10,10:60 ./. 0/0:10,10:60 0/1:10,10:60
X 302253 . CCCTCCTGCCCCT C 100 PASS MQ=1 GT:AD:GQ 0/0:10,10:60 0/1:10,10:60 1/1:10,10:60 0/0:10,10:60 1/1:10,10:60 1/1:10,10:60
MT 302253 . CCCTCCTGCCCCT C 100 PASS MQ=1 GT:AD:GQ 0/0:10,10:60 0/1:10,10:60 1/1:10,10:60 0/0:10,10:60 1/1:10,10:60 1/1:10,10:60
$ cat examples/test_vcf.vcf |\
>genmod annotate - --annotate-regions |\
>genmod models - --family_file examples/recessive_trio.ped > test_vcf_models_annotated.vcf
$ cat test_vcf_models_annotated.vcf
##fileformat=VCFv4.1
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=Annotation,Number=.,Type=String,Description="Annotates what feature(s) this variant belongs to.">
##INFO=<ID=Exonic,Number=0,Type=Flag,Description="Indicates if the variant is exonic.">
##INFO=<ID=GeneticModels,Number=.,Type=String,Description="':'-separated list of genetic models for this variant.">
##INFO=<ID=ModelScore,Number=.,Type=String,Description="PHRED score for genotype models.">
##INFO=<ID=Compounds,Number=.,Type=String,Description="List of compound pairs for this variant.The list is splitted on ',' family id is separated with compoundswith ':'. Compounds are separated with '|'.">
##contig=<ID=1,length=249250621,assembly=b37>
##reference=file:///humgen/gsa-hpprojects/GATK/bundle/current/b37/human_g1k_v37.fasta
##Software=<ID=genmod,Version=3.0.1,Date="2015-09-22 08:40",CommandLineOptions="processes=4 keyword=Annotation family_type=ped family_file=<open file 'examples/recessive_trio.ped', mode 'r' at 0x102d3a780> variant_file=<_io.TextIOWrapper name='<stdin>' encoding='utf-8'> logger=<logging.Logger object at 0x102d64250>">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT father mother proband father_2 mother_2 proband_2
1 879537 . T C 100 PASS MQ=1;Exonic;Annotation=SAMD11;GeneticModels=1:AR_hom;ModelScore=1:55.0 GT:AD:GQ 0/1:10,10:60 0/1:10,10:60 1/1:10,10:60 0/0:10,10:60 0/1:10,10:60 1/1:10,10:60
1 879541 . G A 100 PASS MQ=1;Exonic;Annotation=SAMD11;GeneticModels=1:AR_hom_dn|AR_hom;ModelScore=1:57.0 GT:AD:GQ ./. 0/1:10,10:60 1/1:10,10:60 ./. 0/1:10,10:60 0/1:10,10:60
1 879595 . C T 100 PASS MQ=1;Exonic;Annotation=NOC2L,SAMD11;GeneticModels=1:AR_hom_dn;ModelScore=1:55.0 GT:AD:GQ 0/1:10,10:60 0/0:10,10:60 1/1:10,10:60 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60
1 879676 . G A 100 PASS MQ=1;Exonic;Annotation=NOC2L,SAMD11 GT:AD:GQ 0/1:10,10:60 1/1:10,10:60 1/1:10,10:60 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60
1 879911 . G A 100 PASS MQ=1;Exonic;Annotation=NOC2L,SAMD11;Compounds=1:1_880086_T_C|1_880012_A_G;GeneticModels=1:AR_comp|AR_comp_dn;ModelScore=1:55.0 GT:AD:GQ 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60
1 880012 . A G 100 PASS MQ=1;Exonic;Annotation=NOC2L;Compounds=1:1_879911_G_A|1_880086_T_C;GeneticModels=1:AR_comp|AR_comp_dn;ModelScore=1:55.0 GT:AD:GQ 0/0:10,10:60 0/1:10,10:60 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60 0/1:10,10:60
1 880086 . T C 100 PASS MQ=1;Exonic;Annotation=NOC2L;Compounds=1:1_879911_G_A|1_880012_A_G;GeneticModels=1:AD_dn|AR_comp_dn;ModelScore=1:55.0 GT:AD:GQ 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60
1 880199 . G A 100 PASS MQ=1;Annotation=NOC2L;GeneticModels=1:AD_dn;ModelScore=1:55.0 GT:AD:GQ 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60
1 880217 . T G 100 PASS MQ=1;Annotation=NOC2L;GeneticModels=1:AD_dn;ModelScore=1:55.0 GT:AD:GQ 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60
10 76154051 . A G 100 PASS MQ=1;Exonic;Annotation=ADK;Compounds=1:10_76154073_T_G;GeneticModels=1:AR_comp_dn;ModelScore=1:55.0 GT:AD:GQ 0/0:10,10:60 0/1:10,10:60 0/1:10,10:60 0/0:10,10:60 0/1:10,10:60 0/1:10,10:60
10 76154073 . T G 100 PASS MQ=1;Exonic;Annotation=ADK;Compounds=1:10_76154051_A_G;GeneticModels=1:AD_dn|AR_comp_dn;ModelScore=1:55.0 GT:AD:GQ 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60 0/0:10,10:60 0/0:10,10:60 0/1:10,10:60
10 76154074 . C G 100 PASS MQ=1;Annotation=ADK GT:AD:GQ ./. 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60 0/1:10,10:60
10 76154076 . G C 100 PASS MQ=1;Annotation=ADK;GeneticModels=1:AD_dn|AD;ModelScore=1:57.0 GT:AD:GQ ./. 0/0:10,10:60 0/1:10,10:60 ./. 0/0:10,10:60 0/1:10,10:60
X 302253 . CCCTCCTGCCCCT C 100 PASS MQ=1;Annotation=PPP2R3B;GeneticModels=1:XD|XR;ModelScore=1:55.0 GT:AD:GQ 0/0:10,10:60 0/1:10,10:60 1/1:10,10:60 0/0:10,10:60 1/1:10,10:60 1/1:10,10:60
MT 302253 . CCCTCCTGCCCCT C 100 PASS MQ=1;GeneticModels=1:AR_hom_dn;ModelScore=1:55.0 GT:AD:GQ 0/0:10,10:60 0/1:10,10:60 1/1:10,10:60 0/0:10,10:60 1/1:10,10:60 1/1:10,10:60
The basic idea with genmod is to make fast and easy analysis of vcf variants for rare disease. It can still be interesting to use in other cases, such as annotating what genetic regions the variants in a bacteria belongs to. genmod can annotate accurate patterns of inheritance in arbitrary sized families. The genetic models checked are the basic mendelian ones, these are:
genmod is made for working on any type of annotated vcf. To get relevant Autosomal Compound Heterozygotes we need to know what genetic regions that the variants belong to. We can use annotations from the Variant Effect Predictor or let genmod do the annotation.
genmod comes annotation set that is made from ensemble.
It is possible to use the 37 or 38 build, see genmod annotate --help
Any annotation in the bed format can be used.
(There are files for testing the following commands in genmod/examples)
To annotate the variants with user defined regions use
$genmod annotate <vcf_file> -r/--annotate-regions --region-file path_to_regions.bed
Now the variants are ready to get their models annotated:
$genmod models <vcf_file> -f/--family_file <family.ped>
genmod annotate variant_file.vcf
This will print a new vcf to standard out with all variants annotated according to the statements below. All individuals described in the ped file must be present in the vcf file
See examples in the folder genmod/examples.
From version 1.9 genmod can split multiallelic calls in VCFs: use flag -split/--split_variants.
To get an example of how splitting variants work, run genmod on the file examples/multi_allele_example.vcf with the dominant trio.
That is:
genmod annotate examples/multi_allele_example.vcf -f examples/dominant_trio.ped -split
Compare the result when not using the -split flag.
Each variant in the VCF-file will be annotated with which genetic models that are followed in the family if a family file (ped file) is provided.
The genetic models that are checked are the following:
Se description of how genetic models are annotated in the section Conditions for genetic models below.
It is possible to run without a family file, in this case all variants will be annotated with which region(s) they belong to, and if other annotation files are provided(1000G, CADD scores etc.) the variants will get the proper values from these.
Variant Effect Predictor(vep) annotations are supported, use the --vep-flag if variants are already annotated with vep.
GENMOD will add entries to the INFO column for the given VCF file depending on what information is given.
If --vep is NOT provided:
If --vep is used Annotation will not be annotated since all information is in the vep entry.
If a pedigree file is provided the following will be added:
GeneticModels=fam_id_1:AR_hom, fam_id_2:AR_comp|AD_dn etc..Also a line for logging is added in the vcf header with the id genmod, here the date of run, version and command line arguments are printed.
Compound heterozygote inheritance pattern will be checked if two variants are exonic (or in canonical splice sites) and if they reside in the same gene.
GENMOD supports phased data, use the -phased flag. Data should follow the GATK way of phasing.
All annotations will be present only if they have a value.
-kg/--thousand_g path/to/bgzipped/thousand_genomes.vcf.gz--exac path/to/bgzipped/ExAC_file.vcf.gz-cadd/--cadd_file path/to/huge_cadd_file.tsv.gz.-c1kg/--cadd_1000_g path/to/CADD_1000g.txt.gz.--cadd_esp path/to/CADD_ESP.tsv.gz.--cadd_exac path/to/CADD_ExAC.tsv.gz.--cadd_indels path/to/CADD_InDels.txt.gz.--cadd_raw flag. In this case a info field 'CADD_raw=score'.-vep/--vep-phased/--phased-splice/--splice_padding <integer>-strict/--strict flag tells genmod to only annotate genetic models if they are proved by the data. If a variant is not called in a family member it will not be annotated.Sort a VCF file based on Rank Score.
Usage: genmod sort [OPTIONS] <vcf_file> or -
Sort a VCF file based on rank score.
Options:
-o, --outfile PATH Specify the path to a file where results should be
stored.
-f, --family_id TEXT Specify the family id for sorting. If no family id the
first family found in annotation will be used.
-v, --verbose Increase output verbosity.
--help Show this message and exit.
Since we only look at humans, that are diploid, the genotypes represent what we see on both alleles in a single position. 0 represents the reference sequence, 1 is the first of the alternative alleles, 2 second alternative and so on. If no phasing has been done the genotype is an unordered pair on the form x/x, so 0/1 means that the individual is heterozygote in this given position with the reference base on one of the alleles and the first of the alternatives on the other. 2/2 means that we see the second of the alternatives on both alleles. Some chromosomes are only present in one copy in humans, here it is allowed to only use a single digit to show the genotype. A 0 would mean reference and 1 first of alternatives.
If phasing has been done the pairs are not unordered anymore and the delimiter is then changed to '|', so one can be heterozygote in two ways; 0|1 or 1|0.
For this model individuals can be carriers so healthy individuals can be heterozygous. Both alleles need to have the variant for an individual to be sick so a healthy individual can not be homozygous alternative and a sick individual has to be homozygous alternative.
This model includes pairs of exonic variants that are present within the same gene.
Non-phased data:
Phased data:
These traits are inherited on the x-chromosome, of which men have one allele and women have two.
FAQs
Annotate genetic inheritance models in Variant Call Format (VCF) files
We found that genmod demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 5 open source maintainers collaborating on the project.
Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Research
An emerging npm supply chain attack that infects repos, steals CI secrets, and targets developer AI toolchains for further compromise.

Company News
Socket is proud to join the OpenJS Foundation as a Silver Member, deepening our commitment to the long-term health and security of the JavaScript ecosystem.

Security News
npm now links to Socket's security analysis on every package page. Here's what you'll find when you click through.