Detect CRISPR-Cas genes and arrays, and predict the subtype based on both Cas genes and CRISPR repeat sequence.
CRISPRCasTyper and RepeatTyper are also available through a webserver.
This software finds Cas genes with a large suite of HMMs, then groups these HMMs into operons, and predicts the subtype of the operons based on a scoring scheme. Furthermore, it finds CRISPR arrays with minced and by BLASTing a large suite of known repeats, and using a kmer-based machine learning approach (extreme gradient boosting trees) it predicts the subtype of the CRISPR arrays based on the consensus repeat. It then connects the Cas operons and CRISPR arrays, producing as output:
I-A, I-B, I-C, I-D, I-E, I-F, I-F (transposon), I-G, II-A, II-B, II-C, III-A, III-B, III-C, III-D, III-E, III-F, IV-A1, IV-A2, IV-A3, IV-B, IV-C, IV-D, IV-E, V-A, V-B1, V-B2, V-C, V-D, V-E, V-F1, V-F2, V-F3, V-F (the rest), V-G, V-H, V-I, V-J, V-K, V-L, VI-A, VI-B1, VI-B2, VI-C, VI-D, VI-X, VI-Y.
All subtypes from the most recent Nature Reviews Microbiology (Makarova et al. 2020): Evolutionary classification of CRISPR–Cas systems: a burst of class 2 and derived variants
Updated type IV subtypes and variants based on: Type IV CRISPR–Cas systems are highly diverse and involved in competition between plasmids
Type V-K: RNA-guided DNA insertion with CRISPR-associated transposases
Transposon associated type I-F: Transposon-encoded CRISPR–Cas systems direct RNA-guided DNA integration
New V-A variants: Novel Type V-A CRISPR Effectors Are Active Nucleases with Expanded Targeting Capabilities
New Cas13s: Programmable RNA editing with compact CRISPR–Cas13 systems from uncultivated microbes
V-L (cas12l): A new family of CRISPR-type V nucleases with C-rich PAM recognition
Find a free to read version on BioRxiv
conda create -n cctyper -c conda-forge -c bioconda -c russel88 cctyper
conda activate cctyper
cctyper my.fasta my_output
CRISPRCasTyper can be installed either through conda or pip.
It is advised to use conda, since this installs CRISPRCasTyper and all dependencies, and downloads the database in one go.
Use miniconda or anaconda to install.
Create the environment with CRISPRCasTyper and all dependencies and database
conda create -n cctyper -c conda-forge -c bioconda -c russel88 cctyper
If the dependencies (Python >= 3.8, HMMER >= 3.2, Prodigal >= 2.6, minced, grep, sed) are already in your PATH, you can install with pip.
Install cctyper python module
python -m pip install cctyper
Upgrade cctyper python module to the latest version
python -m pip install cctyper --upgrade
# Download and unpack
svn checkout https://github.com/Russel88/CRISPRCasTyper/trunk/data
tar -xvzf data/Profiles.tar.gz
mv Profiles/ data/
rm data/Profiles.tar.gz
# Tell CRISPRCasTyper where the data is:
# either by setting an environment variable (has to be done for each terminal session, or added to .bashrc):
export CCTYPER_DB="/path/to/data/"
# or by using the --db argument each time you run CRISPRCasTyper:
cctyper input.fa output --db /path/to/data/
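If you drive CRISPRCasTyper from a Python script, it can help to check the database location before starting a long run. The following is a minimal sketch, not part of CRISPRCasTyper itself: the helper name `resolve_db_dir` is hypothetical, and the precedence (explicit path first, then the CCTYPER_DB environment variable) is an assumption made for illustration.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_db_dir(db_arg: Optional[str] = None) -> Path:
    """Return the database directory from an explicit path or CCTYPER_DB.

    Assumption for this sketch: an explicit path wins over the
    environment variable. This is not taken from the cctyper source.
    """
    candidate = db_arg or os.environ.get("CCTYPER_DB")
    if not candidate:
        raise RuntimeError("Set CCTYPER_DB or pass a database path explicitly")
    db_dir = Path(candidate).expanduser()
    if not db_dir.is_dir():
        raise FileNotFoundError(f"Database directory not found: {db_dir}")
    return db_dir
```

The returned path can then be passed to the --db argument shown above.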
CRISPRCasTyper takes a nucleotide fasta as input and produces outputs with CRISPR-Cas predictions.
conda activate cctyper
cctyper genome.fa my_output
cctyper genome.fa my_output --circular
The default prodigal mode expects the input to be a single draft or complete genome. For metagenomic assemblies, switch prodigal to meta mode:
cctyper assembly.fa my_output --prodigal meta
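When many genomes are processed from a script, the command variants above can be assembled programmatically. This is a small sketch using only the flags shown in this README (--circular and --prodigal); the function name `build_cctyper_command` is hypothetical and is not part of the cctyper package.

```python
from typing import List, Optional


def build_cctyper_command(
    fasta: str,
    out_dir: str,
    circular: bool = False,
    prodigal_mode: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the cctyper command-line tool."""
    cmd = ["cctyper", fasta, out_dir]
    if circular:
        # For complete, circular genomes.
        cmd.append("--circular")
    if prodigal_mode is not None:
        # For example "meta" for metagenomic assemblies.
        cmd += ["--prodigal", prodigal_mode]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once cctyper is available in the active environment.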
cctyper -h
With --keep_tmp, additional files are also produced.
Files are only created if there is any data. For example, the CRISPR_Cas.tab file is only created if there are any CRISPR-Cas loci.
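Because output files may be absent, downstream scripts should not assume that they exist. The sketch below reads a table only if it is present and otherwise returns an empty list. It assumes the .tab files are tab-delimited with a header row; the helper name `read_optional_table` is hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List


def read_optional_table(out_dir: str, name: str = "CRISPR_Cas.tab") -> List[Dict[str, str]]:
    """Read a tab-delimited output file, or return [] if it was not created."""
    path = Path(out_dir) / name
    if not path.is_file():
        # No file means no data of this kind was found.
        return []
    with path.open(newline="") as handle:
        # Assumption: the first row holds the column names.
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `read_optional_table("my_output")` gives an empty list for a genome without any CRISPR-Cas loci instead of raising an error.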
CRISPRCasTyper will automatically plot a map of the CRISPR-Cas loci, orphan Cas operons, and orphan CRISPR arrays.
These maps can be expanded (--expand N) by adding unknown genes and genes with alignment scores below the thresholds. This can help identify potentially un-annotated genes in operons. You can generate new plots without having to re-run the entire pipeline by adding --redo_typing to the command. This will re-use the mappings, re-type the operons, and re-make the plot based on the new thresholds and plot parameters.
The plot below is run with --expand 5000
With an input of CRISPR repeats (one per line, in a simple textfile) RepeatTyper will predict the subtype, based on the kmer composition of the repeat
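To make "kmer composition" concrete, the sketch below counts overlapping kmers in a repeat and converts them to relative frequencies. It is only an illustration of the general idea: the value k = 4 and the normalisation are assumptions, and this is not the featurisation code used by RepeatTyper.

```python
from collections import Counter
from typing import Dict


def kmer_composition(repeat: str, k: int = 4) -> Dict[str, float]:
    """Return the relative frequency of each overlapping kmer in a repeat."""
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = repeat.strip().upper()
    total = len(seq) - k + 1  # number of overlapping windows
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}
```

A classifier such as gradient boosted trees can then be trained on these frequency vectors, one per repeat.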
conda activate cctyper
repeatType repeats.txt
The script prints:
The CCTyper webserver crowdsources subtyped repeats and includes an updated RepeatTyper model that is based on a much larger set of repeats and covers additional subtypes compared to the curated RepeatTyper model. This updated model is automatically retrained each month, and the models are available for download.
From CCTyper version 1.4.0 onwards, each release includes the newest RepeatTyper model available at the time of release.
Each model contains a training report (xgb_report), where you can find the training log and, at the bottom, the accuracy, both overall and per subtype.
Save the original database files:
mv ${CCTYPER_DB}/type_dict.tab ${CCTYPER_DB}/type_dict_orig.tab
mv ${CCTYPER_DB}/xgb_repeats.model ${CCTYPER_DB}/xgb_repeats_orig.model
Move the new model into the database folder
mv repeat_model/* ${CCTYPER_DB}/
You can train the repeat classifier with your own set of subtyped repeats. Given a tab-delimited input where the first column contains the subtypes and the second column contains the CRISPR repeat sequences, RepeatTrain will train a CRISPR repeat classifier that is directly usable for both RepeatTyper and CRISPRCasTyper.
repeatTrain typed_repeats.tab my_classifier
repeatType repeats.txt --db my_classifier
Save the original database files:
mv ${CCTYPER_DB}/type_dict.tab ${CCTYPER_DB}/type_dict_orig.tab
mv ${CCTYPER_DB}/xgb_repeats.model ${CCTYPER_DB}/xgb_repeats_orig.model
Move the new model into the database folder
mv my_classifier/* ${CCTYPER_DB}/
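Before training, it can be worth checking that the input table really has the two-column, tab-delimited layout described above. The following sketch is a hypothetical pre-flight check, not part of RepeatTrain; it assumes there is no header row, and the uppercasing of sequences is a choice made here for illustration.

```python
from typing import Iterable, List, Tuple


def parse_typed_repeats(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse '<subtype>TAB<repeat>' lines into (subtype, repeat) pairs."""
    pairs = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2 or not fields[0].strip() or not fields[1].strip():
            raise ValueError(f"line {number}: expected subtype and repeat separated by one tab")
        pairs.append((fields[0].strip(), fields[1].strip().upper()))
    return pairs
```

It can be applied to an open file, for example `parse_typed_repeats(open("typed_repeats.tab"))`, and raises a ValueError naming the first malformed line.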
Large metagenomic assemblies with many small contigs can exhaust the RAM on your laptop. Fortunately, as metagenomic contigs are analysed separately (when run with --prodigal meta), a simple solution is to split the input into smaller chunks (e.g. with pyfasta).
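If pyfasta is not at hand, the splitting can also be done with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the chunk size of 1000 records and the chunk_0000.fa naming are arbitrary choices, and records are always kept whole so each contig ends up in exactly one chunk.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header line, sequence) pairs from a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            elif header is not None and line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(path: str, out_dir: str, records_per_chunk: int = 1000) -> List[Path]:
    """Write the records of a FASTA file into numbered chunk files."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks: List[Path] = []
    handle = None
    for index, (header, seq) in enumerate(read_fasta(path)):
        if index % records_per_chunk == 0:
            # Start a new chunk file every records_per_chunk records.
            if handle is not None:
                handle.close()
            chunk_path = out / f"chunk_{len(chunks):04d}.fa"
            handle = chunk_path.open("w")
            chunks.append(chunk_path)
        handle.write(f"{header}\n{seq}\n")
    if handle is not None:
        handle.close()
    return chunks
```

Each chunk can then be run separately with --prodigal meta, and the per-chunk outputs combined afterwards.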
CRISPRCasTyper: Automatic detection and subtyping of CRISPR-Cas operons