pydustmasker
pydustmasker is a Python library that provides an efficient implementation of the SDUST algorithm [1], designed to identify and mask low-complexity regions in nucleotide sequences.
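As a rough illustration of what "low complexity" means here, DUST-style methods score a stretch of sequence by how often its overlapping triplets repeat. The sketch below assumes the score used in the symmetric DUST description: the sum of c(c-1)/2 over the count c of each distinct triplet, divided by the number of triplets minus one. `dust_score` is a hypothetical helper written for this illustration and is not part of pydustmasker. The windowing logic and the way the library compares scores against its threshold are not shown.

```python
from collections import Counter


def dust_score(seq: str) -> float:
    """Score one stretch of sequence by how often its triplets repeat.

    Hypothetical helper for illustration only; not pydustmasker's API.
    """
    seq = seq.upper()
    # Overlapping triplets: positions 0..2, 1..3, and so on.
    triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if len(triplets) < 2:
        # Too short to contain any repeated triplet.
        return 0.0
    counts = Counter(triplets)
    # Each triplet seen c times contributes c * (c - 1) / 2 repeated pairs.
    repeated_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return repeated_pairs / (len(triplets) - 1)


print(dust_score("GGGGGGG"))      # homopolymer run: 2.5
print(dust_score("TATATATATA"))   # dinucleotide repeat: lower than the run above
print(dust_score("GTATGCGTACT"))  # mixed sequence: close to zero
```

Under this scoring, a homopolymer run scores higher than a dinucleotide repeat of similar length, and a mixed sequence scores close to zero.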
Usage
pydustmasker provides a DustMasker class that identifies low-complexity regions in an input DNA sequence and masks them.
Here is a basic example of how to use pydustmasker:
>>> import pydustmasker
>>> masker = pydustmasker.DustMasker("CGTATATATATAGTATGCGTACTGGGGGGGCT")
>>> print(masker.intervals)
[(23, 30)]
>>> print(masker.n_masked_bases)
7
>>> print(masker.mask())
CGTATATATATAGTATGCGTACTgggggggCT
>>> print(masker.mask(hard=True))
CGTATATATATAGTATGCGTACTNNNNNNNCT
>>> masker = pydustmasker.DustMasker(
... "CGTATATATATAGTATGCGTACTGGGGGGGCT",
... score_threshold=10
... )
>>> print(masker.intervals)
[(2, 12), (23, 30)]
>>> print(masker.mask())
CGtatatatataGTATGCGTACTgggggggCT
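Judging from the outputs above, the reported intervals appear to be 0-based and half-open: (23, 30) covers the seven G bases at positions 23 through 29, which matches the masked-base count of 7. The following sketch reproduces the masked strings from those intervals in plain Python. `apply_mask` is a hypothetical helper used only to illustrate the interval convention and does not call pydustmasker.

```python
def apply_mask(seq: str, intervals, hard: bool = False) -> str:
    """Mask 0-based, half-open (start, end) intervals of seq.

    Hypothetical helper: soft-masking lowercases the bases,
    hard-masking replaces them with 'N'.
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(start, end):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


seq = "CGTATATATATAGTATGCGTACTGGGGGGGCT"

# Intervals copied from the session above.
default_intervals = [(23, 30)]
relaxed_intervals = [(2, 12), (23, 30)]

print(apply_mask(seq, default_intervals))
print(apply_mask(seq, default_intervals, hard=True))
print(apply_mask(seq, relaxed_intervals))

# With half-open intervals, the masked-base count is the sum of end - start.
print(sum(end - start for start, end in default_intervals))
```

Working with the intervals directly can be useful when the masked coordinates are needed for downstream steps, such as writing a BED-like file, rather than the masked string itself.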