Palindrome tree
Palindrome tree is a tool for analyzing inverted repeats in DNA sequences using decision trees. It takes the provided sequences and uses a decision tree to find regions with a high probability of containing a palindrome. This step filters out a large portion of the data. The remaining regions of interest are then analyzed through the API of Palindrome Analyzer. DNA Analyser is a web-based server for nucleotide sequence analysis. It was developed in cooperation between the Department of Informatics, Mendel University in Brno, and the Institute of Biophysics, Academy of Sciences of the Czech Republic.
Requirements
Palindrome tree requires Python 3.7 or newer.
Installation
To install Palindrome tree, use the PyPI repository:
pip install palindrome-tree
Usage
First initialize a palindrome tree analyzer instance, which is imported from the main package palindrome_tree.
from palindrome_tree import PalindromeTree
tree = PalindromeTree()
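The analyzer is given the sequence as a plain string. If the input is a FASTA file or contains line breaks, it may help to strip header lines and whitespace before analysis. The following is a minimal sketch under that assumption: `clean_sequence` is a hypothetical helper, not part of palindrome_tree, and this README does not state whether `analyse` already tolerates headers or line breaks.

```python
def clean_sequence(raw_text: str) -> str:
    """Drop FASTA header lines and all whitespace, then upper-case the bases.

    Hypothetical helper for illustration; not part of palindrome_tree.
    """
    parts = []
    for line in raw_text.splitlines():
        if line.startswith(">"):  # FASTA header lines start with ">"
            continue
        # Remove spaces, tabs and stray carriage returns inside the line.
        parts.append("".join(line.split()))
    return "".join(parts).upper()


example = ">demo record\nttgtagag\nACAGGG tctt\n"
print(clean_sequence(example))
```

The returned string could then be passed as the `sequence` argument of `analyse`.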
Predict regions (without API validation)
To predict regions that may contain palindromes, run analyse without setting the validate_with_api parameter.
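from palindrome_tree import PalindromeTree

# Read the whole sequence file; the context manager closes it afterwards.
with open("/path/to/sequence/name.txt", "r") as sequence_file:
    sequence = sequence_file.read()

tree = PalindromeTree()
tree.analyse(
    sequence=sequence,
)
tree.results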
The results are then stored in the results attribute as a pd.DataFrame.
|   | position | sequence |
|---|---|---|
| 0 | 8 | TTTGTAGAGACAGGGTCTTGCTGTGTTTCC |
| 1 | 10 | TGTAGAGACAGGGTCTTGCTGTGTTTCCCA |
| 2 | 49 | CGAACTCCTGGCCTCTAGGCAATCCTCCCA |
| 3 | 102 | ATCCCACTCTTTTTTGAAAAATAAAATCTA |
| 4 | 105 | CCACTCTTTTTTGAAAAATAAAATCTACCA |
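In the sample output, each predicted sequence is 30 nucleotides long and neighbouring windows overlap (rows 0 and 1 start at positions 8 and 10). The sketch below shows one way to collapse such overlapping windows into contiguous regions. `merge_windows` is a hypothetical helper, not part of palindrome_tree. It assumes that `position` is the window start and that all windows share one length, which matches the sample but is not stated explicitly here.

```python
from typing import Iterable, List, Tuple


def merge_windows(positions: Iterable[int], window: int = 30) -> List[Tuple[int, int]]:
    """Merge overlapping or touching windows into half-open (start, end) regions."""
    merged: List[Tuple[int, int]] = []
    for start in sorted(positions):
        end = start + window
        if merged and start <= merged[-1][1]:
            # Window overlaps (or touches) the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Positions taken from the sample table.
print(merge_windows([8, 10, 49, 102, 105]))
# → [(8, 40), (49, 79), (102, 135)]
```

With a real run, the positions could be taken from the DataFrame, for example `merge_windows(tree.results["position"])`.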
Predict regions (with API validation)
To predict regions that may contain palindromes and then validate them through the API, run analyse with the validate_with_api parameter set to True.
from palindrome_tree import PalindromeTree

# Read the whole sequence file; the context manager closes it afterwards.
with open("/path/to/sequence/name.txt", "r") as sequence_file:
    sequence = sequence_file.read()

tree = PalindromeTree()
tree.analyse(
    sequence=sequence,
    validate_with_api=True,
)
tree.validated_results
The validated results are stored in the validated_results attribute, also as a pd.DataFrame.
|   | original_index | after | before | mismatches | opposite | position | sequence | signature | spacer | stability_NNModel |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | CC | TTTGT | 2 | CTGTGTTT | 5 | AGAGACAG | 8-7-2 | GGTCTTG | {'cruciform': -5.74, 'linear': -27.590000000000003, 'delta': 21.85} |
| 1 | 0 | TGCTG | TTTGT | 2 | GGGTCT | 5 | AGAGAC | 6-1-2 | A | {'cruciform': -2.54, 'linear': -13.84, 'delta': 11.3} |
| 2 | 0 | GTGTT | TGTAG | 2 | CTTGCT | 7 | AGACAG | 6-3-2 | GGT | {'cruciform': -1.94, 'linear': -17.509999999999998, 'delta': 15.569999999999999} |
| 3 | 0 | TTCC | TAGAG | 2 | CTGTGT | 9 | ACAGGG | 6-5-2 | TCTTG | {'cruciform': -3.7399999999999998, 'linear': -20.99, 'delta': 17.25} |
| 4 | 1 | CCCA | TGT | 2 | CTGTGTTT | 3 | AGAGACAG | 8-7-2 | GGTCTTG | {'cruciform': -5.74, 'linear': -27.590000000000003, 'delta': 21.85} |
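In the sample rows, the stability_NNModel column holds a mapping with the keys `cruciform`, `linear` and `delta`, where `delta` equals `cruciform` minus `linear`. The sketch below orders validated rows by that `delta` value. `rank_by_delta` is a hypothetical helper, not part of palindrome_tree. Because the column may hold either a dict or its string representation, it handles both. Ascending order is only one possible choice.

```python
import ast


def rank_by_delta(records):
    """Return the records sorted by ascending stability_NNModel['delta']."""

    def delta(record):
        stability = record["stability_NNModel"]
        if isinstance(stability, str):
            # Assumption: a string cell is a Python-literal dict.
            stability = ast.literal_eval(stability)
        return stability["delta"]

    return sorted(records, key=delta)


# Two rows abbreviated from the sample table; the second uses the string form.
rows = [
    {"signature": "8-7-2",
     "stability_NNModel": {"cruciform": -5.74, "linear": -27.590000000000003, "delta": 21.85}},
    {"signature": "6-1-2",
     "stability_NNModel": "{'cruciform': -2.54, 'linear': -13.84, 'delta': 11.3}"},
]
for row in rank_by_delta(rows):
    print(row["signature"])
```

With a real run, the input could be built with `tree.validated_results.to_dict("records")`.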
Dependencies
- xgboost = "^1.5.1"
- pandas = "^1.3.5"
- scikit-learn = "^1.0.2"
- requests = "^2.26.0"
Authors
License
This project is licensed under the MIT License - see the LICENSE file for details.