pydna
Pydna is a Python package that provides human-readable, formal descriptions of 🧬 cloning and genetic assembly strategies in Python 🐍 for simulation and verification.
Pydna can be used as executable documentation for cloning.
Designing genetic constructs with many components and steps, like recombinant metabolic pathways 🧫, often makes accurate documentation difficult, as seen in the poor state of scientific literature ☢️
A cloning strategy expressed in pydna is complete, unambiguous and stable.
Pydna provides simulation of:
- Primer design
- PCR
- Restriction digestion
- Ligation
- Gel electrophoresis of DNA with generation of gel images
- Homologous recombination
- Gibson assembly
- Golden gate assembly (in progress)
Virtually any sub-cloning experiment can be described in pydna, and its execution yields the sequences of intermediate and final DNA molecules.
Pydna has been designed with the goal of being understandable for biologists with only some basic understanding of Python.
Pydna can formalize planning and sharing of cloning strategies and is especially useful for complex or combinatorial
DNA molecule constructions.
Acknowledgement 🤝
If you use pydna in your research, please reference the paper:
Pereira, F., Azevedo, F., Carvalho, Â., Ribeiro, G. F., Budde, M. W., & Johansson, B. (2015). Pydna: a simulation and documentation tool for DNA assembly strategies using python. BMC Bioinformatics, 16(142), 142. doi:10.1186/s12859-015-0544-x
Documentation and usage 📚
Full documentation of all modules and classes can be found at https://bjornfjohansson.github.io/pydna.
To get started, we recommend having a look at the example notebooks. Start with Dseq, Dseq_Features and Importing_Seqs, which cover the basics of working with sequences. The rest of the notebooks cover how to use pydna for different cloning strategies, such as Gibson assembly, Restriction-Ligation, etc.
Most pydna functionality is implemented as methods for the double-stranded DNA sequence record
classes Dseq and Dseqrecord, which are subclasses of the Biopython
Seq and SeqRecord classes.
These classes make PCR primer design, PCR simulation and cut-and-paste cloning very simple:
NOTE: You can run this example in this notebook
from pydna.dseqrecord import Dseqrecord
dsr = Dseqrecord("ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT")
dsr.add_feature(x=0, y=60, type="gene", label="my_gene")
dsr.figure()
Dseqrecord(-60)
ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT
TACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTA
print(dsr.format("genbank"))
LOCUS name 60 bp DNA linear UNK 01-JAN-1980
DEFINITION description.
ACCESSION id
VERSION id
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
misc 1..60
/type="gene"
/label="my_gene"
ORIGIN
1 atgcaaacag taatgatgga tgacattcaa agcactgatt ctattgctga aaaagataat
//
from pydna.design import primer_design
amplicon = primer_design(dsr, limit=13, target_tm=55)
print("forward primer:", amplicon.forward_primer.seq)
print("reverse primer:", amplicon.reverse_primer.seq)
amplicon.figure()
forward primer: ATGCAAACAGTAATGATGGA
reverse primer: ATTATCTTTTTCAGCAATAGAATCA
5ATGCAAACAGTAATGATGGA...TGATTCTATTGCTGAAAAAGATAAT3
                        |||||||||||||||||||||||||
                       3ACTAAGATAACGACTTTTTCTATTA5
5ATGCAAACAGTAATGATGGA3
 ||||||||||||||||||||
3TACGTTTGTCATTACTACCT...ACTAAGATAACGACTTTTTCTATTA5
from pydna.amplify import pcr
forward_primer = "ccccGGATCC" + amplicon.forward_primer
reverse_primer = "ttttGGATCC" + amplicon.reverse_primer
pcr_product = pcr(forward_primer, reverse_primer, dsr)
pcr_product.figure()
          5ATGCAAACAGTAATGATGGA...TGATTCTATTGCTGAAAAAGATAAT3
                                  |||||||||||||||||||||||||
                                 3ACTAAGATAACGACTTTTTCTATTACCTAGGtttt5
5ccccGGATCCATGCAAACAGTAATGATGGA3
           ||||||||||||||||||||
          3TACGTTTGTCATTACTACCT...ACTAAGATAACGACTTTTTCTATTA5
pcr_product = Dseqrecord(pcr_product)
pcr_product.figure()
Dseqrecord(-80)
ccccGGATCCATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATGGATCCaaaa
ggggCCTAGGTACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTACCTAGGtttt
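The 80 bp product can be checked by hand, because the primers anneal to the very ends of the 60 bp template: the product is the forward tail, the template, and the reverse complement of the reverse tail. The sketch below uses plain Python strings rather than pydna, and `reverse_complement` is a hypothetical helper written only for this illustration.

```python
# Plain-Python check of the tailed PCR product; this is not pydna's implementation.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, preserving upper and lower case."""
    return seq.translate(COMPLEMENT)[::-1]


template = "ATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAAT"
forward_tail = "ccccGGATCC"  # tail added to the forward primer
reverse_tail = "ttttGGATCC"  # tail added to the reverse primer

# The primers anneal at the template ends, so only the tails add new sequence.
product = forward_tail + template + reverse_complement(reverse_tail)
print(len(product))
print(product)
```

The printed sequence should match the top strand of the Dseqrecord(-80) figure above.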
from Bio.Restriction import BamHI
a, payload, c = pcr_product.cut(BamHI)
print(a.figure())
print()
print(payload.figure())
print()
print(c.figure())
Dseqrecord(-9)
ccccG
ggggCCTAG
Dseqrecord(-70)
GATCCATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATG
    GTACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTACCTAG
Dseqrecord(-9)
GATCCaaaa
    Gtttt
vector = Dseqrecord("aatgtttttccctCCCGGGcaaaatAGATCTtgctatgcatcatcgatct", circular=True, name="vect")
vector.figure()
Dseqrecord(o50)
aatgtttttccctCCCGGGcaaaatAGATCTtgctatgcatcatcgatct
ttacaaaaagggaGGGCCCgttttaTCTAGAacgatacgtagtagctaga
from Bio.Restriction import BglII
linear_vector_bgl = vector.cut(BglII)[0]
rec_vector = (linear_vector_bgl + payload).looped().synced(vector)
rec_vector.figure()
Dseqrecord(o116)
aatgtttttccctCCCGGGcaaaatAGATCCATGCAAACAGTAATGATGGATGACATTCAAAGCACTGATTCTATTGCTGAAAAAGATAATGGATCTtgctatgcatcatcgatct
ttacaaaaagggaGGGCCCgttttaTCTAGGTACGTTTGTCATTACTACCTACTGTAAGTTTCGTGACTAAGATAACGACTTTTTCTATTACCTAGAacgatacgtagtagctaga
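The ligation above works because BamHI (G^GATCC) and BglII (A^GATCT) both leave a 5' GATC overhang, so the BamHI-cut payload fits into the BglII-cut vector. The following is a plain-Python sketch of that reasoning, independent of pydna and Biopython; the `SITES` table and the `overhang` helper are hypothetical names used only here.

```python
# Recognition sequence and top-strand cut offset for each enzyme.
SITES = {"BamHI": ("GGATCC", 1), "BglII": ("AGATCT", 1)}


def overhang(enzyme: str) -> str:
    """Return the 5' overhang of a palindromic site that is cut symmetrically."""
    site, cut = SITES[enzyme]
    return site[cut:len(site) - cut]


# Both enzymes expose the same four bases, so their ends are compatible.
print(overhang("BamHI"), overhang("BglII"))

# Hybrid junctions formed when a BglII end is joined to a BamHI end and vice versa.
left_junction = SITES["BglII"][0][:1] + overhang("BglII") + SITES["BamHI"][0][-1:]
right_junction = SITES["BamHI"][0][:1] + overhang("BamHI") + SITES["BglII"][0][-1:]
print(left_junction, right_junction)
```

Both hybrid junctions can be found in the rec_vector sequence above, and neither of them is a BamHI or BglII site any longer.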
from pydna.assembly import Assembly
fragments = [
Dseqrecord('aatgtttttccctCACTACGtgctatgcatcat', name="fragment_A"),
Dseqrecord('tgctatgcatcatCTATGGAcactctaataatg', name="fragment_B"),
Dseqrecord('cactctaataatgTTACATAaatgtttttccct', name="fragment_C"),
]
asm = Assembly(fragments, limit=10)
product, *rest = asm.assemble_circular()
product.figure()
 -|fragment_A|13
|             \/
|             /\
|             13|fragment_B|13
|                           \/
|                           /\
|                           13|fragment_C|13
|                                         \/
|                                         /\
|                                         13-
|                                            |
 --------------------------------------------
Dseqrecord(product).figure()
Dseqrecord(o60)
aatgtttttccctCACTACGtgctatgcatcatCTATGGAcactctaataatgTTACATA
ttacaaaaagggaGTGATGCacgatacgtagtaGATACCTgtgagattattacAATGTAT
As the example above shows, pydna keeps track of sticky ends and features.
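The circular assembly shown earlier depends on each fragment ending with the 13 bp that start the next one. Below is a minimal plain-Python sketch of that idea. It is not the algorithm used by pydna's Assembly class, and `terminal_overlap` is a hypothetical helper that only compares fragment ends, case-sensitively.

```python
def terminal_overlap(left: str, right: str, limit: int = 10) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 when the shared stretch is shorter than `limit`.
    """
    for size in range(min(len(left), len(right)), limit - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


fragments = [
    "aatgtttttccctCACTACGtgctatgcatcat",  # fragment_A
    "tgctatgcatcatCTATGGAcactctaataatg",  # fragment_B
    "cactctaataatgTTACATAaatgtttttccct",  # fragment_C
]

# Join A -> B -> C -> A, dropping the duplicated overlap at each junction.
circular = ""
for i, frag in enumerate(fragments):
    following = fragments[(i + 1) % len(fragments)]
    size = terminal_overlap(frag, following)
    circular += frag[: len(frag) - size]

print(len(circular))
print(circular)
```

The result should agree with the top strand of the Dseqrecord(o60) figure above.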
Pydna can be very compact. The eleven lines of Python below simulate the construction of a recombinant plasmid.
DNA sequences are downloaded from Genbank by accession numbers that are guaranteed to be stable over time.
from pydna.genbank import Genbank
gb = Genbank("myself@email.com")
gene = gb.nucleotide("X06997")
from pydna.parsers import parse_primers
primer_f,primer_r = parse_primers(''' >760_KlLAC12_rv (20-mer)
ttaaacagattctgcctctg
>759_KlLAC12_fw (19-mer)
aaatggcagatcattcgag ''')
from pydna.amplify import pcr
pcr_prod = pcr(primer_f,primer_r, gene)
vector = gb.nucleotide("AJ001614")
from Bio.Restriction import EcoRV
lin_vector = vector.linearize(EcoRV)
rec_vec = ( lin_vector + pcr_prod ).looped()
Installation 📦
By default, pydna is installed with minimal dependencies, but there are optional dependencies for additional functionality.
Optional dependencies
clipboard
Enables the pydna.dseqrecord.Dseqrecord.copy_gb_to_clipboard() and pydna.dseqrecord.Dseqrecord.copy_fasta_to_clipboard() methods.
These methods put a copy of the sequence on the clipboard in either Genbank (gb) or fasta format.
| Dependency | Function in pydna |
| --- | --- |
| pyperclip | copy sequence to clipboard |
download
Pyparsing enables the pydna.genbankfixer.gbtext_clean() function that can automatically correct malformed sequence files in Genbank format. These are often found online, so this option also installs requests to enable the pydna.genbankfixer.download.download_text() function, which can be used to get cleaned up text from a URL.
| Dependency | Function in pydna |
| --- | --- |
| pyparsing | fix corrupt Genbank files with pydna.genbankfixer |
| requests | download sequences with pydna.download |
express
This option enables the pydna.utils.cai() function and the cai() method available from subclasses of pydna.seqrecord.SeqRecord, such as pydna.dseqrecord.Dseqrecord.
| Dependency | Function in pydna |
| --- | --- |
| cai2 | codon adaptation index calculations in several modules |
gel
Scipy, matplotlib and pillow (PIL) enable the generation of gel images. Numpy is also
needed, but it is usually installed as a dependency of biopython.
Installing with pip 🐍
pip install --pre --upgrade pydna
pip install --pre --upgrade pydna[clipboard,download,express,gel]
Remove options inside the square brackets as required, but be sure not to leave spaces, as pip will not recognize the options. See "Optional dependencies" above.
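To see which optional dependencies ended up in the current environment, something like the following may help. It is a sketch using only the standard library. The grouping in `EXTRAS` is copied from the "Optional dependencies" section and assumes that the PyPI distribution names are pyperclip, pyparsing, requests, cai2, scipy, matplotlib and pillow. `installed_extras` is a hypothetical helper, not part of pydna.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names per extra, as described under "Optional dependencies" (assumed names).
EXTRAS = {
    "clipboard": ["pyperclip"],
    "download": ["pyparsing", "requests"],
    "express": ["cai2"],
    "gel": ["scipy", "matplotlib", "pillow"],
}


def installed_extras(extras=EXTRAS):
    """Map each extra to {distribution: installed version, or None if missing}."""
    report = {}
    for extra, distributions in extras.items():
        report[extra] = {}
        for name in distributions:
            try:
                report[extra][name] = version(name)
            except PackageNotFoundError:
                report[extra][name] = None
    return report


for extra, status in installed_extras().items():
    print(extra, status)
```

A value of None means the distribution was not found, so the corresponding pydna functionality is probably unavailable.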
Installing with poetry 🧙‍♂️
If your project uses poetry to manage dependencies, you can install pydna with the following commands:
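# Basic installation
poetry add pydna
# With optional dependencies
poetry add pydna --extras "clipboard download express gel"
# To change the optional dependencies of an existing installation, remove pydna and add it again
poetry remove pydna
poetry add pydna --extras "express gel"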
Contributing and feedback 🛠️
Feedback & suggestions are very welcome! Please create an issue with your question, comment or suggestion. Please include the version of pydna you are using and code to reproduce the issue if possible.
If you don't have a GitHub account, you can get in touch through the Google group for pydna.
Below are the instructions for developers who want to contribute to pydna. Please direct pull requests towards the dev_bjorn branch.
Fork the repository and set up a dev branch 🍴
Fork the entire repository, not just the master branch (untick the "Copy the master branch only" box).
Create your branch starting from dev_bjorn, and if your changes are related to an issue, call the branch issue_<number>.
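# Clone your fork and move into it
git clone https://github.com/<your-username>/pydna.git
cd pydna
# Create a local dev_bjorn branch that tracks the one on your fork
git checkout -b dev_bjorn origin/dev_bjorn
git pull origin dev_bjorn
# Create the branch for your changes
git checkout -b issue_<number>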
Local development 💻
Preferred method (using poetry) 🧙‍♂️
This is the preferred method to develop on pydna, so if you plan to contribute regularly, it's worth taking this route. If you
encounter any issues setting up the dev environment, create an issue on GitHub and we will be able to help.
Use Poetry to install dependencies and activate the virtual environment. This is necessary
if you want to edit the project dependencies. Install Poetry using pipx, following Poetry's installation instructions; do not install it
in the system Python or the project environment.
poetry config virtualenvs.in-project true
poetry install --all-extras
poetry shell
poetry run pre-commit install
Alternative method (using pip) 🐍
Use this for a small contribution or if you don't manage to set up the dev environment.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .
pre-commit install
Contributing code 💻
- Make your changes.
- Add the necessary tests in tests/.
- Run the tests from the root directory with python run_test.py.
TIP: You can run a particular test file with pytest -vs test_file.py (-v for verbose and -s to see print statements in the test). If you want to run just a single test, you can use pytest -vs -k test_name, where test_name is the name of the test function.
- Before committing, install the pre-commit hooks, if you haven't already, by running pre-commit install. pre-commit should be available in the environment regardless of the method you used to set up the dev environment.
TIP: The hooks are a series of checks that run before you commit your code. If any of the checks fail, the commit is not allowed. Some of them auto-fix the code (e.g., black formatting), so you can simply do git add . and commit again. Others, like flake8, prevent the commit from happening until the code is compliant. For instance, if you import a module in a file and do not use it, flake8 will complain. For a full list of checks, see .pre-commit-config.yaml.
- Push the changes to your fork.
Creating a PR 🔗
- From your fork, make a PR towards the branch dev_bjorn in the original repository.
- Mention the issue number in the PR description (e.g., Closes #123).
- Remember to click the "Allow edits from maintainers" checkbox so that we can make some changes to the PR if necessary.
Continuous integration 🤖
The test_and_coverage workflow is triggered on all pushed commits for all branches except the master branch. This workflow runs tests, doctests and a series of Jupyter notebooks using pytest on Linux, Windows and macOS with all
supported Python versions.
Building the documentation locally 📚
Documentation is built from docstrings with Sphinx, using a GitHub action.
The numpy docstring format is used.
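For contributors unfamiliar with the numpy docstring format, a short illustration follows. The function `gc_content` is hypothetical and not part of pydna; it only shows the section layout (Parameters, Returns, Examples). Since the continuous integration also runs doctests, an Examples section like this one doubles as a small test.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a sequence.

    Parameters
    ----------
    sequence : str
        DNA sequence; upper and lower case are treated alike.

    Returns
    -------
    float
        Number of G and C bases divided by the sequence length,
        or 0.0 for an empty sequence.

    Examples
    --------
    >>> gc_content("GATC")
    0.5
    >>> gc_content("")
    0.0
    """
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


if __name__ == "__main__":
    import doctest

    # Run the examples embedded in the docstring.
    doctest.testmod()
```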
Below are the commands to run a local Sphinx server that auto-updates when files are changed.
poetry install --with docs
sphinx-autobuild --watch src/ docs docs/_build/html
More info about how to contribute to the documentation can be found here
Release process 🚀
See the releases for changes and releases.
The build workflow builds PyPI packages using poetry. This workflow is triggered by publishing a GitHub release manually from the GitHub web interface.
History 📜
Pydna was made public in 2012 on Google code.
🦠
🇵🇹
Who is using pydna? 🧪
Taylor, L. J., & Strebel, K. (2017).
Pyviko: an automated Python tool to design gene knockouts in complex viruses with overlapping genes.
BMC Microbiology, 17(1), 12.
PubMed
Wang, Y., Xue, H., Pourcel, C., Du, Y., & Gautheret, D. (2021).
2-kupl: mapping-free variant detection from DNA-seq data of matched samples.
In Cold Spring Harbor Laboratory (p. 2021.01.17.427048). DOI
PubMed
ShareYourCloning, a web application for designing and documenting DNA cloning strategies.
An Automated Protein Synthesis Pipeline with Transcriptic and Snakemake
and other projects on github