UCSC-Genomic-REST-Api-Wrapper
An open-source Python package, released under the MIT license, that wraps the REST API of the UCSC genomic database. It gives researchers an expressive, human-readable interface for accessing and querying the database.
![MIT License](https://img.shields.io/apm/l/atomic-design-ui.svg?)
About The Package
Project Proposal
Features
- Expressive API
- Easy to use
- Can be extended
- Can be reused
- No boilerplate
Installation
Install ucsc with pip
pip install ucsc-genomic-api
Documentation
Quick Introduction for busy developers
There are 6 primary classes in the package:
from ucsc.api import Hub, Genome, Track, TrackSchema, Chromosome, Sequence
Each class has the following primary methods:
className.get()
className.find()
className.findBy()
className.exists()
You can then access the attributes of the returned object using dot notation:
className.attributeName
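To make the shared interface concrete, here is a toy, in-memory model of the four methods. It only illustrates the pattern described above and is not the package's code: `ToyHub`, `NotFoundError`, the attribute names, and the sample data are all hypothetical, and the real classes fetch their data from UCSC.

```python
from dataclasses import dataclass


class NotFoundError(LookupError):
    """Hypothetical stand-in for the package's not-found exception."""


@dataclass
class ToyHub:
    # Hypothetical attributes, chosen only for this illustration.
    hubName: str
    shortLabel: str

    # In-memory sample data; the real package queries UCSC instead.
    _records = (
        ("Example Hub A", "hub A"),
        ("Example Hub B", "hub B"),
    )

    @classmethod
    def get(cls):
        """Return every record as an object."""
        return [cls(name, label) for name, label in cls._records]

    @classmethod
    def findBy(cls, attribute, value):
        """Return the first object whose attribute equals value."""
        for item in cls.get():
            if getattr(item, attribute) == value:
                return item
        raise NotFoundError(f"no record with {attribute}={value!r}")

    @classmethod
    def find(cls, name):
        """Look a record up by its name."""
        return cls.findBy("hubName", name)

    @classmethod
    def exists(cls, name):
        """Report whether a record with this name is present."""
        try:
            cls.find(name)
        except NotFoundError:
            return False
        return True


hub = ToyHub.find("Example Hub A")
print(hub.shortLabel)  # attributes are read with dot notation
```

In this model `find` is just `findBy` on the name attribute, and `exists` is `find` with the exception turned into a boolean.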
Usage guide
List the available hubs as Python objects
from ucsc.api import Hub
hubList = Hub.get()
Find a hub by name. The function returns the result as an object or raises a not-found exception.
from ucsc.api import Hub
hub = Hub.find('ALFA Hub')
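Because `find` raises when nothing matches, calling code usually needs to handle that case. The exception class is not named here, so this sketch catches `Exception` broadly and should be narrowed once the class is known. `find_or_default` and `fake_find` are hypothetical helpers written for this example; in real use you would pass `Hub.find` as the finder.

```python
def find_or_default(finder, *args, default=None):
    """Call finder(*args) and return default if it raises.

    Catching Exception is an assumption: the package's not-found
    exception class is not documented here, so narrow it once known.
    """
    try:
        return finder(*args)
    except Exception:
        return default


def fake_find(name):
    """Hypothetical stand-in for Hub.find so the sketch runs offline."""
    known = {"ALFA Hub": {"hubName": "ALFA Hub"}}
    if name not in known:
        raise LookupError(f"hub not found: {name}")
    return known[name]


hub = find_or_default(fake_find, "ALFA Hub")
missing = find_or_default(fake_find, "No Such Hub")
print(hub is not None, missing is None)
```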
Find a hub by a given attribute. The function returns the result as an object or raises a not-found exception.
from ucsc.api import Hub
hub = Hub.findBy('hubName','ALFA Hub')
Get all genomes from a specified hub object
from ucsc.api import Hub
hub = Hub.find('ALFA Hub')
print(hub.genomes)
Get all genomes from the whole UCSC database
from ucsc.api import Genome
genomesList = Genome.get()
Find a genome by name. The function returns the result as an object or raises a not-found exception.
from ucsc.api import Genome
genome = Genome.find('ALFA Genome')
Find a genome by a given attribute. The function returns the result as an object or raises a not-found exception.
from ucsc.api import Genome
genome = Genome.findBy('genomeName','ALFA Genome')
Check whether a genome exists in the UCSC database
from ucsc.api import Genome
Genome.exists('hg38')
List the available tracks of the genome object
from ucsc.api import Genome
genome = Genome.find('ALFA Genome')
tracks = genome.tracks
Find a specific track in a genome by name. The return value is a Track object.
from ucsc.api import Track
track = Track.find('hg38','knownGene')
Or using a Genome object
from ucsc.api import Genome
genome = Genome.find('ALFA Genome')
track = genome.findTrack('knownGene')
Find a specific track by a given attribute. The return value is a Track object.
from ucsc.api import Track
track = Track.findBy('hg38','longLabel','ClinGen curation ')
Or using a Genome object
from ucsc.api import Genome
genome = Genome.find('ALFA Genome')
track = genome.findTrackBy('longLabel','knownGene')
Check whether a track exists in a genome
from ucsc.api import Track
Track.exists('hg38','knownGene')
Or using a Genome object
from ucsc.api import Genome
genome = Genome.find('ALFA Genome')
genome.isTrackExists('longLabel')
List the schema of a specified track from a given genome
from ucsc.api import Track
track = Track.find('hg38','knownGene')
trackSchema = track.schema('hg38')
Get track data. What trackData returns depends on the parameters you pass to it; the possible parameter combinations for each use case are listed below.
from ucsc.api import Track
track = Track.find('hg38','knownGene')
# URL of the assembly hub used in the hub examples below
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
# UCSC database genome: limit the number of items returned
track.trackData(genome='hg38',track='gold',maxItemsOutput=100)
# UCSC database genome: a single chromosome
track.trackData(genome='hg38',track='gold',chrom='chrM')
# UCSC database genome: a region of a chromosome
track.trackData(genome='hg38',track='gold',chrom='chr1',start=47000,end=48000)
# Assembly hub genome: a whole track, then a single chromosome
track.trackData(genome='CAST_EiJ',track='assembly',hubUrl=hubUrl)
track.trackData(genome='CAST_EiJ',track='assembly',chrom='chr1',hubUrl=hubUrl)
track.trackData(genome='CAST_EiJ',track='ensGene',hubUrl=hubUrl)
track.trackData(genome='CAST_EiJ',track='ensGene',chrom='chr1',hubUrl=hubUrl)
# downloadData: a hub region, then an item limit on a database genome
track.downloadData(genome='CAST_EiJ',track='ensGene',chrom='chr1',hubUrl=hubUrl,start=4321,end=5678)
track.downloadData(genome='galGal6',track='gc5BaseBw',maxItemsOutput=100)
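The keyword arguments above mirror the query parameters of the UCSC REST API's `/getData/track` endpoint. As a rough illustration of how such a call could map to a request URL, the sketch below builds the URL without sending it. `build_track_data_url` is a hypothetical helper, not part of this package, and it assumes the endpoint accepts `&`-separated query parameters.

```python
from urllib.parse import urlencode

# Base URL of the public UCSC REST API.
API_BASE = "https://api.genome.ucsc.edu"


def build_track_data_url(genome, track, chrom=None, start=None, end=None,
                         maxItemsOutput=None, hubUrl=None):
    """Build a /getData/track URL, skipping parameters left as None."""
    params = {
        "hubUrl": hubUrl,
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
        "maxItemsOutput": maxItemsOutput,
    }
    # urlencode percent-encodes values, which matters for hubUrl.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{API_BASE}/getData/track?{query}"


print(build_track_data_url("hg38", "gold", maxItemsOutput=100))
print(build_track_data_url("hg38", "gold", chrom="chr1",
                           start=47000, end=48000))
```

Dropping `None` values keeps each use case to the parameters it actually needs, which matches how the calls above vary only in which keywords they pass.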
List the chromosomes of a UCSC database genome
from ucsc.api import Chromosome
chromosomes = Chromosome.get(genome='hg38')
List the chromosomes of a specified track in a UCSC database genome
from ucsc.api import Chromosome
chromosomes = Chromosome.get(genome='hg38', track='knownGene')
from ucsc.api import Chromosome, Track, Genome
track = Track.find('hg38','knownGene')
genome = Genome.find('ALFA Genome')
chromosomes = Chromosome.get(genome, track)
List the chromosomes of an assembly hub genome
from ucsc.api import Chromosome
chromosomes = Chromosome.get(hub='ALFA Hub')
List the chromosomes of a specified track in an assembly hub genome (deprecated)
from ucsc.api import Chromosome
chromosomes = Chromosome.get('hg38', 'ALFA Hub','knownGene')
Find a specific chromosome
from ucsc.api import Chromosome
chromosome = Chromosome.find(genome)
Find DNA sequence
The get method of the Sequence class accepts several parameters. Which ones you pass depends on how you want to retrieve the sequence object.
from ucsc.api import Sequence
sequence = Sequence.get(genome='hg38', chrom='chrM')
print(sequence.dna)
sequence = Sequence.get(genome='hg38', chrom='chrM', start=4321, end=5678)
print(sequence.dna)
hubUrl = 'http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
sequence = Sequence.get(genome='mm10', chrom='chrM', hubUrl=hubUrl, start=4321, end=5678)
print(sequence.dna)
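When requesting a region, it can help to check the coordinates before making the call. The sketch below assumes the UCSC REST API convention of a 0-based start and a 1-based end (a half-open interval), and that start and end are given together. `region_length` is a hypothetical helper, not part of this package.

```python
def region_length(start=None, end=None):
    """Return the number of bases in [start, end), or None for no region.

    Assumes a 0-based start and 1-based end, so the length is end - start.
    """
    if start is None and end is None:
        return None  # no region requested, e.g. a whole chromosome
    if start is None or end is None:
        raise ValueError("start and end must be given together")
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    return end - start


print(region_length(4321, 5678))
```

Under these assumptions, the value returned for a region should match `len(sequence.dna)` for the same start and end.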