ucsc-genomic-api is an open-source Python package, released under the MIT license, that wraps the UCSC genomic database API. It gives researchers an expressive, human-readable way to access and query the database.
Expressive API
Easy to use
Can be extended
Can be reused
No boilerplate
Install the package with pip:
pip install ucsc-genomic-api
There are 6 primary classes in the package:
from ucsc.api import Hub, Genome, Track, TrackSchema, Chromosome, Sequence
Each class has the following primary methods:
# check documentation for required and optional parameters
className.get() # Returns list of objects of the class
className.find() # Find object by name
className.findBy() # Find object by a specified attribute
className.exists() # Check to see if an object exists
You can then access the attributes of an object using dot notation:
className.attributeName # Returns the value of the attribute
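Because results are ordinary Python objects, several attributes can be collected at once with `getattr`. The sketch below is only an illustration and is not part of ucsc-genomic-api: `summarize` is a hypothetical helper, the object built with `SimpleNamespace` is a stand-in for a library object, and of the attribute names used only `hubName` and `genomes` appear in this README (`shortLabel` is assumed).

```python
from types import SimpleNamespace


def summarize(obj, names, default=None):
    """Return a dict of the requested attributes, using `default` for missing ones."""
    return {name: getattr(obj, name, default) for name in names}


# Stand-in for an object returned by the library (hypothetical values).
fake_hub = SimpleNamespace(hubName="ALFA Hub", shortLabel="ALFA")

# `genomes` is not set on the stand-in, so it falls back to the default.
summary = summarize(fake_hub, ["hubName", "shortLabel", "genomes"])
print(summary)
```

With the real package, the same helper could presumably be applied to an object returned by `Hub.find`.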
List the available hubs as Python objects
from ucsc.api import Hub
hubList = Hub.get()
Find a hub by name. The function returns the matching hub as an object or raises a not-found exception.
from ucsc.api import Hub
hub = Hub.find('ALFA Hub')
Find a hub by a given attribute. The function returns the matching hub as an object or raises a not-found exception.
from ucsc.api import Hub
hub = Hub.findBy('hubName','ALFA Hub')
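Since `find` and `findBy` raise when nothing matches, callers that prefer a `None` result can wrap the lookup. The following is a minimal sketch under stated assumptions: `find_or_none` and `fake_find` are hypothetical names, the README does not name the exception class (so the sketch catches `Exception` broadly), and a stand-in finder replaces `Hub.find` so the example runs without network access.

```python
def find_or_none(finder, *args):
    """Call a finder such as Hub.find and return None instead of raising."""
    try:
        return finder(*args)
    except Exception:  # assumption: the exact not-found exception class is unknown
        return None


# Stand-in finder that mimics "return the object or raise" (hypothetical).
def fake_find(name):
    known = {"ALFA Hub": {"hubName": "ALFA Hub"}}
    if name not in known:
        raise LookupError(f"{name!r} not found")
    return known[name]


found = find_or_none(fake_find, "ALFA Hub")
missing = find_or_none(fake_find, "No Such Hub")
print(found, missing)
```

With the real package this would presumably be called as `find_or_none(Hub.find, 'ALFA Hub')`. Catching `Exception` also hides network errors, so it is worth narrowing the `except` clause once the actual exception class is known.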
Get all genomes from specified hub object
from ucsc.api import Hub
hub = Hub.find('ALFA Hub')
print(hub.genomes) # prints the list of all genomes in the given hub
Get all genomes in the UCSC database
from ucsc.api import Genome
genomesList = Genome.get()
Find a genome by name. The function returns the matching genome as an object or raises a not-found exception.
from ucsc.api import Genome
genome = Genome.find('ALFA Genome')
Find a genome by a given attribute. The function returns the matching genome as an object or raises a not-found exception.
from ucsc.api import Genome
genome = Genome.findBy('genomeName','ALFA Genome')
Check if genome exists in a UCSC database
from ucsc.api import Genome
Genome.exists('hg38')
List the available tracks of the genome object
from ucsc.api import Genome
genome = Genome.find('ALFA Genome')
tracks = genome.tracks
Find a specific track in a genome by name. The call returns a Track object.
from ucsc.api import Track
track = Track.find('hg38','knownGene')
Or using a Genome object
from ucsc.api import Genome
# Assumes `genome` is a Genome object obtained earlier, e.g. with Genome.find(...)
genome.findTrack('knownGene')
Find a specific track by a given attribute. The call returns a Track object.
from ucsc.api import Track
track = Track.findBy('hg38','longLabel','ClinGen curation ')
Or using a Genome object
from ucsc.api import Genome
# Assumes `genome` is a Genome object obtained earlier, e.g. with Genome.find(...)
genome.findTrackBy('longLabel','knownGene')
Check if track exists in a genome
from ucsc.api import Track
Track.exists('hg38','knownGene')
Or using a Genome object
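from ucsc.api import Genome
# Assumes `genome` is a Genome object obtained earlier, e.g. with Genome.find(...)
genome.isTrackExists('knownGene') # pass the track name, as in Track.exists above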
List the schema of a specified track in a given genome
from ucsc.api import Track
track = Track.find('hg38','knownGene')
trackSchema = track.schema('hg38')
The data returned by trackData depends on the parameters you pass. The possible parameter combinations for each use case are listed below.
from ucsc.api import Track
track = Track.find('hg38','knownGene') # or you can get the track using the findBy method
# Get track data for specified track in UCSC database genome
track.trackData(genome='hg38',track='gold',maxItemsOutput=100)
# Get track data for specified track and chromosome in UCSC database genome
track.trackData(genome='hg38',track='gold',chrom='chrM')
# Get track data for specified track, chromosome and start,end coordinates in UCSC database genome
track.trackData(genome='hg38',track='gold',chrom='chr1',start=47000,end=48000)
# Get track data for specified track in an assembly hub genome
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
track.trackData(genome='CAST_EiJ',track='assembly',hubUrl=hubUrl)
# Get track data for specified track and chromosome in an assembly hub genome
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
track.trackData(genome='CAST_EiJ',track='assembly',chrom='chr1',hubUrl=hubUrl)
# Get track data for specified track in a track hub
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
track.trackData(genome='CAST_EiJ',track='ensGene',hubUrl=hubUrl)
# Get track data for specified track and chromosome in a track hub
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
track.trackData(genome='CAST_EiJ',track='ensGene',chrom='chr1',hubUrl=hubUrl)
# Download track data for specified track, chromosome with start and end limits in an assembly hub genome
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
track.downloadData(genome='CAST_EiJ',track='ensGene',chrom='chr1',hubUrl=hubUrl,start=4321,end=5678)
# Download track data for specified track in a UCSC database genome
track.downloadData(genome='galGal6',track='gc5BaseBw',maxItemsOutput=100)
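The keyword arguments above mirror the query parameters of the UCSC REST endpoint `getData/track`. The sketch below shows how such a parameter set maps to a request URL. It assumes, without confirmation from this README, that the wrapper forwards these arguments to that endpoint. `track_data_url` is a hypothetical helper, and the two validation rules simply reflect the use cases listed above, where `start` and `end` always appear together and alongside `chrom`.

```python
from urllib.parse import urlencode

# Public UCSC REST endpoint for track data.
BASE_URL = "https://api.genome.ucsc.edu/getData/track"


def track_data_url(genome, track, chrom=None, start=None, end=None,
                   hubUrl=None, maxItemsOutput=None):
    """Build a getData/track URL, omitting parameters left as None."""
    if (start is None) != (end is None):
        raise ValueError("start and end must be given together")
    if start is not None and chrom is None:
        raise ValueError("start and end require chrom")
    params = {
        "hubUrl": hubUrl,
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
        "maxItemsOutput": maxItemsOutput,
    }
    # Drop unset parameters; urlencode percent-encodes values such as hubUrl.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}?{query}"


url = track_data_url("hg38", "gold", chrom="chr1", start=47000, end=48000)
print(url)
```

Building the URL separately from sending the request keeps the parameter logic easy to check without network access.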
List chromosomes from UCSC database genome
from ucsc.api import Chromosome
chromosomes = Chromosome.get(genome='hg38')
List chromosomes from specified track in UCSC database genome
from ucsc.api import Chromosome
chromosomes = Chromosome.get(genome='hg38', track='knownGene')
# or
from ucsc.api import Track,Genome
track = Track.find('hg38','knownGene')
genome = Genome.find('ALFA Genome')
chromosomes = Chromosome.get(genome, track)
List chromosomes from assembly hub genome
from ucsc.api import Chromosome
chromosomes = Chromosome.get(hub='ALFA Hub')
List chromosomes from specified track in assembly hub genome # Deprecated!
from ucsc.api import Chromosome
chromosomes = Chromosome.get('hg38', 'ALFA Hub','knownGene')
Find a specific chromosome
from ucsc.api import Chromosome
chromosome = Chromosome.find(genome)
Find DNA sequence
The get method of the Sequence class accepts several parameters; which ones you pass depends on how you want to retrieve the sequence.
from ucsc.api import Sequence
# Get DNA sequence from specified chromosome in UCSC database genome
sequence = Sequence.get(genome = 'hg38',chrom= 'chrM')
print(sequence.dna)
# Get DNA sequence from specified chromosome and start,end coordinates in UCSC database genome
sequence = Sequence.get(genome= 'hg38',chrom= 'chrM',start=4321,end=5678)
print(sequence.dna)
# Get DNA sequence from a track hub where 'genome' is a UCSC database
hubUrl = 'http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
sequence = Sequence.get(genome= 'mm10',chrom= 'chrM',hubUrl=hubUrl,start=4321,end=5678)
print(sequence.dna)
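Once a sequence has been retrieved, its `dna` attribute can be passed to ordinary string-processing code. As a small illustration, the sketch below computes GC content. It assumes that `sequence.dna` is a plain string of bases, possibly mixing upper and lower case and containing `N` for unknown positions; `gc_content` is a hypothetical helper, not part of the package.

```python
def gc_content(dna):
    """Fraction of G and C among the A, C, G, T bases (case-insensitive)."""
    bases = dna.upper()
    counted = sum(bases.count(b) for b in "ACGT")  # ignores N and other symbols
    if counted == 0:
        return 0.0
    return (bases.count("G") + bases.count("C")) / counted


# Hypothetical stand-in for sequence.dna.
example = "gatcACGTnn"
print(gc_content(example))
```

With a real result this would presumably be called as `gc_content(sequence.dna)`.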
FAQs
Access and query the UCSC database with an elegant, human-readable API.
We found that ucsc-genomic-api demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.