Augmented Interval List
Augmented interval list (AIList) is a data structure for enumerating intersections
between a query interval and an interval set. AILists have previously been shown
to be faster than interval trees, NCList, and BEDTools.
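To make the query concrete, here is a minimal pure-Python sketch of enumerating intersections. It is not the library's code. It assumes half-open intervals `[start, end)`, and the helper names (`brute_force_overlaps`, `build_augmented_list`, `augmented_overlaps`) are hypothetical. The second approach loosely follows the idea of a start-sorted list augmented with a running maximum end, and omits the sublist decomposition that the real AIList uses.

```python
from bisect import bisect_left


def brute_force_overlaps(intervals, q_start, q_end):
    """Return every (start, end) that overlaps the query [q_start, q_end)."""
    return [(s, e) for s, e in intervals if s < q_end and e > q_start]


def build_augmented_list(intervals):
    """Sort by start and attach a running maximum of the end coordinates."""
    ordered = sorted(intervals)
    starts = [s for s, _ in ordered]
    max_ends = []
    running = float("-inf")
    for _, e in ordered:
        running = max(running, e)
        max_ends.append(running)
    return ordered, starts, max_ends


def augmented_overlaps(index, q_start, q_end):
    """Enumerate overlaps using a binary search plus a bounded backward scan."""
    ordered, starts, max_ends = index
    # Intervals at positions >= hi start at or after q_end and cannot overlap.
    hi = bisect_left(starts, q_end)
    hits = []
    for k in range(hi - 1, -1, -1):
        # Nothing at or before position k extends past q_start: stop scanning.
        if max_ends[k] <= q_start:
            break
        s, e = ordered[k]
        if e > q_start:
            hits.append((s, e))
    hits.reverse()  # report in ascending start order
    return hits


intervals = [(15, 20), (10, 30), (17, 19), (5, 20), (12, 15), (30, 40)]
index = build_augmented_list(intervals)
print(augmented_overlaps(index, 6, 15))
```

The running maximum lets the backward scan stop early, which is where the speed-up over a full linear scan comes from in this simplified version.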
This implementation is a Python wrapper of the one used in the original AIList library.
Additional wrapper functions have been added to provide a simple user interface.
All citations should reference the original paper.
For full usage and installation documentation
Install
If you don't already have numpy and scipy installed, it is best to download
Anaconda, a Python distribution that includes them:
https://continuum.io/downloads
Dependencies can be installed by:

```shell
pip install -r requirements.txt
```

PyPI install, presuming you have all its requirements installed:

```shell
pip install ailist
```
Benchmark
Test numpy random integers:
```python
from ailist import AIList
from ncls import NCLS
import numpy as np
import pandas as pd
import quicksect

# Fix the seed so the random intervals are reproducible
np.random.seed(100)

# First set: 100,000 intervals with random starts and lengths of 1 to 9,999
starts1 = np.random.randint(0, 100000, 100000)
ends1 = starts1 + np.random.randint(1, 10000, 100000)
ids1 = np.arange(len(starts1))
values1 = np.ones(len(starts1))

# Second set, generated the same way
starts2 = np.random.randint(0, 100000, 100000)
ends2 = starts2 + np.random.randint(1, 10000, 100000)
ids2 = np.arange(len(starts2))
values2 = np.ones(len(starts2))
```
Library | Function | Time (µs) |
---|---|---|
ncls | single overlap | 1170 |
pandas | single overlap | 924 |
quicksect | single overlap | 550 |
ailist | single overlap | 73 |

Library | Function | Time (s) | Max Memory (GB) |
---|---|---|---|
ncls | bulk overlap | 151 | >50 |
ailist | bulk overlap | 17.8 | ~9 |
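Per-query timings like the single-overlap figures above can be measured with the standard `timeit` module. The sketch below shows one way to do it and is not the script that produced the table. The helpers `make_intervals`, `count_overlaps`, and `time_single_query`, and the query `(10, 10000)`, are hypothetical. A plain linear scan stands in for the benchmarked libraries so the example runs without them.

```python
import random
import timeit


def make_intervals(n, max_start, max_length, seed):
    """Generate n random intervals, mirroring the numpy setup with the stdlib."""
    rng = random.Random(seed)
    starts = [rng.randrange(0, max_start) for _ in range(n)]
    ends = [s + rng.randrange(1, max_length) for s in starts]
    return starts, ends


def count_overlaps(starts, ends, q_start, q_end):
    """Linear-scan stand-in for a library's single-overlap query."""
    return sum(1 for s, e in zip(starts, ends) if s < q_end and e > q_start)


def time_single_query(func, repeat=5, number=10):
    """Return the best observed time per call, in microseconds."""
    best = min(timeit.repeat(func, repeat=repeat, number=number))
    return best / number * 1e6


starts, ends = make_intervals(10_000, 100_000, 10_000, seed=100)
elapsed_us = time_single_query(lambda: count_overlaps(starts, ends, 10, 10_000))
print(f"linear scan, single overlap: {elapsed_us:.0f} µs")
```

To time a real library, pass a callable that wraps its query method to `time_single_query` in place of the lambda.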
Usage
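```python
from ailist import AIList
import numpy as np

# Build an interval list by adding (start, end) pairs
i = AIList()
i.add(15, 20)
i.add(10, 30)
i.add(17, 19)
i.add(5, 20)
i.add(12, 15)
i.add(30, 40)

# Display the stored intervals
i.display()

# Find the intervals that intersect the query (6, 15)
o = i.intersect(6, 15)
o.display()

# Same query, returning indices instead of intervals
i.intersect_index(6, 15)

i.display()

# Explicitly construct the list
i.construct()

# Iterate over the intervals
for x in i:
    print(x)

# Build a second list to compare against
j = AIList()
j.add(5, 15)
j.add(50, 60)

# Subtract j from i
s = i - j
s.display()

# Apply the + operator to the two lists
i + j

# Add intervals from numpy arrays
starts = np.arange(10, 1000, 100)
ends = starts + 50
ids = starts
values = np.ones(10)
i.from_array(starts, ends, ids, values)
i.display()

# Merge intervals, allowing a gap of 10
m = i.merge(gap=10)
m.display()

# Compute coverage
c = i.coverage()
c.head()

# Compute the window protection score (wps) with argument 5
w = i.wps(5)
w.head()

# Filter the intervals using the bounds 3 and 20
fi = i.filter(3, 20)
fi.display()

# Query with arrays of intervals
i.intersect_from_array(starts, ends, ids)
```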
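The `merge(gap=10)` call in the usage above suggests that intervals are joined when they lie within `gap` of one another. The pure-Python sketch below illustrates that idea and is not the library's implementation. The function name `merge_with_gap` is hypothetical, and the exact boundary rule (a distance of at most `gap` triggers a merge) is an assumption.

```python
def merge_with_gap(intervals, gap=0):
    """Merge intervals whose separation is at most `gap` (assumed rule)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= gap:
            # Close enough to the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Too far away (or first interval): start a new merged interval.
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


# Same layout as the arrays in the usage example: starts 10, 110, ..., 910
intervals = [(s, s + 50) for s in range(10, 1000, 100)]
print(merge_with_gap(intervals, gap=10))
print(merge_with_gap(intervals, gap=50))
```

Under this sketch's rule, the ten evenly spaced intervals sit 50 apart, so a gap of 10 leaves them unchanged while a gap of 50 collapses them into a single interval.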
Original paper
Jianglin Feng, Aakrosh Ratan, Nathan C Sheffield; Augmented Interval List: a novel data structure for efficient genomic interval search, Bioinformatics, btz407, https://doi.org/10.1093/bioinformatics/btz407