aindex2

Perfect hash based index for genome data.

  • 1.1.3
  • PyPI
aindex: perfect hash based index for genomic data

Installation

Requirements:

jellyfish 2 (install it with apt install jellyfish or conda install bioconda::jellyfish)
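
Before installing, it can help to confirm that a jellyfish executable is reachable. The snippet below is a minimal sketch: the helper name `has_executable` is hypothetical, and it only checks that a program with that name is on PATH, not that it is version 2.

```python
import shutil


def has_executable(name: str) -> bool:
    """Return True if a program called `name` can be found on PATH."""
    return shutil.which(name) is not None


# Hypothetical pre-install check; it does not verify the jellyfish version.
print("jellyfish found:", has_executable("jellyfish"))
```

If this prints False, install jellyfish with one of the commands above before running the indexing step.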

Installation with pip:

pip install aindex2

To install the package from source, or if pip has no build of the package for your system, run the following commands:

git clone https://github.com/ad3002/aindex.git
cd aindex
make
pip install .

This will create the necessary executables in the bin directory.

To uninstall:

pip uninstall aindex2

To clean up the compiled files, run:

make clean

Mac Compilation Command

The Makefile does not currently support macOS, but you can try compiling the Python wrapper manually with the following command:

g++ -c -std=c++11 -fPIC python_wrapper.cpp -o python_wrapper.o && g++ -c -std=c++11 -fPIC kmers.cpp kmers.hpp debrujin.cpp debrujin.hpp hash.cpp hash.hpp read.cpp read.hpp settings.hpp settings.cpp && g++ -shared -Wl,-install_name,python_wrapper.so -o python_wrapper.so python_wrapper.o kmers.o debrujin.o hash.o read.o settings.o

Usage

Compute all binary arrays:

FASTQ1=./tests/raw_reads.101bp.IS350bp25_1.fastq
FASTQ2=./tests/raw_reads.101bp.IS350bp25_2.fastq
OUTPUT_PREFIX=./tests/raw_reads.101bp.IS350bp25

compute_aindex.py -i $FASTQ1,$FASTQ2 -t fastq -o $OUTPUT_PREFIX --lu 2 -P 30
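
If the indexing step needs to be launched from a Python script, the same command can be assembled as an argument list. This is a sketch only: the helper `build_compute_aindex_command` is hypothetical, and the flags are copied verbatim from the command above without assuming anything further about their meaning.

```python
import shlex


def build_compute_aindex_command(fastq1, fastq2, output_prefix, lu=2, p=30):
    """Assemble the compute_aindex.py call shown above as an argument list.

    `lu` and `p` are passed through unchanged to the --lu and -P flags.
    """
    return [
        "compute_aindex.py",
        "-i", f"{fastq1},{fastq2}",  # paired FASTQ files, comma-separated
        "-t", "fastq",
        "-o", output_prefix,
        "--lu", str(lu),
        "-P", str(p),
    ]


cmd = build_compute_aindex_command(
    "./tests/raw_reads.101bp.IS350bp25_1.fastq",
    "./tests/raw_reads.101bp.IS350bp25_2.fastq",
    "./tests/raw_reads.101bp.IS350bp25",
)
print(shlex.join(cmd))

# To actually run it (requires aindex2 to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids shell quoting problems when the paths contain spaces.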

Usage from Python

You can run demo.py, or use the API directly:

import aindex

prefix_path = "tests/raw_reads.101bp.IS350bp25"
kmer2tf = aindex.get_aindex(prefix_path)

kmer = "A"*23
rkmer = "T"*23
kid = kmer2tf.get_kid_by_kmer(kmer)
print(kmer2tf.get_kmer_info_by_kid(kid))
print(kmer2tf[kmer], kid, kmer2tf.get_kmer_by_kid(kid), len(kmer2tf.pos(kmer)), kmer2tf.get_strand(kmer), kmer2tf.get_strand(rkmer))
kmer = kmer2tf.get_read(0, 23, 0)
pos = kmer2tf.pos(kmer)[0]
print(pos)

print(kmer2tf.get_kid_by_kmer(kmer), kmer2tf.get_kid_by_kmer(rkmer))

print(kmer2tf.get_hash_size())

print(kmer2tf.get_read(0, 123, 0))

print(kmer2tf.get_read(0, 123, 1))


k = 23
for p in kmer2tf.pos(kmer):
    print(kmer2tf.get_read(p, p+k))

test_kmer = "TAAGTTATTATTTAGTTAATACT"
right_kmer = "AGTTAATACTTTTAACAATATTA"

print(kmer2tf[kmer])

sequence = kmer2tf.get_read(0, 1023, 0)

print("Task 1. Get kmer frequency")
for i, (kmer, tf) in enumerate(kmer2tf.iter_sequence_kmers(sequence)):
    print(f"Position {i} kmer {kmer} freq = {tf}")
  
print("Task 2. Iter read by read, print the first 20 reads")
for rid, read in kmer2tf.iter_reads():
    if rid == 20:
        break
    print(rid, read)

print("Task 3. Iter reads by kmer, returns (read id, position in read, read, all_positions)")
for rid, pos, read, poses in aindex.iter_reads_by_kmer(test_kmer, kmer2tf):
    print(read[pos:pos+k])


print("Task 4. Iter reads by sequence, returns (read, position in read, read, all_positions)")
sequence = "AATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATA"
for read in aindex.iter_reads_by_sequence(sequence, kmer2tf):
    print(read)

print("Task 5. Iter reads by sequence over hamming distance, returns (read, position in read, read, all_positions, hamming distance). Note that the first kmer is used as a seed.")
sequence = "AATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATA"
for read in aindex.iter_reads_by_sequence(sequence, kmer2tf, hd=10):
    print(read)

print("Task 6. Iter reads by sequence over hamming distance or edit distance, returns (read, position in read, read, all_positions, hamming distance). Note that the first kmer is used as a seed.")
sequence = "AATATTATTAAGGTATTTAAAAAATACTATTATAGTATTTAACATA"
for read in aindex.iter_reads_by_sequence(sequence, kmer2tf, hd=10):
    print(read)

for read in aindex.iter_reads_by_sequence(sequence, kmer2tf, ed=10):
    print(read)


print("Task 7. Get distances in reads for two kmers, returns (rid, start, end, length, fragment, is_gapped, is_reversed) tuples.")
for rid, start, end, length, fragment, is_gapped, is_reversed in aindex.get_left_right_distances(test_kmer, right_kmer, kmer2tf):
    print(rid, start, end, length, fragment, is_gapped, is_reversed)

print("Task 8. Get layout for kmer, returns (max_pos, reads, lefts, rights, rids, starts), for details see source code")
max_pos, reads, lefts, rights, rids, starts = aindex.get_layout_from_reads(right_kmer, kmer2tf)
print("Central layout:")
for read in reads:
    print(read)
print("Left flanks:")
print(lefts)
print("Right flanks:")
print(rights)
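
The example above relies on a few ideas that are easy to miss: a k-mer's frequency across reads, the reverse complement of a k-mer (why "A"*23 is paired with "T"*23), and Hamming distance between sequences. The pure-Python sketch below illustrates them on toy data. It is not how aindex is implemented (the package uses a perfect hash index), and every function name in it is hypothetical.

```python
from collections import Counter

# Translation table for DNA complements: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def iter_kmers(seq: str, k: int):
    """Yield (position, kmer) for every window of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def count_kmers(reads, k: int) -> Counter:
    """Toy stand-in for a kmer -> frequency table built from reads."""
    counts = Counter()
    for read in reads:
        for _, kmer in iter_kmers(read, k):
            counts[kmer] += 1
    return counts


def hamming_distance(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


toy_reads = ["ACGTACGT", "CGTACGTA"]  # made-up reads, not from the test data
toy_counts = count_kmers(toy_reads, k=4)

print(toy_counts["ACGT"])                 # frequency of one 4-mer
print(reverse_complement("A" * 23))       # a run of A pairs with a run of T
print(hamming_distance("ACGT", "AGGA"))   # mismatches at two positions
```

A dictionary-based table like this grows quickly with real sequencing data, which is the motivation for a compact perfect-hash index such as the one this package provides.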
