sequana-nanomerge


Merge barcoded or non barcoded fastq files generated by Nanopore runs

1.5.0


.. image:: https://badge.fury.io/py/sequana-nanomerge.svg
    :target: https://pypi.python.org/pypi/sequana_nanomerge

.. image:: http://joss.theoj.org/papers/10.21105/joss.00352/status.svg
    :target: http://joss.theoj.org/papers/10.21105/joss.00352
    :alt: JOSS (journal of open source software) DOI

.. image:: https://github.com/sequana/nanomerge/actions/workflows/main.yml/badge.svg
    :target: https://github.com/sequana/nanomerge/actions/workflows

.. image:: https://coveralls.io/repos/github/sequana/nanomerge/badge.svg?branch=main
    :target: https://coveralls.io/github/sequana/nanomerge?branch=main

.. image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C3.10-blue.svg
    :target: https://pypi.python.org/pypi/sequana
    :alt: Python 3.8 | 3.9 | 3.10

This is the nanomerge pipeline from the `Sequana <https://sequana.readthedocs.org>`_ project.

:Overview: Merges fastq files generated by a Nanopore run and generates raw data QC.
:Input: Individual fastq files generated by Nanopore demultiplexing
:Output: Merged fastq files for each barcode (or unique sample)
:Status: production
:Citation: Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352

Installation
~~~~~~~~~~~~

You can install the package using pip::

    pip install sequana_nanomerge --upgrade

An optional requirement is pycoQC, which can be installed with conda/mamba using e.g.::

    conda install pycoQC

You will also need graphviz installed.

Usage
~~~~~

::

    sequana_nanomerge --help

If your data is barcoded, the fastq files are usually in sub-directories (barcoded/barcodeXY), so you will need to use a pattern
(--input-pattern) such as `*/*.gz`::

    sequana_nanomerge --input-directory DATAPATH/barcoded --samplesheet samplesheet.csv \
        --summary summary.txt --input-pattern '*/*fastq.gz'

Otherwise, all fastq files are in DATAPATH/, so the input pattern can simply be `*.fastq.gz`::

    sequana_nanomerge --input-directory DATAPATH --samplesheet samplesheet.csv \
        --summary summary.txt --input-pattern '*fastq.gz'

The --summary option is optional and takes as input the output of albacore/guppy demultiplexing, usually a file called sequencing_summary.txt.

Note that the difference between the two commands is the extra `*/` before the `*.fastq.gz` pattern, since barcoded files are in individual subdirectories.
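
The effect of the two patterns can be checked on a throwaway directory before launching the pipeline. The sketch below is only an illustration: it assumes the pattern behaves like a standard shell-style glob evaluated relative to the input directory (emulated here with `pathlib.Path.glob`). The helper `match_fastq` and the file names are hypothetical and are not part of sequana_nanomerge.

.. code-block:: python

    import tempfile
    from pathlib import Path


    def match_fastq(input_directory, input_pattern):
        """Return the sorted relative paths matching input_pattern (hypothetical helper)."""
        root = Path(input_directory)
        return sorted(p.relative_to(root).as_posix() for p in root.glob(input_pattern))


    # Build a temporary layout mimicking a barcoded run (made-up file names).
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in (
            "barcode01/a_0.fastq.gz",
            "barcode01/a_1.fastq.gz",
            "barcode02/b_0.fastq.gz",
        ):
            path = root / name
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()

        # Pattern with the extra */ descends one level into each barcode directory.
        nested = match_fastq(root, "*/*fastq.gz")
        # Pattern without */ only looks at files directly inside the input directory.
        flat = match_fastq(root, "*fastq.gz")

    print(nested)
    print(flat)

With this layout, the pattern without `*/` finds nothing, which is the typical symptom of using the non-barcoded pattern on a barcoded run.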

In both cases, the command creates a directory with the pipeline and configuration file. You will then need to execute the pipeline::

    cd nanomerge
    sh nanomerge.sh  # for a local run

This launches a snakemake pipeline. If you are familiar with snakemake, you can
retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters::

    snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt

Or use the `sequanix <https://sequana.readthedocs.io/en/master/sequanix.html>`_ interface.

The sample sheet must be a CSV file, whether your data is barcoded or not::

    barcode,project,sample
    barcode01,main,A
    barcode02,main,B
    barcode03,main,C

For a non-barcoded run, you must still provide a sample sheet. The barcode column can be left empty::

    barcode,project,sample
    ,main,A

or just removed::

    project,sample
    main,A
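
For runs with many barcodes, the sample sheet can be generated rather than typed by hand. The following is a minimal sketch using only the Python standard library. The function `write_samplesheet` and its `with_barcode` argument are hypothetical names chosen for this example; only the column names (barcode, project, sample) come from the layouts shown above.

.. code-block:: python

    import csv
    import io


    def write_samplesheet(handle, rows, with_barcode=True):
        """Write (barcode, project, sample) rows as a CSV sample sheet.

        Hypothetical helper: when with_barcode is False, the barcode
        column is dropped, as in the non-barcoded layout.
        """
        fields = ["barcode", "project", "sample"] if with_barcode else ["project", "sample"]
        writer = csv.DictWriter(
            handle, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
        )
        writer.writeheader()
        for barcode, project, sample in rows:
            writer.writerow({"barcode": barcode, "project": project, "sample": sample})


    # Barcoded run: one line per barcode.
    barcoded = io.StringIO()
    write_samplesheet(barcoded, [("barcode01", "main", "A"), ("barcode02", "main", "B")])

    # Non-barcoded run: a single sample and no barcode column.
    unbarcoded = io.StringIO()
    write_samplesheet(unbarcoded, [("", "main", "A")], with_barcode=False)

    print(barcoded.getvalue())
    print(unbarcoded.getvalue())

To produce an actual file, pass a handle opened with `open("samplesheet.csv", "w", newline="")` instead of the in-memory buffers.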

Usage with apptainer:

With apptainer, initiate the working directory as follows::

    sequana_nanomerge --use-apptainer

Images are downloaded in the working directory, but you can store them in a global directory instead, e.g.::

    sequana_nanomerge --use-apptainer --apptainer-prefix ~/.sequana/apptainers

and then::

    cd nanomerge
    sh nanomerge.sh

If you decide to use snakemake manually, do not forget to add the apptainer options::

    snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt \
        --use-apptainer --apptainer-prefix ~/.sequana/apptainers \
        --apptainer-args "-B /home:/home"

Requirements
~~~~~~~~~~~~

This pipeline uses the following executable(s), which are optional:

- pycoQC
- dot

.. image:: https://raw.githubusercontent.com/sequana/nanomerge/main/sequana_pipelines/nanomerge/dag.png


Details
~~~~~~~~~

This pipeline runs **nanomerge** in parallel on the input fastq files (paired or not). 
A brief sequana summary report is also produced.
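
The merging step itself is conceptually simple. The changelog mentions that the pipeline relies on find and cat; the sketch below is not the pipeline's implementation but an equivalent illustration in Python, relying on the fact that concatenated gzip members form a valid gzip stream. The function `merge_fastq_gz`, the file names and the reads are made up for this example.

.. code-block:: python

    import gzip
    import shutil
    import tempfile
    from pathlib import Path


    def merge_fastq_gz(inputs, output):
        """Concatenate gzip-compressed FASTQ files byte for byte (hypothetical helper).

        No decompression is needed: a sequence of gzip members is itself
        a valid gzip stream.
        """
        with open(output, "wb") as out:
            for path in sorted(inputs):
                with open(path, "rb") as chunk:
                    shutil.copyfileobj(chunk, out)


    with tempfile.TemporaryDirectory() as tmp:
        # Two small chunks standing in for the files of one barcode.
        barcode_dir = Path(tmp) / "barcode01"
        barcode_dir.mkdir()
        for i, seq in enumerate(["ACGT", "TTGA"]):
            with gzip.open(barcode_dir / f"chunk_{i}.fastq.gz", "wt") as fh:
                fh.write(f"@read{i}\n{seq}\n+\n!!!!\n")

        # Merge the chunks into a single file named after the sample.
        merged = Path(tmp) / "A.fastq.gz"
        merge_fastq_gz(barcode_dir.glob("*fastq.gz"), merged)

        # Reading the merged file back yields the reads of both chunks.
        with gzip.open(merged, "rt") as fh:
            merged_text = fh.read()

    print(merged_text)

Copying raw bytes avoids recompressing the data, which matters for runs made of a very large number of input files.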


Rules and configuration details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is the `latest documented configuration file <https://raw.githubusercontent.com/sequana/sequana_nanomerge/master/sequana_pipelines/nanomerge/config.yaml>`_ to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.

Changelog
~~~~~~~~~

========= ====================================================================
Version   Description
========= ====================================================================
1.5.0     * refactoring to use Click
1.4.0     * sub sampling was biased in v1.3.0. Using stratified sampling to
            correctly sample large files. Also set a --promethion option that
            automatically subsamples 10% of the data
          * add summary table
1.3.0     * handle large promethion runs by using a sub sample of the
            sequencing summary file (--sample of pycoQC still loads the entire
            file in memory)
1.2.0     * handle large promethion runs by using find+cat instead of just
            cat to cope with very large number of input files.
1.1.0     * add subsample option and set to 1,000,000 reads to handle large
            runs such as promethion
1.0.1     * CSV can now handle sample or samplename column name in samplesheet.
          * Fix the pyco file paths, update requirements and doc
1.0.0     Stable release ready for production
0.0.1     **First release.**
========= ====================================================================
