.. image:: https://badge.fury.io/py/sequana-nanomerge.svg
    :target: https://pypi.python.org/pypi/sequana_nanomerge

.. image:: http://joss.theoj.org/papers/10.21105/joss.00352/status.svg
    :target: http://joss.theoj.org/papers/10.21105/joss.00352
    :alt: JOSS (journal of open source software) DOI

.. image:: https://github.com/sequana/nanomerge/actions/workflows/main.yml/badge.svg
    :target: https://github.com/sequana/nanomerge/actions/workflows

.. image:: https://coveralls.io/repos/github/sequana/nanomerge/badge.svg?branch=main
    :target: https://coveralls.io/github/sequana/nanomerge?branch=main

.. image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C3.10-blue.svg
    :target: https://pypi.python.org/pypi/sequana
    :alt: Python 3.8 | 3.9 | 3.10

This is the **nanomerge** pipeline from the `Sequana <https://sequana.readthedocs.org>`_ project.

:Overview: merges fastq files generated by a Nanopore run and generates raw data QC.
:Input: individual fastq files generated by nanopore demultiplexing
:Output: merged fastq files for each barcode (or unique sample)
:Status: production
:Citation: Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352

Installation
~~~~~~~~~~~~

You can install the package using pip::

    pip install sequana_nanomerge --upgrade

An optional requirement is pycoQC, which can be installed with conda/mamba, e.g.::

    conda install pycoQC

You will also need graphviz installed.

Usage
~~~~~

::

    sequana_nanomerge --help

If your data is barcoded, the files are usually in sub-directories named barcoded/barcodeXY, so you will need to use a pattern
(``--input-pattern``) such as ``*/*fastq.gz``::

    sequana_nanomerge --input-directory DATAPATH/barcoded --samplesheet samplesheet.csv \
        --summary summary.txt --input-pattern '*/*fastq.gz'

Otherwise, all fastq files are in DATAPATH/, so the input pattern can simply be ``*fastq.gz``::

    sequana_nanomerge --input-directory DATAPATH --samplesheet samplesheet.csv \
        --summary summary.txt --input-pattern '*fastq.gz'

The ``--summary`` option is optional and takes as input the output of albacore/guppy demultiplexing, usually a file called sequencing_summary.txt.

Note that the difference between the two commands is the extra ``*/`` before the ``*fastq.gz`` pattern, since barcoded files are in individual subdirectories.

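
To see how the two patterns behave, the following sketch builds a throwaway directory tree and counts the files each pattern matches. It assumes that the pattern is interpreted like a shell glob relative to the input directory; the ``demo`` layout, the file names and the ``count_matches`` helper are made up for illustration, and nothing here calls sequana_nanomerge itself.

.. code-block:: shell

    # Hypothetical layout, created only to illustrate the two patterns
    demo=$(mktemp -d)
    mkdir -p "$demo/barcoded/barcode01" "$demo/barcoded/barcode02" "$demo/flat"
    touch "$demo/barcoded/barcode01/reads_0.fastq.gz" \
          "$demo/barcoded/barcode02/reads_0.fastq.gz" \
          "$demo/flat/reads_0.fastq.gz" "$demo/flat/reads_1.fastq.gz"

    # Illustrative helper: count the files matched by a glob pattern
    # taken relative to a directory.
    count_matches() {
        dir=$1
        pattern=$2
        n=0
        # $pattern is left unquoted on purpose so that the shell expands it
        for f in "$dir"/$pattern; do
            if [ -e "$f" ]; then n=$((n + 1)); fi
        done
        echo "$n"
    }

    count_matches "$demo/barcoded" '*/*fastq.gz'   # prints 2
    count_matches "$demo/flat" '*fastq.gz'         # prints 2
    count_matches "$demo/barcoded" '*fastq.gz'     # prints 0: files are one level deeper

The last call shows why a barcoded run needs the extra ``*/``: without it, no file is found directly in the input directory.
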
In both cases, the command creates a directory with the pipeline and configuration file. You will then need to execute the pipeline::

    cd nanomerge
    sh nanomerge.sh  # for a local run

This launches a snakemake pipeline. If you are familiar with snakemake, you can
retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters::

    snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt

Or use the `sequanix <https://sequana.readthedocs.io/en/master/sequanix.html>`_ interface.

The sample sheet must be a CSV file, whether your data is barcoded or not::

    barcode,project,sample
    barcode01,main,A
    barcode02,main,B
    barcode03,main,C

For a non-barcoded run, you must provide a file where the barcode column is either left empty::

    barcode,project,sample
    ,main,A

or removed altogether::

    project,sample
    main,A

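
A sample sheet can be written by hand or generated. The sketch below creates the barcoded variant with a here-document and counts the rows that do not have the three expected fields. The file name samplesheet.csv matches the commands above and the sample names are the placeholders used in this section; the ``awk`` check is only an illustrative sanity test, not a step of the pipeline.

.. code-block:: shell

    # Write the sample sheet in a temporary working directory
    workdir=$(mktemp -d)
    cat > "$workdir/samplesheet.csv" <<'EOF'
    barcode,project,sample
    barcode01,main,A
    barcode02,main,B
    barcode03,main,C
    EOF

    # Count the rows whose number of comma-separated fields is not 3
    bad_rows=$(awk -F, 'NF != 3 { bad++ } END { print bad + 0 }' "$workdir/samplesheet.csv")
    echo "rows with an unexpected number of columns: $bad_rows"

For the non-barcoded variant without the barcode column, the same check would compare against 2 fields instead of 3.
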

Usage with apptainer:
~~~~~~~~~~~~~~~~~~~~~

With apptainer, initiate the working directory as follows::

    sequana_nanomerge --use-apptainer

Images are downloaded in the working directory, but you can store them in a global directory, e.g.::

    sequana_nanomerge --use-apptainer --apptainer-prefix ~/.sequana/apptainers

and then::

    cd nanomerge
    sh nanomerge.sh

If you decide to use snakemake manually, do not forget to add the apptainer options::

    snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt --use-apptainer --apptainer-prefix ~/.sequana/apptainers --apptainer-args "-B /home:/home"


Requirements
~~~~~~~~~~~~

This pipeline uses the following executables, which are optional:

- pycoQC
- dot

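
Before running the pipeline, you may want to check whether these optional executables are available. The following is a minimal sketch that only relies on the standard ``command -v`` shell builtin; the ``check_tool`` helper is a name introduced here for illustration and is not part of sequana_nanomerge.

.. code-block:: shell

    # Illustrative helper: report whether an executable is found on the PATH
    check_tool() {
        if command -v "$1" >/dev/null 2>&1; then
            echo "$1: found"
        else
            echo "$1: missing"
        fi
    }

    # The two optional executables listed above
    for tool in pycoQC dot; do
        check_tool "$tool"
    done
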
.. image:: https://raw.githubusercontent.com/sequana/nanomerge/main/sequana_pipelines/nanomerge/dag.png
Details
~~~~~~~~~
This pipeline runs **nanomerge** in parallel on the input fastq files (paired or not).
A brief sequana summary report is also produced.
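
Merging compressed fastq files can rely on the fact that gzip files concatenated byte for byte still form a valid gzip stream. The sketch below illustrates the idea with ``find`` and ``cat``, the approach named in the changelog for version 1.2.0. The directory layout, file names and read content are made up, and the actual rule used by the pipeline may differ in its details.

.. code-block:: shell

    # Hypothetical run directory with two small gzipped chunks for one barcode
    run=$(mktemp -d)
    mkdir -p "$run/barcode01"
    printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$run/barcode01/chunk_0.fastq.gz"
    printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > "$run/barcode01/chunk_1.fastq.gz"

    # Concatenate all chunks of the barcode in a stable order. Feeding the
    # names through find and xargs avoids building one very long command line
    # when there are many input files.
    find "$run/barcode01" -name '*.fastq.gz' -print0 | sort -z \
        | xargs -0 cat > "$run/barcode01_merged.fastq.gz"

    # The merged file decompresses to 2 reads of 4 lines each
    gzip -dc "$run/barcode01_merged.fastq.gz" | wc -l   # prints 8

The merged file is written outside the barcode01 directory so that ``find`` does not pick it up as an input.
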

Rules and configuration details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is the `latest documented configuration file <https://raw.githubusercontent.com/sequana/sequana_nanomerge/master/sequana_pipelines/nanomerge/config.yaml>`_
to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.


Changelog
~~~~~~~~~

========= ====================================================================
Version   Description
========= ====================================================================
1.5.0     * refactoring to use Click
1.4.0     * sub sampling was biased in v1.3.0. Using stratified sampling to
            correctly sample large files. Also set a --promethion option that
            automatically sub samples 10% of the data
          * add summary table
1.3.0     * handle large promethion runs by using a sub sample of the
            sequencing summary file (--sample of pycoQC still loads the entire
            file in memory)
1.2.0     * handle large promethion runs by using find+cat instead of just
            cat to cope with a very large number of input files.
1.1.0     * add subsample option and set to 1,000,000 reads to handle large
            runs such as promethion
1.0.1     * CSV can now handle sample or samplename column name in samplesheet.
          * Fix the pyco file paths, update requirements and doc
1.0.0     Stable release ready for production
0.0.1     **First release.**
========= ====================================================================