Bio Parsers
About this Repo
This repo contains a set of parsers that convert between biological sequence file formats (FASTA, GenBank, AB1, SBOL, Geneious, JBEI, SnapGene) by way of a generalized JSON format.
Exported Functions
Use the following exports to convert to a generalized JSON format:
fastaToJson //handles fasta files (.fa, .fasta)
genbankToJson //handles genbank files (.gb, .gbk)
ab1ToJson //handles .ab1 sequencing read files
sbolXmlToJson //handles .sbol files
geneiousXmlToJson //handles .geneious files
jbeiXmlToJson //handles jbei .seq or .xml files
snapgeneToJson //handles snapgene (.dna) files
anyToJson //this handles any of the above file types based on file extension
Use the following exports to convert from a generalized JSON format back to a specific format:
jsonToGenbank
jsonToFasta
jsonToBed
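The extension-based dispatch described for anyToJson can be pictured as a small lookup. The sketch below is not the library's implementation: `PARSER_BY_EXTENSION` and `guessParserName` are hypothetical names, and the table only restates the extensions listed above. Treating a plain `.xml` file as JBEI is an assumption made for illustration.

```javascript
// Hypothetical lookup table built from the extensions listed above.
// The real anyToJson may use different or additional rules.
const PARSER_BY_EXTENSION = new Map([
  ["fa", "fastaToJson"],
  ["fasta", "fastaToJson"],
  ["gb", "genbankToJson"],
  ["gbk", "genbankToJson"],
  ["ab1", "ab1ToJson"],
  ["sbol", "sbolXmlToJson"],
  ["geneious", "geneiousXmlToJson"],
  ["seq", "jbeiXmlToJson"],
  ["xml", "jbeiXmlToJson"], // assumption: treat plain .xml as JBEI
  ["dna", "snapgeneToJson"],
]);

// Returns the name of the export that would handle the file, or null
// when the extension is missing or unknown.
function guessParserName(fileName) {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1 || dot === fileName.length - 1) return null;
  const extension = fileName.slice(dot + 1).toLowerCase();
  return PARSER_BY_EXTENSION.get(extension) ?? null;
}

console.log(guessParserName("pBbS8c-RFP.gb"));
console.log(guessParserName("reads/sample.AB1"));
console.log(guessParserName("notes.txt"));
```

A Map is used rather than a plain object so that odd extensions such as `constructor` cannot accidentally match an inherited property.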
Format Specification
The generalized JSON format looks like:
const generalizedJsonFormat = {
size: 60,
sequence: "asaasdgasdgasdgasdgasgdasgdasdgasdgasgdagasdgasdfasdfdfasdfa",
circular: true,
name: "pBbS8c-RFP",
description: "",
parts: [
{
name: "part 1",
type: "CDS",
id: "092j92",
start: 10,
end: 30,
strand: 1,
notes: {}
}
],
primers: [
{
name: "primer 1",
id: "092j92",
start: 10,
end: 30,
strand: 1,
notes: {}
}
],
features: [
{
name: "anonymous feature",
type: "misc_feature",
id: "5590c1978979df000a4f02c7",
start: 1,
end: 3,
strand: 1,
notes: {}
},
{
name: "coding region 1",
type: "CDS",
id: "5590c1d88979df000a4f02f5",
start: 12,
end: 9,
strand: -1,
notes: {}
}
],
chromatogramData: {
aTrace: [],
tTrace: [],
gTrace: [],
cTrace: [0, 0, 0, 1, 3, 5, 11, 24, 56, 68, 54, 30, 21, 3, 1, 4, 1, 0, 0, ...etc],
basePos: [33, 46, 55, ...etc],
baseCalls: ["A", "T", ...etc],
qualNums: []
}
};
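Note that the second feature above has a start (12) greater than its end (9). On a circular sequence this is usually read as an annotation that wraps around the origin. The sketch below computes the length of such a range, assuming 0-based inclusive start and end coordinates. `getFeatureLength` is a hypothetical helper, not one of the exports listed above.

```javascript
// Length in bases of a feature, assuming 0-based inclusive coordinates.
// When start > end the feature is treated as wrapping around the origin
// of a circular sequence, so the total sequence length is needed.
function getFeatureLength(feature, sequenceLength) {
  const { start, end } = feature;
  if (start <= end) {
    return end - start + 1;
  }
  // Bases from start through the last base, plus bases 0 through end.
  return sequenceLength - start + (end + 1);
}

const sequenceLength = 60;
console.log(getFeatureLength({ start: 1, end: 3 }, sequenceLength)); // 3
console.log(getFeatureLength({ start: 12, end: 9 }, sequenceLength)); // 58
```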
Usage
install
npm install -S @teselagen/bio-parsers
or
yarn add @teselagen/bio-parsers
or
use it from a script tag:
<script src="https://unpkg.com/bio-parsers/umd/bio-parsers.js"></script>
<script>
async function main() {
var jsonOutput = await window.bioParsers.genbankToJson(
`LOCUS kc2 108 bp DNA linear 01-NOV-2016
COMMENT teselagen_unique_id: 581929a7bc6d3e00ac7394e8
FEATURES Location/Qualifiers
CDS 1..108
/label="GFPuv"
misc_feature 61..108
/label="gly_ser_linker"
bogus_dude 4..60
/label="ccmN_sig_pep"
misc_feature 4..60
/label="ccmN_nterm_sig_pep"
/pragma="Teselagen_Part"
/preferred5PrimeOverhangs=""
/preferred3PrimeOverhangs=""
ORIGIN
1 atgaaggtct acggcaagga acagtttttg cggatgcgcc agagcatgtt ccccgatcgc
61 ggtggcagtg gtagcgggag ctcgggtggc tcaggctctg ggg
//`
);
console.log("jsonOutput:", jsonOutput);
var genbankString = window.bioParsers.jsonToGenbank(jsonOutput[0].parsedSequence);
console.log(genbankString);
}
main();
</script>
See the ./umd_demo.html file for a full working example.
jsonToGenbank (same interface as jsonToFasta)
import { jsonToGenbank } from "bio-parsers"
const options = {
isProtein: false,
guessIfProtein: false,
guessIfProteinOptions: {
threshold: 0.90,
dnaLetters: ['G', 'A', 'T', 'C']
},
inclusive1BasedStart: false,
inclusive1BasedEnd: false
}
const genbankString = jsonToGenbank(generalizedJsonFormat, options)
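The names `threshold` and `dnaLetters` in guessIfProteinOptions suggest a simple heuristic: if the share of letters that belong to `dnaLetters` falls below `threshold`, the sequence is guessed to be a protein. The sketch below illustrates that reading only. It is an assumption about how such a guess could work, not the library's code, and `looksLikeProtein` is a hypothetical name.

```javascript
// Guesses whether a sequence is a protein by measuring how many of its
// letters are DNA letters. Assumed semantics: below the threshold means
// "probably protein". Empty input is treated as not protein.
function looksLikeProtein(
  sequence,
  { threshold = 0.9, dnaLetters = ["G", "A", "T", "C"] } = {}
) {
  if (sequence.length === 0) return false;
  const dnaSet = new Set(dnaLetters.map((letter) => letter.toUpperCase()));
  let dnaCount = 0;
  for (const letter of sequence.toUpperCase()) {
    if (dnaSet.has(letter)) dnaCount += 1;
  }
  return dnaCount / sequence.length < threshold;
}

console.log(looksLikeProtein("atgaaggtctacggcaagga")); // false
console.log(looksLikeProtein("MKVLAAGIVG")); // true
```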
anyToJson (same interface as genbankToJson, fastaToJson, xxxxToJson) (async required)
import { anyToJson } from "bio-parsers";
const results = await anyToJson(
stringOrFile,
options
);
results[0].success;
results[0].messages;
results[0].parsedSequence;
results[0].parsedSequence.chromatogramData;
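Because the parsers return an array of results, each with its own `success` flag, callers typically loop over it rather than reading only `results[0]`. A minimal sketch of that pattern follows. `summarizeResults` is a hypothetical helper, the sample data is hand-written rather than real parser output, and `messages` is assumed to be a string or an array of strings.

```javascript
// Splits a results array (shaped like the one returned by anyToJson)
// into successfully parsed sequences and messages from failed records.
function summarizeResults(results) {
  const sequences = [];
  const failures = [];
  for (const result of results) {
    if (result.success) {
      sequences.push(result.parsedSequence);
    } else {
      // concat accepts either a single message or an array of messages.
      failures.push(...[].concat(result.messages ?? []));
    }
  }
  return { sequences, failures };
}

// Hand-written stand-in for parser output, for illustration only.
const exampleResults = [
  { success: true, messages: [], parsedSequence: { name: "kc2", sequence: "atgaag" } },
  { success: false, messages: ["Unable to parse record"], parsedSequence: undefined },
];
const summary = summarizeResults(exampleResults);
console.log(summary.sequences.length, summary.failures);
```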
Options (for anyToJson or xxxxToJson)
const options = {
fileName: "example.gb",
isProtein: false,
parseFastaAsCircular: false,
inclusive1BasedStart: false,
inclusive1BasedEnd: false,
acceptParts: true,
parseName: true
}
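The `inclusive1BasedStart` and `inclusive1BasedEnd` flags suggest that feature coordinates are 0-based and inclusive by default and can be shifted to 1-based on request. Under that assumption the conversion amounts to adding one to the affected coordinate, as sketched below. `applyCoordinateOptions` is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: shifts 0-based inclusive coordinates to 1-based
// ones, mirroring what the two flags are assumed to do.
function applyCoordinateOptions(feature, options = {}) {
  const { inclusive1BasedStart = false, inclusive1BasedEnd = false } = options;
  return {
    ...feature,
    start: feature.start + (inclusive1BasedStart ? 1 : 0),
    end: feature.end + (inclusive1BasedEnd ? 1 : 0),
  };
}

// The GenBank location 1..108 corresponds to 0..107 in 0-based terms.
const zeroBased = { name: "GFPuv", start: 0, end: 107 };
console.log(
  applyCoordinateOptions(zeroBased, {
    inclusive1BasedStart: true,
    inclusive1BasedEnd: true,
  })
);
```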
ab1ToJson
import { ab1ToJson } from "bio-parsers";
const results = await ab1ToJson(
file,
options
);
results[0].success;
results[0].messages;
results[0].parsedSequence;
results[0].parsedSequence.chromatogramData;
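As an illustration of how the chromatogramData fields might be combined, the sketch below reads the signal intensity at each called base. It assumes that `basePos[i]` is an index into the trace arrays marking where base `i` was called, which this README does not confirm. `getPeakIntensities` is a hypothetical helper and the sample data is made up.

```javascript
// For each called base, reads the intensity of the matching trace at the
// position where the base was called. Assumes basePos[i] indexes into the
// trace arrays.
function getPeakIntensities(chromatogramData) {
  const traces = {
    A: chromatogramData.aTrace,
    T: chromatogramData.tTrace,
    G: chromatogramData.gTrace,
    C: chromatogramData.cTrace,
  };
  return chromatogramData.baseCalls.map((base, i) => {
    const trace = traces[base.toUpperCase()];
    const position = chromatogramData.basePos[i];
    // Ambiguous calls such as "N" have no single trace; report null.
    const intensity = trace ? (trace[position] ?? null) : null;
    return { base, position, intensity };
  });
}

// Made-up data in the shape shown in the format specification.
const exampleChromatogram = {
  aTrace: [0, 5, 40, 5, 0, 0, 0],
  tTrace: [0, 0, 0, 0, 6, 35, 4],
  gTrace: [0, 0, 0, 0, 0, 0, 0],
  cTrace: [0, 0, 0, 0, 0, 0, 0],
  basePos: [2, 5],
  baseCalls: ["A", "T"],
  qualNums: [],
};
console.log(getPeakIntensities(exampleChromatogram));
```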
snapgeneToJson (.dna files)
import { snapgeneToJson } from "bio-parsers";
const results = await snapgeneToJson(file, options);
genbankToJson
import { genbankToJson } from "bio-parsers";
const result = await genbankToJson(string, options);
console.info(result);
You can see more examples by looking at the tests.
Updating this repo
Outside collaborators
Please fork the repo and open a pull request :)
Thanks/Collaborators