@gmod/cram
Read CRAM files (indexed or unindexed) in pure JavaScript. Works in Node.js or
in the browser.
- Reads CRAM 3.x and 2.x (3.1 added in v1.6.0)
- Does not read CRAM 1.x
- Can use .crai indexes out of the box, for efficient sequence fetching, but
also has an index API that would allow use with other index
types
- Has preliminary support for bzip2 and lzma codecs. lzma requires the latest
@gmod/cram version and uses WebAssembly. If you find you are unable to
compile it, you can try downgrading.
Install
$ npm install --save @gmod/cram
$ yarn add @gmod/cram
Usage
const { IndexedCramFile, CramFile, CraiIndex } = require('@gmod/cram')
const { IndexedFasta, BgzipIndexedFasta } = require('@gmod/indexedfasta')
// reference FASTA used to answer sequence requests from the CRAM reader
const t = new IndexedFasta({
  path: '/filesystem/yourfile.fa',
  faiPath: '/filesystem/yourfile.fa.fai',
})
const run = async () => {
  // lookup tables between reference names and numeric IDs, filled in below
  const idToName = []
  const nameToId = {}
  const indexedFile = new IndexedCramFile({
    cramPath: '/filesystem/yourfile.cram',
    index: new CraiIndex({
      path: '/filesystem/yourfile.cram.crai',
    }),
    seqFetch: async (seqId, start, end) => {
      // shift the start by one to match the coordinates getSequence expects
      return t.getSequence(idToName[seqId], start - 1, end)
    },
    checkSequenceMD5: false,
  })
  // the order of the SQ lines in the SAM header defines the numeric IDs
  const samHeader = await indexedFile.cram.getSamHeader()
  const sqLines = samHeader.filter(l => l.tag === 'SQ')
  sqLines.forEach((sqLine, refId) => {
    sqLine.data.forEach(item => {
      if (item.tag === 'SN') {
        const refName = item.value
        nameToId[refName] = refId
        idToName[refId] = refName
      }
    })
  })
  const records = await indexedFile.getRecordsForRange(
    nameToId['chr1'],
    10000,
    20000,
  )
  records.forEach(record => {
    console.log(`got a record named ${record.readName}`)
    if (record.readFeatures != undefined) {
      record.readFeatures.forEach(({ code, pos, refPos, ref, sub }) => {
        // code 'X' marks a base substitution
        if (code === 'X') {
          console.log(
            `${record.readName} shows a base substitution of ${ref}->${sub} at ${refPos}`,
          )
        }
      })
    }
  })
}
// report failures instead of leaving the promise rejection unhandled
run().catch(err => console.error(err))
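The name/ID bookkeeping in the usage example can be pulled out into a small helper. The following is a minimal sketch, assuming the parsed SAM header has the shape used above (an array of lines with `tag` and `data`, where `data` is an array of `{ tag, value }` items). `buildRefNameMaps` is a hypothetical name, not part of @gmod/cram, and the header below is hand-written for illustration.

```javascript
// Hypothetical helper (not part of @gmod/cram): derive name <-> numeric ID
// lookup tables from a parsed SAM header shaped like the one in the usage
// example. The position of each SQ line is used as the numeric ID.
function buildRefNameMaps(samHeader) {
  const idToName = []
  const nameToId = {}
  samHeader
    .filter(line => line.tag === 'SQ')
    .forEach((sqLine, refId) => {
      const sn = sqLine.data.find(item => item.tag === 'SN')
      if (sn) {
        nameToId[sn.value] = refId
        idToName[refId] = sn.value
      }
    })
  return { idToName, nameToId }
}

// Hand-written header for illustration only; not read from a real file.
const exampleHeader = [
  { tag: 'HD', data: [{ tag: 'VN', value: '1.6' }] },
  { tag: 'SQ', data: [{ tag: 'SN', value: 'chr1' }, { tag: 'LN', value: '248956422' }] },
  { tag: 'SQ', data: [{ tag: 'SN', value: 'chr2' }, { tag: 'LN', value: '242193529' }] },
]
const maps = buildRefNameMaps(exampleHeader)
console.log(maps.nameToId, maps.idToName)
```

Returning both tables from one function keeps them in sync, which matters because seqFetch looks names up by ID while getRecordsForRange needs the ID for a name.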
You can also use cram-js without npm by loading cram-bundle.js. See the example
directory for usage with a script tag.
API (auto-generated)
CramRecord
Class of each CRAM record returned by this API.
isPaired
Returns
boolean
true if the read is paired, regardless of whether both segments are mapped
isProperlyPaired
Returns
boolean
true if the read is paired, and both segments are mapped
isSegmentUnmapped
Returns
boolean
true if the read itself is unmapped; conflicts with isProperlyPaired
isMateUnmapped
Returns
boolean
true if the mate is unmapped; conflicts with isProperlyPaired
isReverseComplemented
Returns
boolean
true if the read is mapped to the reverse strand
isMateReverseComplemented
Returns
boolean
true if the mate is mapped to the reverse strand
isRead1
Returns
boolean
true if this is read number 1 in a pair
isRead2
Returns
boolean
true if this is read number 2 in a pair
isSecondary
Returns
boolean
true if this is a secondary alignment
isFailedQc
Returns
boolean
true if this read has failed QC checks
isDuplicate
Returns
boolean
true if the read is an optical or PCR duplicate
isSupplementary
Returns
boolean
true if this is a supplementary alignment
isDetached
Returns
boolean
true if the read is detached
hasMateDownStream
Returns
boolean
true if the read has a mate in this same CRAM segment
isPreservingQualityScores
Returns
boolean
true if the read contains quality scores
isUnknownBases
Returns
boolean
true if the read has no sequence bases
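The flag methods above are typically combined into a filter. Below is a minimal sketch, assuming each flag accessor is called as a method returning a boolean, as the listing describes. `keepForPileup` and `mockRecord` are hypothetical names, the stand-in objects only mimic the methods used here, and the choice of which flags to exclude is an application decision rather than something the library prescribes.

```javascript
// Hypothetical filter: keep mapped, primary, non-duplicate reads that passed QC.
function keepForPileup(record) {
  return !(
    record.isSegmentUnmapped() ||
    record.isSecondary() ||
    record.isSupplementary() ||
    record.isDuplicate() ||
    record.isFailedQc()
  )
}

// Stand-in for a CramRecord, exposing only the flag methods used above.
// Real records come from getRecordsForRange; these are invented for the demo.
function mockRecord(readName, flags = {}) {
  const has = name => () => Boolean(flags[name])
  return {
    readName,
    isSegmentUnmapped: has('unmapped'),
    isSecondary: has('secondary'),
    isSupplementary: has('supplementary'),
    isDuplicate: has('duplicate'),
    isFailedQc: has('failedQc'),
  }
}

const mockRecords = [
  mockRecord('readA'),
  mockRecord('readB', { duplicate: true }),
  mockRecord('readC', { secondary: true }),
  mockRecord('readD', { unmapped: true }),
]
const kept = mockRecords.filter(keepForPileup).map(r => r.readName)
console.log(kept)
```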
getReadBases
Get the original sequence of this read.
Returns
String
sequence basepairs
getPairOrientation
Get the pair orientation of a paired read. Adapted from igv.js
Returns
String
the pair orientation
addReferenceSequence
Annotates this feature with the given reference sequence basepair information.
This will add a sub and a ref item to base substitution read features given
the actual substituted and reference base pairs, and will make the
getReadSequence() method work.
Parameters
Returns
undefined
nothing
ReadFeatures
The feature objects appearing in the readFeatures member of CramRecord objects
that show insertions, deletions, substitutions, etc.
Static fields
- code (character): One of "bqBXIDiQNSPH". See page 15 of the CRAM v3 spec
for their meanings.
- data (any): the data associated with the feature. The format of this
varies depending on the feature code.
- pos (number): location relative to the read (1-based)
- refPos (number): location relative to the reference (1-based)
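As an illustration of working with these fields, the sketch below tallies read features by their code. It assumes only the documented `code` field and tolerates a missing readFeatures list, as the usage example does. `countFeatureCodes` is a hypothetical helper and the feature objects are invented for the demo, including their `data` values.

```javascript
// Hypothetical helper: count how many read features of each code a record has.
// Accepts undefined or null, since readFeatures may be absent on a record.
function countFeatureCodes(readFeatures) {
  const counts = {}
  for (const { code } of readFeatures || []) {
    counts[code] = (counts[code] || 0) + 1
  }
  return counts
}

// Invented feature objects; real ones come from record.readFeatures.
const exampleFeatures = [
  { code: 'X', pos: 5, refPos: 10004, ref: 'A', sub: 'G' },
  { code: 'D', pos: 9, refPos: 10008, data: 2 },
  { code: 'X', pos: 12, refPos: 10013, ref: 'C', sub: 'T' },
]
const featureCounts = countFeatureCodes(exampleFeatures)
console.log(featureCounts)
```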
IndexedCramFile
constructor
Parameters
- args (object)
- args.cram (CramFile)
- args.index: Index-like object that supports
getEntriesForRange(seqId,start,end) -> Promise[Array[index entries]]
- args.cacheSize (number?): optional maximum number of CRAM records to cache.
Default 20,000.
- args.fetchSizeLimit (number?): optional maximum number of bytes to fetch in a
single getRecordsForRange call. Default 3 MiB.
- args.checkSequenceMD5 (boolean?): default true. If false, disables verifying
the MD5 checksum of the reference sequence underlying a slice. In some
applications, this check can cause an inconvenient amount (many megabases) of
sequences to be fetched.
getRecordsForRange
Parameters
- seq (number): numeric ID of the reference sequence
- start (number): start of the range of interest. 1-based closed coordinates.
- end (number): end of the range of interest. 1-based closed coordinates.
- opts (optional, default {})
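Since getRecordsForRange takes 1-based closed coordinates, regions held in a 0-based half-open form (as in BED files) need converting first. A minimal sketch of that conversion follows. `toOneBasedClosed` is a hypothetical helper, not part of @gmod/cram.

```javascript
// Hypothetical helper: convert a 0-based half-open interval [start0, end0)
// into the equivalent 1-based closed interval [start, end].
function toOneBasedClosed(start0, end0) {
  if (
    !Number.isInteger(start0) ||
    !Number.isInteger(end0) ||
    start0 < 0 ||
    end0 <= start0
  ) {
    throw new RangeError(`invalid 0-based half-open interval: ${start0}-${end0}`)
  }
  // Only the start shifts: the exclusive 0-based end already equals the
  // inclusive 1-based end.
  return { start: start0 + 1, end: end0 }
}

// A BED-style region 9999-20000 covers the same bases as 10000..20000 1-based.
const region = toOneBasedClosed(9999, 20000)
console.log(region)
```

The resulting `region.start` and `region.end` could then be passed as the start and end arguments described above.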
hasDataForReferenceSequence
Parameters
Returns
Promise
true if the CRAM file contains data for the given reference sequence numerical
ID
CramFile
constructor
Parameters
- args (object)
- args.filehandle (object?): a filehandle that implements the stat() and read()
methods of the Node filehandle API
https://nodejs.org/api/fs.html#fs_class_filehandle
- args.path (object?): path to the cram file
- args.url (object?): url for the cram file. Also supports file:// urls for
local files.
- args.seqFetch (function?): a function with signature
(seqId, startCoordinate, endCoordinate) that returns a promise for a string of
sequence bases
- args.cacheSize (number?): optional maximum number of CRAM records to cache.
Default 20,000.
- args.checkSequenceMD5 (boolean?): default true. If false, disables verifying
the MD5 checksum of the reference sequence underlying a slice. In some
applications, this check can cause an inconvenient amount (many megabases) of
sequences to be fetched.
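To make the seqFetch contract concrete, here is a minimal sketch of a seqFetch backed by sequences already held in memory, which can be handy in tests. It assumes the start and end coordinates are 1-based and inclusive; that is inferred from the `start - 1` adjustment in the usage example and is not stated in the parameter description. `makeInMemorySeqFetch` and `sliceOneBasedClosed` are hypothetical names.

```javascript
// Hypothetical helper: extract bases start..end (assumed 1-based, inclusive)
// from a sequence string.
function sliceOneBasedClosed(sequence, start, end) {
  return sequence.slice(start - 1, end)
}

// Hypothetical factory: build a function with the documented
// (seqId, startCoordinate, endCoordinate) signature that returns a promise
// for a string of bases. sequencesById is indexed by numeric sequence ID.
function makeInMemorySeqFetch(sequencesById) {
  return async (seqId, start, end) => {
    const sequence = sequencesById[seqId]
    if (sequence === undefined) {
      throw new Error(`no reference sequence for ID ${seqId}`)
    }
    return sliceOneBasedClosed(sequence, start, end)
  }
}

// Toy reference with a single ten-base sequence at ID 0.
const seqFetch = makeInMemorySeqFetch(['ACGTACGTAC'])
seqFetch(0, 2, 5).then(bases => console.log(bases))
```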
containerCount
CraiIndex
constructor
Parameters
hasDataForReferenceSequence
Parameters
Returns
Promise
true if the index contains entries for the given reference sequence ID, false
otherwise
getEntriesForRange
fetch index entries for the given range
Parameters
Returns
Promise
promise for an array of objects of the form
{start, span, containerStart, sliceStart, sliceBytes }
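Because the index argument only needs to be index-like, another index type can be adapted by providing the same methods. The following is a minimal sketch of an in-memory index that returns entries of the form shown above. It assumes an entry covers positions start through start + span - 1, which is an assumption about the entry semantics. `InMemoryIndex` and `overlappingEntries` are hypothetical names, the entry values are invented, and whether IndexedCramFile requires further methods beyond these two is not confirmed here.

```javascript
// Hypothetical helper: select entries overlapping the closed range start..end,
// assuming each entry covers entry.start .. entry.start + entry.span - 1.
function overlappingEntries(entries, start, end) {
  return entries.filter(
    entry => entry.start <= end && entry.start + entry.span - 1 >= start,
  )
}

// Hypothetical index-like class backed by a plain object keyed by sequence ID.
class InMemoryIndex {
  constructor(entriesBySeqId) {
    this.entriesBySeqId = entriesBySeqId
  }

  async hasDataForReferenceSequence(seqId) {
    return (this.entriesBySeqId[seqId] || []).length > 0
  }

  async getEntriesForRange(seqId, start, end) {
    return overlappingEntries(this.entriesBySeqId[seqId] || [], start, end)
  }
}

// Invented entries for illustration; the byte offsets are not from a real file.
const exampleEntries = [
  { start: 1, span: 1000, containerStart: 100, sliceStart: 50, sliceBytes: 2000 },
  { start: 1001, span: 500, containerStart: 2200, sliceStart: 50, sliceBytes: 1500 },
]
const exampleIndex = new InMemoryIndex({ 0: exampleEntries })
exampleIndex
  .getEntriesForRange(0, 900, 950)
  .then(entries => console.log(entries.length))
```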
CramUnimplementedError
Extends Error
Error caused by encountering a part of the CRAM spec that has not yet been
implemented
CramMalformedError
Extends CramError
An error caused by malformed data.
CramBufferOverrunError
Extends CramMalformedError
An error caused by attempting to read beyond the end of the defined data.
Academic Use
This package was written with funding from the NHGRI as
part of the JBrowse project. If you use it in an academic
project that you publish, please cite the most recent JBrowse paper, which will
be linked from jbrowse.org.
License
MIT © Robert Buels