
High performance streaming Variant Call Format (VCF) parser in pure JavaScript
VCF (variant call format) parser
import { TabixIndexedFile } from '@gmod/tabix'
import VCF from '@gmod/vcf'
const tbiIndexed = new TabixIndexedFile({ path: '/path/to/my.vcf.gz' })
const headerText = await tbiIndexed.getHeader()
const parser = new VCF({ header: headerText }) // strict?: boolean (default true)
const variants = []
await tbiIndexed.getLines('ctgA', 200, 300, line =>
  variants.push(parser.parseLine(line)),
)
parseLine(line) returns a Variant with these fields:
{
  CHROM: 'contigA',
  POS: 3000,
  ID: ['rs17883296'],
  REF: 'G',
  ALT: ['T', 'A'],
  QUAL: 100,
  FILTER: 'PASS', // 'PASS' | string[] of filter names | undefined if '.'
  INFO: {
    NS: [3],
    DP: [14],
    AF: [0.5],
    DB: true, // Flag type
    XYZ: ['5'], // unknown fields default to Number=1, Type=String
  },
}
INFO and FORMAT values are typed using header metadata. Values are arrays unless
Type=Flag, in which case they are true. Fields defined in the VCF spec are typed
even without a header entry.
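The typing rule can be illustrated with a small standalone sketch. It is not the library's implementation: `parseInfoField` and the `infoTypes` table are hypothetical names, and the table stands in for the types a real header would declare. Missing values (`.`) and other edge cases are not handled.

```javascript
// Hypothetical type table standing in for header metadata: field name -> VCF Type.
const infoTypes = { NS: 'Integer', DP: 'Integer', AF: 'Float', DB: 'Flag' }

// Convert a raw INFO column such as "NS=3;DP=14;DB" into typed values.
function parseInfoField(infoText, types) {
  const result = {}
  if (infoText === '.') return result // "." means the column is empty
  for (const entry of infoText.split(';')) {
    const eq = entry.indexOf('=')
    const key = eq === -1 ? entry : entry.slice(0, eq)
    const type = types[key] ?? 'String' // unknown fields fall back to String
    if (type === 'Flag' || eq === -1) {
      result[key] = true // flags carry no value
      continue
    }
    // Every non-flag value becomes an array, even when it has one element.
    result[key] = entry
      .slice(eq + 1)
      .split(',')
      .map(v => (type === 'Integer' || type === 'Float' ? Number(v) : v))
  }
  return result
}

console.log(parseInfoField('NS=3;DP=14;AF=0.5;DB;XYZ=5', infoTypes))
// { NS: [ 3 ], DP: [ 14 ], AF: [ 0.5 ], DB: true, XYZ: [ '5' ] }
```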
- variant.SAMPLES(): full sample data with all FORMAT fields parsed
- variant.GENOTYPES(): GT strings only (faster)
- variant.processGenotypes(callback): iterate genotypes without allocating strings (fastest)

let homRef = 0
variant.processGenotypes((str, start, end, sampleIdx) => {
  if (
    end - start === 3 && // e.g. "0|0"
    str.charCodeAt(start) === 48 && // 48 = '0'
    str.charCodeAt(start + 2) === 48
  ) {
    homRef++
  }
})
Sample data is lazily parsed — nothing is computed until these methods are called.
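To see why the offset-based callback avoids allocations, here is a self-contained sketch of how such a callback could be driven. `forEachGenotype` and `countHomRef` are hypothetical helpers written for illustration, not part of @gmod/vcf. The sketch assumes the sample columns are tab-separated and that GT is the first FORMAT subfield, which the VCF spec requires whenever GT is present.

```javascript
// Hypothetical driver: walks tab-separated sample columns and reports the
// GT span of each one as offsets into the original string (no substrings).
function forEachGenotype(samplesText, callback) {
  let sampleIdx = 0
  let start = 0
  while (start <= samplesText.length) {
    let tab = samplesText.indexOf('\t', start)
    if (tab === -1) tab = samplesText.length
    // GT ends at the first ':' inside this column, or at the column end.
    let end = samplesText.indexOf(':', start)
    if (end === -1 || end > tab) end = tab
    callback(samplesText, start, end, sampleIdx++)
    start = tab + 1
  }
}

// Count diploid calls whose two alleles are both '0' ("0|0" or "0/0"),
// comparing char codes instead of building strings.
function countHomRef(samplesText) {
  let homRef = 0
  forEachGenotype(samplesText, (str, start, end) => {
    if (
      end - start === 3 &&
      str.charCodeAt(start) === 48 && // 48 = '0'
      str.charCodeAt(start + 2) === 48
    ) {
      homRef++
    }
  })
  return homRef
}

console.log(countHomRef('0|0:48\t1|0:51\t0/0:43')) // 2
```

Because the callback receives the whole line plus a start and end index, the hot loop only reads character codes. A substring would be created only if the caller explicitly asks for one with `str.slice(start, end)`.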
parser.getMetadata(...keys) returns header metadata, filtered by the keys
provided:
parser.getMetadata('INFO', 'DP')
// { Number: 1, Type: 'Integer', Description: 'Total Depth' }
parser.getMetadata('INFO', 'DP', 'Number')
// 1
Call with no arguments to get all metadata. parser.samples lists sample names.
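The key-path behaviour can be pictured as a walk through a nested object. The sketch below is illustrative only: the `metadata` object and the `lookup` helper are hypothetical, shaped after the examples above, and returning `undefined` for a missing key is an assumption of this sketch rather than documented library behaviour.

```javascript
// Hypothetical metadata tree, shaped like the getMetadata examples.
const metadata = {
  INFO: {
    DP: { Number: 1, Type: 'Integer', Description: 'Total Depth' },
  },
}

// Follow the keys one level at a time; with no keys, return the whole tree.
function lookup(tree, ...keys) {
  let node = tree
  for (const key of keys) {
    // Assumption of this sketch: a missing branch yields undefined.
    if (node === null || typeof node !== 'object') return undefined
    node = node[key]
  }
  return node
}

console.log(lookup(metadata, 'INFO', 'DP', 'Number')) // 1
console.log(lookup(metadata, 'INFO', 'DP').Type) // Integer
```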
To parse a VCF without a tabix index, collect header lines until the first non-header line, then construct the parser. This example streams a gzipped file line by line:
import fs from 'fs'
import VCF from '@gmod/vcf'
import { createGunzip } from 'zlib'
import readline from 'readline'
const rl = readline.createInterface({
  input: fs.createReadStream('file.vcf.gz').pipe(createGunzip()),
})
const header = []
let parser
rl.on('line', line => {
  if (line.startsWith('#')) {
    header.push(line)
  } else {
    if (!parser) {
      parser = new VCF({ header: header.join('\n') })
    }
    const variant = parser.parseLine(line)
    console.log(variant.CHROM, variant.POS)
  }
})
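For small files already held in memory (test fixtures, for example), the same header/body split can be done without streams. This is a minimal sketch; `splitVcf` is a hypothetical helper, not something exported by @gmod/vcf.

```javascript
// Split VCF text into the header block and the data lines that follow it.
function splitVcf(text) {
  const headerLines = []
  const dataLines = []
  for (const line of text.split(/\r?\n/)) {
    if (line === '') continue // skip blank lines, e.g. after a trailing newline
    // Only lines before the first data line count as header.
    if (dataLines.length === 0 && line.startsWith('#')) {
      headerLines.push(line)
    } else {
      dataLines.push(line)
    }
  }
  return { header: headerLines.join('\n'), dataLines }
}

const sample = '##fileformat=VCFv4.2\n#CHROM\tPOS\nctgA\t100\nctgA\t200\n'
const { header, dataLines } = splitVcf(sample)
console.log(header.split('\n').length, dataLines.length) // 2 2
```

The resulting `header` string has the same shape as the one passed to `new VCF({ header })` in the streaming example, and each entry in `dataLines` could be handed to `parser.parseLine`.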
parseBreakend(alt) parses a breakend ALT string:
import { parseBreakend } from '@gmod/vcf'
parseBreakend('C[2:321682[')
// { MateDirection: 'right', Replacement: 'C', MatePosition: '2:321682', Join: 'right' }
All four bracket forms from the VCF spec:
| ALT form | Join | MateDirection |
|---|---|---|
| `t[p[` | right | right |
| `t]p]` | right | left |
| `[p[t` | left | right |
| `]p]t` | left | left |
- Join: whether the replacement base appears before (right) or after (left) the mate position
- MateDirection: [ means the mate sequence extends rightward; ] means leftward

When the ALT starts or ends with ., parseBreakend returns SingleBreakend: true with no MatePosition:
parseBreakend('C.')
// { Join: 'right', Replacement: 'C', SingleBreakend: true }
parseBreakend('.ACGT')
// { Join: 'left', Replacement: 'ACGT', SingleBreakend: true }
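The mapping from bracket form to Join and MateDirection can be written as two regular expressions. The sketch below is an independent illustration of the table, not the library's parser: `classifyBreakend` is a hypothetical function, it covers only the four paired forms, and it returns `undefined` for anything else (including single breakends).

```javascript
// Illustrative only: classify a paired breakend ALT by its bracket form.
function classifyBreakend(alt) {
  // t[p[ or t]p]: bases first, so the replacement precedes the mate (Join right).
  let m = /^([^\[\]]+)([\[\]])([^\[\]]+)\2$/.exec(alt)
  if (m) {
    return {
      Join: 'right',
      Replacement: m[1],
      MatePosition: m[3],
      MateDirection: m[2] === '[' ? 'right' : 'left',
    }
  }
  // [p[t or ]p]t: brackets first, so the replacement follows the mate (Join left).
  m = /^([\[\]])([^\[\]]+)\1([^\[\]]+)$/.exec(alt)
  if (m) {
    return {
      Join: 'left',
      Replacement: m[3],
      MatePosition: m[2],
      MateDirection: m[1] === '[' ? 'right' : 'left',
    }
  }
  return undefined // not one of the four paired forms
}

console.log(classifyBreakend('C[2:321682['))
console.log(classifyBreakend(']13:123456]T'))
```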
Trusted publishing via GitHub Actions.
pnpm version patch # or minor/major