@gmod/gff

read and write GFF3 data as streams

Version 1.3.0 (latest)

Read and write GFF3 data performantly. This module aims to be a complete implementation of the GFF3 specification.

  • streaming parsing and streaming formatting
  • proper escaping and unescaping of attribute and column values
  • supports features with multiple locations and features with multiple parents
  • reconstructs feature hierarchies of both Parent and Derives_from relationships
  • parses FASTA sections
  • does no validation except for referential integrity of Parent and Derives_from relationships
  • only compatible with GFF3

Install

$ npm install --save @gmod/gff

Usage

// ES6 (recommended)
import gff from '@gmod/gff'
import fs from 'fs'

// or, in CommonJS:
// const gff = require('@gmod/gff').default
// const fs = require('fs')

// parse a file from a file name
// parses only features and sequences by default,
// set options to parse directives and/or comments
fs.createReadStream('path/to/my/file.gff3')
  .pipe(gff.parseStream({ parseAll: true }))
  .on('data', (data) => {
    if (data.directive) {
      console.log('got a directive', data)
    } else if (data.comment) {
      console.log('got a comment', data)
    } else if (data.sequence) {
      console.log('got a sequence from a FASTA section')
    } else {
      console.log('got a feature', data)
    }
  })

// parse a string of gff3 synchronously
const stringOfGFF3 = fs.readFileSync('my_annotations.gff3').toString()
const arrayOfThings = gff.parseStringSync(stringOfGFF3)

// format an array of items to a string
const newStringOfGFF3 = gff.formatSync(arrayOfThings)

// format a stream of things to a stream of text.
// inserts sync marks automatically.
myStreamOfGFF3Objects
  .pipe(gff.formatStream())
  .pipe(fs.createWriteStream('my_new.gff3'))

// format a stream of things and write it to
// a gff3 file. inserts sync marks and a
// '##gff-version 3' header if one is not
// already present
gff.formatFile(
  myStreamOfGFF3Objects,
  fs.createWriteStream('my_new_2.gff3', { encoding: 'utf8' }),
)
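
The `data` handler above tells items apart by which key is present. The following is a minimal sketch of the same check as a standalone function, run on hand-written objects shaped like the ones in the Object format section. `classifyItem` is a hypothetical helper for illustration and is not part of @gmod/gff.

```javascript
// Hypothetical helper (not part of @gmod/gff): name the kind of a parsed item.
function classifyItem(item) {
  // Features are arrays of feature lines; everything else is a plain object.
  if (Array.isArray(item)) return 'feature'
  if ('directive' in item) return 'directive'
  if ('comment' in item) return 'comment'
  if ('sequence' in item) return 'sequence'
  return 'unknown'
}

// Hand-written sample items, shaped like the parser output described in this README.
const sampleItems = [
  { directive: 'gff-version', value: '3' },
  { comment: 'hi this is a comment' },
  [{ seq_id: 'ctg123', type: 'gene', start: 1000, end: 9000 }],
  { id: 'ctgA', description: 'test contig', sequence: 'ACTG' },
]

const kinds = sampleItems.map(classifyItem)
console.log(kinds)
```

Checking `Array.isArray` first avoids running the `in` operator against a feature array, where the key tests would never match.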

Object format

features

In GFF3, features can have more than one location. We parse features as arrays of all the lines that share that feature's ID. Values that are . in the GFF3 are null in the output.

A simple feature that's located in just one place:

[
  {
    "seq_id": "ctg123",
    "source": null,
    "type": "gene",
    "start": 1000,
    "end": 9000,
    "score": null,
    "strand": "+",
    "phase": null,
    "attributes": {
      "ID": [
        "gene00001"
      ],
      "Name": [
        "EDEN"
      ]
    },
    "child_features": [],
    "derived_features": []
  }
]

A CDS called cds00001 located in two places:

[
  {
    "seq_id": "ctg123",
    "source": null,
    "type": "CDS",
    "start": 1201,
    "end": 1500,
    "score": null,
    "strand": "+",
    "phase": "0",
    "attributes": {
      "ID": ["cds00001"],
      "Parent": ["mRNA00001"]
    },
    "child_features": [],
    "derived_features": []
  },
  {
    "seq_id": "ctg123",
    "source": null,
    "type": "CDS",
    "start": 3000,
    "end": 3902,
    "score": null,
    "strand": "+",
    "phase": "0",
    "attributes": {
      "ID": ["cds00001"],
      "Parent": ["mRNA00001"]
    },
    "child_features": [],
    "derived_features": []
  }
]
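
Because a feature is an array of lines, code that needs a single extent has to combine them. The sketch below computes the overall span of a multi-location feature. `featureSpan` is a hypothetical helper, not a library function, and the sample data is a trimmed copy of the cds00001 example above.

```javascript
// Hypothetical helper (not part of @gmod/gff): overall extent of a feature
// that may be split across several lines (locations).
function featureSpan(feature) {
  // Ignore lines whose start or end was "." in the GFF3 (null in the output).
  const starts = feature.map((line) => line.start).filter((v) => v != null)
  const ends = feature.map((line) => line.end).filter((v) => v != null)
  if (starts.length === 0 || ends.length === 0) return null
  return {
    start: Math.min(...starts),
    end: Math.max(...ends),
    parts: feature.length,
  }
}

const cds00001 = [
  { seq_id: 'ctg123', type: 'CDS', start: 1201, end: 1500 },
  { seq_id: 'ctg123', type: 'CDS', start: 3000, end: 3902 },
]

const span = featureSpan(cds00001)
console.log(span)
```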

directives

parseDirective("##gff-version 3\n")
// returns
{
  "directive": "gff-version",
  "value": "3"
}
parseDirective('##sequence-region ctg123 1 1497228\n')
// returns
{
  "directive": "sequence-region",
  "value": "ctg123 1 1497228",
  "seq_id": "ctg123",
  "start": "1",
  "end": "1497228"
}
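
Note that `start` and `end` in a sequence-region directive are strings, unlike the numeric `start` and `end` of a feature line. A small sketch of converting them before doing arithmetic follows; `toNumericRegion` is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper (not part of @gmod/gff): numeric copy of a
// sequence-region directive object.
function toNumericRegion(directive) {
  if (directive.directive !== 'sequence-region') return null
  return {
    seq_id: directive.seq_id,
    start: Number.parseInt(directive.start, 10),
    end: Number.parseInt(directive.end, 10),
  }
}

// The object shown above for '##sequence-region ctg123 1 1497228'.
const region = toNumericRegion({
  directive: 'sequence-region',
  value: 'ctg123 1 1497228',
  seq_id: 'ctg123',
  start: '1',
  end: '1497228',
})

// GFF3 coordinates are 1-based and inclusive, hence the + 1.
const regionLength = region.end - region.start + 1
console.log(region, regionLength)
```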

comments

parseComment('# hi this is a comment\n')
// returns
{
  "comment": "hi this is a comment"
}

sequences

These come from any embedded ##FASTA section in the GFF3 file.

parseSequences(`##FASTA
>ctgA test contig
ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA`)
// returns
[
  {
    "id": "ctgA",
    "description": "test contig",
    "sequence": "ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA"
  }
]
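
To make the mapping from FASTA text to sequence objects concrete, here is a simplified standalone sketch that produces the same shape for the example above. It is not the library's implementation: `parseFastaSketch` is a hypothetical name, and details such as how a missing description is represented are assumptions.

```javascript
// Simplified sketch (not the library's code): turn the text of a ##FASTA
// section into { id, description, sequence } objects.
function parseFastaSketch(text) {
  const records = []
  let current = null
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim()
    if (line === '' || line.startsWith('##FASTA')) continue
    if (line.startsWith('>')) {
      // Header line: the ID runs to the first whitespace, the rest is the description.
      const match = /^>\s*(\S+)\s*(.*)$/.exec(line)
      if (!match) {
        current = null
        continue
      }
      current = { id: match[1], description: match[2], sequence: '' }
      records.push(current)
    } else if (current) {
      // Sequence data may be wrapped over several lines; join them.
      current.sequence += line
    }
  }
  return records
}

const fastaText = `##FASTA
>ctgA test contig
ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA`

const sequences = parseFastaSketch(fastaText)
console.log(sequences)
```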

API

ParseOptions

Parser options

encoding

Text encoding of the input GFF3. default 'utf8'

Type: BufferEncoding

parseFeatures

Whether to parse features, default true

Type: boolean

parseDirectives

Whether to parse directives, default false

Type: boolean

parseComments

Whether to parse comments, default false

Type: boolean

parseSequences

Whether to parse sequences, default true

Type: boolean

parseAll

Parse all features, directives, comments, and sequences. Overrides other parsing options. Default false.

Type: boolean

bufferSize

Maximum number of GFF3 lines to buffer, default 1000

Type: number
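
The interaction between the defaults and `parseAll` can be sketched as a small function. This only illustrates the behavior documented above and is not how the library resolves options internally; `documentedDefaults` and `resolveParseOptions` are hypothetical names.

```javascript
// Defaults as listed in the ParseOptions documentation above.
const documentedDefaults = {
  encoding: 'utf8',
  parseFeatures: true,
  parseDirectives: false,
  parseComments: false,
  parseSequences: true,
  parseAll: false,
  bufferSize: 1000,
}

// Hypothetical helper: apply defaults, then let parseAll override the rest.
function resolveParseOptions(options = {}) {
  const resolved = { ...documentedDefaults, ...options }
  if (resolved.parseAll) {
    resolved.parseFeatures = true
    resolved.parseDirectives = true
    resolved.parseComments = true
    resolved.parseSequences = true
  }
  return resolved
}

const defaultsOnly = resolveParseOptions()
const everything = resolveParseOptions({ parseAll: true, parseComments: false })
console.log(defaultsOnly, everything)
```

In this sketch `parseComments: false` is ignored when `parseAll` is true, matching the statement that `parseAll` overrides the other parsing options.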

parseStream

Parse a stream of text data into a stream of feature, directive, comment, and sequence objects.

Parameters
  • options ParseOptions Parsing options (optional, default {})

Returns GFFTransform stream (in objectMode) of parsed items

parseStringSync

Synchronously parse a string containing GFF3 and return an array of the parsed items.

Parameters
  • str string GFF3 string
  • inputOptions ({encoding: BufferEncoding?, bufferSize: number?} | undefined)? Parsing options

Returns Array<(GFF3Feature | GFF3Sequence)> array of parsed features, directives, comments and/or sequences

formatSync

Format an array of GFF3 items (features, directives, comments) into a string of GFF3. Does not insert synchronization (###) marks.

Parameters
  • items Array<GFF3Item> Array of features, directives, comments and/or sequences

Returns string the formatted GFF3

formatStream

Format a stream of features, directives, comments and/or sequences into a stream of GFF3 text.

Inserts synchronization (###) marks automatically.

Parameters
  • options FormatOptions formatting options (optional, default {})

Returns FormattingTransform

formatFile

Format a stream of features, directives, comments and/or sequences into a GFF3 file and write it to the filesystem.

Inserts synchronization (###) marks and a ##gff-version directive automatically (if one is not already present).

Parameters
  • stream Readable the stream to write to the file
  • writeStream Writable the writable stream for the output file
  • options FormatOptions formatting options (optional, default {})
  • filename the file path to write to

Returns Promise<null> promise for null that resolves when the stream has been written

About util

There is also a util module that contains super-low-level functions for dealing with lines and parts of lines.

// with ES6
import gff from '@gmod/gff'
const util = gff.util
// or, non-ES6
// const util = require('@gmod/gff').default.util

const gff3Lines = util.formatItem({
  seq_id: 'ctgA',
  // ...
})

util

unescape

Unescape a string value used in a GFF3 attribute.

Parameters
  • stringVal string Escaped GFF3 string value

Returns string An unescaped string value

escape

Escape a value for use in a GFF3 attribute value.

Parameters

Returns string An escaped string value
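
For readers unfamiliar with GFF3 escaping: the specification percent-encodes control characters and `%` everywhere, plus `;`, `=`, `&` and `,` inside the attributes column. Below is a minimal standalone sketch of that rule. It is not the library's `escape`/`unescape` code, the function names are hypothetical, and edge cases may be handled differently by @gmod/gff.

```javascript
// Sketch of GFF3 attribute-value escaping per the specification.
// Not the library's implementation.
function escapeAttributeSketch(value) {
  // Control characters, '%', and the attribute-reserved ';', '=', '&', ','.
  return String(value).replace(/[\x00-\x1f\x7f%;=&,]/g, (ch) => {
    const hex = ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0')
    return `%${hex}`
  })
}

// Sketch of the reverse: decode every %XX sequence.
function unescapeSketch(value) {
  return value.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(Number.parseInt(hex, 16)),
  )
}

const rawNote = '50% identity; curated=yes, see\tnotes'
const escapedNote = escapeAttributeSketch(rawNote)
const roundTripped = unescapeSketch(escapedNote)
console.log(escapedNote)
```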

escapeColumn

Escape a value for use in a GFF3 column value.

Parameters

Returns string An escaped column value

parseAttributes

Parse the 9th column (attributes) of a GFF3 feature line.

Parameters
  • attrString string String of GFF3 9th column

Returns GFF3Attributes Parsed attributes
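
As a rough illustration of how a 9th-column string maps onto the `Record<string, Array<string>>` shape, here is a simplified standalone sketch. It is not the library's `parseAttributes`; the name `parseAttributesSketch` is hypothetical, and malformed input may be treated differently by @gmod/gff.

```javascript
// Simplified sketch (not the library's code): parse a GFF3 attributes column
// into an object mapping each tag to an array of unescaped values.
function parseAttributesSketch(attrString) {
  if (!attrString || attrString === '.') return {}
  const decode = (s) =>
    s.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) =>
      String.fromCharCode(Number.parseInt(hex, 16)),
    )
  const attrs = {}
  for (const pair of attrString.replace(/\r?\n$/, '').split(';')) {
    const eq = pair.indexOf('=')
    if (eq === -1) continue // skip empty or malformed pieces
    const tag = pair.slice(0, eq).trim()
    if (tag === '') continue
    // Multiple values for one tag are comma-separated; unescape after splitting.
    const values = pair
      .slice(eq + 1)
      .split(',')
      .map((v) => decode(v.trim()))
    attrs[tag] = (attrs[tag] || []).concat(values)
  }
  return attrs
}

const parsedAttrs = parseAttributesSketch(
  'ID=cds00001;Parent=mRNA00001,mRNA00002;Note=50%25 identity%3B curated',
)
console.log(parsedAttrs)
```

Splitting on `;` and `,` before unescaping matters: an escaped `%3B` or `%2C` inside a value must not be mistaken for a separator.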

parseFeature

Parse a GFF3 feature line.

Parameters

Returns GFF3FeatureLine The parsed feature

parseDirective

Parse a GFF3 directive line.

Parameters
  • line string GFF3 directive line

Returns (GFF3Directive | GFF3SequenceRegionDirective | GFF3GenomeBuildDirective | null) The parsed directive

formatAttributes

Format an attributes object into a string suitable for the 9th column of GFF3.

Parameters

Returns string GFF3 9th column string

formatFeature

Format a feature object or array of feature objects into one or more lines of GFF3.

Parameters

Returns string A string of one or more GFF3 lines
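
The reverse direction, from a feature-line object back to a tab-separated GFF3 line, can be sketched as follows. This is an illustration under simplifying assumptions (column values are not escaped, child and derived features are ignored), not the library's `formatFeature`; `formatFeatureLineSketch` is a hypothetical name.

```javascript
// Simplified sketch (not the library's code): format one feature-line object
// as a single GFF3 line. Column escaping and child features are left out.
function formatFeatureLineSketch(line) {
  // null (or missing) column values are written as "."
  const column = (v) => (v === null || v === undefined ? '.' : String(v))
  const escapeAttr = (v) =>
    String(v).replace(
      /[\x00-\x1f\x7f%;=&,]/g,
      (ch) => '%' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0'),
    )
  const attrs = Object.entries(line.attributes || {})
    .filter(([, values]) => Array.isArray(values) && values.length > 0)
    .map(([tag, values]) => `${tag}=${values.map(escapeAttr).join(',')}`)
    .join(';')
  const columns = [
    line.seq_id, line.source, line.type, line.start,
    line.end, line.score, line.strand, line.phase,
  ].map(column)
  columns.push(attrs === '' ? '.' : attrs)
  return columns.join('\t') + '\n'
}

const edenLine = formatFeatureLineSketch({
  seq_id: 'ctg123', source: null, type: 'gene', start: 1000, end: 9000,
  score: null, strand: '+', phase: null,
  attributes: { ID: ['gene00001'], Name: ['EDEN'] },
})
console.log(edenLine)
```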

formatDirective

Format a directive into a line of GFF3.

Parameters

Returns string A directive line string

formatComment

Format a comment into a GFF3 comment. Yes I know this is just adding a # and a newline.

Parameters

Returns string A comment line string

formatSequence

Format a sequence object as FASTA.

Parameters

Returns string Formatted single FASTA sequence string

formatItem

Format a directive, comment, sequence, or feature, or array of such items, into one or more lines of GFF3.

Parameters

Returns (string | Array<string>) A formatted string or array of strings

GFF3Attributes

A record of GFF3 attribute identifiers and the values of those identifiers

Type: Record<string, (Array<string> | undefined)>

GFF3FeatureLine

A representation of a single line of a GFF3 file

seq_id

The ID of the landmark used to establish the coordinate system for the current feature

Type: (string | null)

source

A free text qualifier intended to describe the algorithm or operating procedure that generated this feature

Type: (string | null)

type

The type of the feature

Type: (string | null)

start

The start coordinates of the feature

Type: (number | null)

end

The end coordinates of the feature

Type: (number | null)

score

The score of the feature

Type: (number | null)

strand

The strand of the feature

Type: (string | null)

phase

For features of type "CDS", the phase indicates where the next codon begins relative to the 5' end of the current CDS feature

Type: (string | null)

attributes

Feature attributes

Type: (GFF3Attributes | null)

GFF3FeatureLineWithRefs

Extends GFF3FeatureLine

A GFF3 Feature line that includes references to other features defined in their "Parent" or "Derives_from" attributes

child_features

An array of child features

Type: Array<GFF3Feature>

derived_features

An array of features derived from this feature

Type: Array<GFF3Feature>

GFF3Feature

A GFF3 feature, which may include multiple individual feature lines

Type: Array<GFF3FeatureLineWithRefs>
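
Since each `GFF3Feature` is an array of lines and each line carries `child_features` (itself an array of features), walking a hierarchy takes two nested loops. The sketch below does this on hand-written data shaped like the types above; `collectFeatureIds` is a hypothetical helper, not part of @gmod/gff.

```javascript
// Hypothetical helper (not part of @gmod/gff): depth-first walk over a feature
// and its child_features, recording the ID, type and depth of every line.
function collectFeatureIds(feature, depth = 0, out = []) {
  for (const line of feature) {
    const id = line.attributes && line.attributes.ID ? line.attributes.ID[0] : null
    out.push({ id, type: line.type, depth })
    for (const child of line.child_features || []) {
      collectFeatureIds(child, depth + 1, out)
    }
  }
  return out
}

// Hand-written gene -> mRNA -> two-location CDS hierarchy.
const cdsLines = [
  { type: 'CDS', start: 1201, end: 1500, attributes: { ID: ['cds00001'], Parent: ['mRNA00001'] }, child_features: [], derived_features: [] },
  { type: 'CDS', start: 3000, end: 3902, attributes: { ID: ['cds00001'], Parent: ['mRNA00001'] }, child_features: [], derived_features: [] },
]
const mrna = [
  { type: 'mRNA', attributes: { ID: ['mRNA00001'], Parent: ['gene00001'] }, child_features: [cdsLines], derived_features: [] },
]
const geneFeature = [
  { type: 'gene', attributes: { ID: ['gene00001'] }, child_features: [mrna], derived_features: [] },
]

const visited = collectFeatureIds(geneFeature)
console.log(visited)
```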

GFF3Directive

A GFF3 directive

directive

The name of the directive

Type: string

value

The string value of the directive

Type: string

GFF3SequenceRegionDirective

Extends GFF3Directive

A GFF3 sequence-region directive

value

The string value of the directive

Type: string

seq_id

The sequence ID parsed from the directive

Type: string

start

The sequence start parsed from the directive

Type: string

end

The sequence end parsed from the directive

Type: string

GFF3GenomeBuildDirective

Extends GFF3Directive

A GFF3 genome-build directive

value

The string value of the directive

Type: string

source

The genome build source parsed from the directive

Type: string

buildName

The genome build name parsed from the directive

Type: string

GFF3Comment

A GFF3 comment

comment

The text of the comment

Type: string

GFF3Sequence

A GFF3 FASTA single sequence

id

The ID of the sequence

Type: string

description

The description of the sequence

Type: string

sequence

The sequence

Type: string

License

MIT © Robert Buels

Package last updated on 06 Dec 2022
