Binary-parser
Binary-parser is a parser builder for JavaScript that enables you to write
efficient binary parsers in a simple and declarative manner.
It supports all common data types required to analyze structured binary
data. Binary-parser dynamically generates and compiles the parser code
on-the-fly, which runs as fast as a hand-written parser (which takes much more
time and effort to write). Supported data types are:
- Integers (8, 16, 32 and 64 bit signed and unsigned integers)
- Floating point numbers (32 and 64 bit floating point values)
- Bit fields (bit fields with length from 1 to 32 bits)
- Strings (fixed-length, variable-length and zero terminated strings with
  various encodings)
- Arrays (fixed-length and variable-length arrays of builtin or user-defined
  element types)
- Choices (supports integer keys)
- Pointers
- User-defined types (arbitrary combination of builtin types)
Binary-parser was inspired by BinData and binary.
Quick Start
- Create an empty Parser object with new Parser() or Parser.start().
- Chain methods to build your desired parser. (See API for detailed
  documentation of each method.)
- Call Parser.prototype.parse with a Buffer/Uint8Array object passed as an
  argument.
- The parsed result will be returned as an object.
const Parser = require("binary-parser").Parser;
const ipHeader = new Parser()
  .endianness("big")
  .bit4("version")
  .bit4("headerLength")
  .uint8("tos")
  .uint16("packetLength")
  .uint16("id")
  .bit3("offset")
  .bit13("fragOffset")
  .uint8("ttl")
  .uint8("protocol")
  .uint16("checksum")
  .array("src", {
    type: "uint8",
    length: 4
  })
  .array("dst", {
    type: "uint8",
    length: 4
  });
const buf = Buffer.from("450002c5939900002c06ef98adc24f6c850186d1", "hex");
console.log(ipHeader.parse(buf));
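For comparison, here is a rough hand-written counterpart of the parser above that uses only Node's Buffer methods. It is a sketch meant to show which bytes each declared field covers, not the code that binary-parser generates; the function name parseIpHeaderByHand is made up for this illustration.

```javascript
// Hand-written counterpart of the declarative ipHeader parser (illustration only).
function parseIpHeaderByHand(buf) {
  const flagsAndFragment = buf.readUInt16BE(6); // bytes 6-7 hold bit3 + bit13
  return {
    version: buf[0] >> 4,                  // upper 4 bits of byte 0
    headerLength: buf[0] & 0x0f,           // lower 4 bits of byte 0
    tos: buf[1],
    packetLength: buf.readUInt16BE(2),
    id: buf.readUInt16BE(4),
    offset: flagsAndFragment >> 13,        // upper 3 bits
    fragOffset: flagsAndFragment & 0x1fff, // lower 13 bits
    ttl: buf[8],
    protocol: buf[9],
    checksum: buf.readUInt16BE(10),
    src: Array.from(buf.subarray(12, 16)),
    dst: Array.from(buf.subarray(16, 20)),
  };
}

const sample = Buffer.from("450002c5939900002c06ef98adc24f6c850186d1", "hex");
const header = parseIpHeaderByHand(sample);
console.log(header);
```

Every offset and bit mask has to be tracked manually here, which is the bookkeeping the chained methods take care of.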
Installation
You can install binary-parser via npm:
npm install binary-parser
The npm package provides entry points for both CommonJS and ES modules.
API
new Parser()
Create an empty parser object that parses nothing.
parse(buffer)
Parse a Buffer/Uint8Array object buffer with this parser and return the
resulting object. When parse(buffer) is called for the first time, the
associated parser code is compiled on-the-fly and internally cached.
create(constructorFunction)
Set the constructor function that should be called to create the object
returned from the parse method.
[u]int{8, 16, 32, 64}{le, be}(name[, options])
Parse bytes as an integer and store it in a variable named name. name
should consist only of alphanumeric characters and start with a letter.
The number of bits can be chosen from 8, 16, 32 and 64. Byte ordering can be
either le for little endian or be for big endian. With no prefix, it parses as
a signed number; with the u prefix, as an unsigned number. The runtime type
returned by the 8, 16 and 32 bit methods is number, while the type
returned by the 64 bit methods is bigint.
Note: [u]int64{be,le} methods only work if your runtime is Node v12.0.0 or
greater. Lower versions will throw a runtime error.
const parser = new Parser()
  .int32le("a")
  .uint8("b")
  .int16be("c")
  .int64be("d");
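To make the naming scheme concrete, the sketch below decodes the same bytes with Node's own Buffer readers, whose names follow a similar width/sign/byte-order pattern. It assumes the integer methods above decode the way these readers do; it does not call binary-parser itself.

```javascript
// Two bytes (ff fe) followed by an 8-byte big-endian integer.
const bytes = Buffer.from([0xff, 0xfe, 0, 0, 0, 0, 0, 0, 0, 0x2a]);

// Same two bytes, three interpretations.
const asInt16le = bytes.readInt16LE(0);   // signed, little endian (compare int16le)
const asUint16le = bytes.readUInt16LE(0); // unsigned, little endian (compare uint16le)
const asInt16be = bytes.readInt16BE(0);   // signed, big endian (compare int16be)

// 64-bit values do not fit safely in a number, so they come back as bigint.
const asInt64be = bytes.readBigInt64BE(2); // compare int64be

console.log(asInt16le, asUint16le, asInt16be, asInt64be, typeof asInt64be);
```

The first three results are plain numbers; only the 64-bit read yields a bigint, matching the runtime types described above.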
bit[1-32](name[, options])
Parse bytes as a bit field and store it in variable name. There are 32
methods from bit1 to bit32, each corresponding to a bit field of 1 to 32 bits
in length.
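As an illustration of what consecutive bit fields carve out of the input, here is a small hand-written bit reader. It assumes big-endian order, where the first field takes the most significant bits (consistent with the Quick Start header, where the byte 0x45 yields version 4 and headerLength 5). The helper readBits is hypothetical and not part of binary-parser.

```javascript
// Read `bitLength` bits (1 to 32), starting `bitOffset` bits into `buf`,
// most significant bit first.
function readBits(buf, bitOffset, bitLength) {
  let value = 0;
  for (let i = 0; i < bitLength; i++) {
    const bitIndex = bitOffset + i;
    const byte = buf[bitIndex >> 3];                // which byte holds this bit
    const bit = (byte >> (7 - (bitIndex & 7))) & 1; // pick the bit, MSB first
    value = value * 2 + bit; // multiply instead of << so 32-bit values stay non-negative
  }
  return value;
}

// Like bit4("version") followed by bit4("headerLength") on the byte 0x45.
const firstByte = Buffer.from([0x45]);
const version = readBits(firstByte, 0, 4);
const headerLength = readBits(firstByte, 4, 4);

// Like bit3("offset") followed by bit13("fragOffset") on the bytes 0x20 0x05.
const flagBytes = Buffer.from([0x20, 0x05]);
const offsetBits = readBits(flagBytes, 0, 3);
const fragOffset = readBits(flagBytes, 3, 13);

console.log(version, headerLength, offsetBits, fragOffset);
```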
{float, double}{le, be}(name[, options])
Parse bytes as a floating-point value and store it in a variable named
name.
const parser = new Parser()
  .floatbe("a")
  .doublele("b");
string(name[, options])
Parse bytes as a string. name should consist only of alphanumeric
characters and start with a letter. options is an object which can have
the following keys:
- encoding - (Optional, defaults to utf8) Specify which encoding to use.
  Supported encodings include "hex" and all encodings supported by
  TextDecoder.
- length - (Optional) Length of the string. Can be a number, string or a
  function. Use a number for statically sized strings, a string to reference
  another variable and a function to do some calculation.
- zeroTerminated - (Optional, defaults to false) If true, then this parser
  reads until it reaches zero.
- greedy - (Optional, defaults to false) If true, then this parser reads
  until it reaches the end of the buffer. Will consume zero-bytes.
- stripNull - (Optional, must be used with length) If true, then strip
  null characters from the end of the string.
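The following sketch shows, in plain Node.js, roughly what the zeroTerminated and stripNull options describe. It is an approximation built on TextDecoder, not binary-parser's implementation, and the helper readZeroTerminated is a hypothetical name.

```javascript
const { TextDecoder } = require("util");

// Decode bytes from `offset` up to (not including) the next zero byte.
function readZeroTerminated(buf, offset, encoding = "utf8") {
  let end = offset;
  while (end < buf.length && buf[end] !== 0) end++;
  const value = new TextDecoder(encoding).decode(buf.subarray(offset, end));
  // Step over the terminator if there was one.
  return { value, nextOffset: Math.min(end + 1, buf.length) };
}

// Two zero-terminated strings back to back.
const data = Buffer.from("hi\0there\0", "utf8");
const first = readZeroTerminated(data, 0);
const second = readZeroTerminated(data, first.nextOffset);

// A fixed-length, null-padded field: decode all 5 bytes, then strip the padding.
const fixed = Buffer.from("abc\0\0", "utf8");
const stripped = new TextDecoder("utf8").decode(fixed).replace(/\0+$/, "");

console.log(first.value, second.value, stripped);
```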
buffer(name[, options])
Parse bytes as a buffer. Its type will be the same as the input to
parse(buffer). name should consist only of alphanumeric characters and
start with a letter. options is an object which can have the following
keys:
- clone - (Optional, defaults to false) By default,
  buffer(name [,options]) returns a new buffer which references the same
  memory as the parser input, but offset and cropped by a certain range. If
  this option is true, the input buffer will be cloned and a new buffer
  referencing a new memory region is returned.
- length - (either length or readUntil is required) Length of the
  buffer. Can be a number, string or a function. Use a number for statically
  sized buffers, a string to reference another variable and a function to do
  some calculation.
- readUntil - (either length or readUntil is required) If "eof", then
  this parser will read until it reaches the end of the Buffer/Uint8Array
  object. If it is a function, this parser will read the buffer until the
  function returns true.
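The practical difference the clone option makes can be seen with plain Node.js buffers. This sketch assumes the default result behaves like a subarray view and a cloned result behaves like a copy, as the description suggests; it does not call binary-parser.

```javascript
const input = Buffer.from([1, 2, 3, 4]);

// A view shares memory with the input (comparable to clone: false).
const view = input.subarray(1, 3);

// A copy owns its own memory (comparable to clone: true).
const copy = Buffer.from(input.subarray(1, 3));

// Mutating the input afterwards is visible through the view but not the copy.
input[1] = 99;

console.log(view[0], copy[0]);
```

A view avoids copying, but it keeps the whole input alive and changes with it; a copy is safer when the input buffer is reused or mutated after parsing.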
array(name, options)
Parse bytes as an array. options is an object which can have the following
keys:
- type - (Required) Type of the array element. Can be a string or a
  user-defined Parser object. If it's a string, you have to choose from
  [u]int{8, 16, 32}{le, be}.
- length - (either length, lengthInBytes, or readUntil is required)
  Length of the array. Can be a number, string or a function. Use a number
  for statically sized arrays.
- lengthInBytes - (either length, lengthInBytes, or readUntil is
  required) Length of the array expressed in bytes. Can be a number, string
  or a function. Use a number for statically sized arrays.
- readUntil - (either length, lengthInBytes, or readUntil is required)
  If "eof", then this parser reads until the end of the Buffer/Uint8Array
  object. If it is a function, it reads until the function returns true.
const parser = new Parser()
  .array("data", {
    type: "int32",
    length: 8
  })
  .uint8("dataLength")
  .array("data2", {
    type: "int32",
    length: "dataLength"
  })
  .array("data3", {
    type: "int32",
    length: function() {
      return this.dataLength - 1;
    }
  })
  .array("data4", {
    type: "int32",
    lengthInBytes: 16
  })
  .uint8("dataLengthInBytes")
  .array("data5", {
    type: "int32",
    lengthInBytes: "dataLengthInBytes"
  })
  .array("data6", {
    type: "int32",
    lengthInBytes: function() {
      return this.dataLengthInBytes - 4;
    }
  })
  .array("data7", {
    type: "int32",
    readUntil: function(item, buffer) {
      return item === 42;
    }
  })
  .array("data8", {
    type: userDefinedParser,
    length: "dataLength"
  });
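The difference between length (a count of elements) and lengthInBytes (a count of bytes) is easy to mix up, so here is a hand-written sketch of both modes for big-endian int32 elements. The helper readInt32Array is hypothetical and only mirrors the option names; it is not how binary-parser is implemented.

```javascript
// Read big-endian int32 values starting at `offset`, limited either by
// element count (`length`) or by byte count (`lengthInBytes`).
function readInt32Array(buf, offset, options) {
  if (options.length === undefined && options.lengthInBytes === undefined) {
    throw new Error("Either length or lengthInBytes is required");
  }
  const countLimit = options.length !== undefined ? options.length : Infinity;
  const byteLimit =
    options.lengthInBytes !== undefined ? offset + options.lengthInBytes : Infinity;
  const items = [];
  while (items.length < countLimit && offset < byteLimit) {
    items.push(buf.readInt32BE(offset));
    offset += 4; // each int32 element occupies 4 bytes
  }
  return { items, nextOffset: offset };
}

// Four int32 values, 16 bytes in total.
const arrayBytes = Buffer.alloc(16);
[1, -1, 42, 7].forEach((value, i) => arrayBytes.writeInt32BE(value, i * 4));

const byCount = readInt32Array(arrayBytes, 0, { length: 2 });
const byBytes = readInt32Array(arrayBytes, 0, { lengthInBytes: 16 });

console.log(byCount, byBytes);
```

With 4-byte elements, lengthInBytes: 16 and length: 4 cover the same data; the byte-based form is convenient when the format stores a section size rather than an element count.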
choice([name,] options)
Choose one parser from multiple parsers according to a field value and store
its parsed result in key name. If name is null or omitted, the result of
the chosen parser is directly embedded into the current object. options is
an object which can have the following keys:
- tag - (Required) The value used to determine which parser to use from the
  choices. Can be a string pointing to another field or a function.
- choices - (Required) An object whose keys are integers and whose values
  are the parsers executed when tag equals the key.
- defaultChoice - (Optional) If the tag value doesn't match any of the
  choices, this parser is used.
const parser1 = ...;
const parser2 = ...;
const parser3 = ...;
const parser = new Parser().uint8("tagValue").choice("data", {
  tag: "tagValue",
  choices: {
    1: parser1,
    4: parser2,
    5: parser3
  }
});
Combining choice with array is an idiom to parse TLV-based binary formats.
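For readers unfamiliar with TLV (tag, length, value), the hand-written walker below shows the shape of the problem that the choice plus array idiom addresses. The record layout (1-byte tag, 1-byte length) and the tag meanings are invented for this sketch; the decoders table plays the role of choices, and the loop plays the role of an array that reads until "eof".

```javascript
// Decoders keyed by integer tag, comparable to the `choices` object.
const decoders = {
  1: (value) => value.readUInt8(0),
  2: (value) => value.readUInt16BE(0),
  3: (value) => value.toString("utf8"),
};

function parseTlv(buf) {
  const records = [];
  let offset = 0;
  while (offset < buf.length) { // comparable to readUntil: "eof"
    const tag = buf[offset];
    const length = buf[offset + 1];
    const value = buf.subarray(offset + 2, offset + 2 + length);
    const decode = decoders[tag];
    if (!decode) throw new Error("Unknown tag " + tag); // no default decoder here
    records.push({ tag, length, data: decode(value) });
    offset += 2 + length;
  }
  return records;
}

// Three records: tag 1 (uint8 7), tag 2 (uint16 256), tag 3 (string "ok").
const tlv = Buffer.from([1, 1, 7, 2, 2, 0x01, 0x00, 3, 2, 0x6f, 0x6b]);
const records = parseTlv(tlv);
console.log(records);
```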
nest([name,] options)
Execute an inner parser and store its result in key name. If name is null
or omitted, the result of the inner parser is directly embedded into the
current object. options is an object which can have the following keys:
- type - (Required) A Parser object.
pointer(name [,options])
Jump to offset, execute the parser for type, and rewind to the previous
offset. Useful for parsing binary formats such as ELF where the offset of a
field is pointed to by another field.
- type - (Required) Can be a string [u]int{8, 16, 32, 64}{le, be} or a
  user-defined Parser object.
- offset - (Required) Indicates the absolute offset from the beginning of the
  input buffer. Can be a number, string or a function.
saveOffset(name [,options])
Save the current buffer offset as key name. This function is only useful
when called after another function which would advance the internal buffer
offset.
const parser = new Parser()
  .string("name", {
    zeroTerminated: true
  })
  .uint32("seekOffset")
  .saveOffset("currentOffset")
  .seek(function() {
    return this.seekOffset - this.currentOffset;
  })
  ...
seek(relOffset)
Move the buffer offset by relOffset bytes from the current position. Use a
negative relOffset value to rewind the offset. This method was previously
named skip(length).
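Because seek takes a relative offset, jumping to an absolute position stored in the data requires subtracting the current offset, which is what the saveOffset example does. The sketch below walks through that arithmetic by hand on a made-up layout (zero-terminated name, big-endian uint32 seekOffset, padding, then a payload byte); it does not call binary-parser.

```javascript
const file = Buffer.concat([
  Buffer.from("abc\0", "ascii"),         // name, zero-terminated (bytes 0-3)
  Buffer.from([0, 0, 0, 12]),            // seekOffset = 12 (bytes 4-7)
  Buffer.from([0xaa, 0xbb, 0xcc, 0xdd]), // padding to jump over (bytes 8-11)
  Buffer.from([0x2a]),                   // payload at absolute offset 12
]);

let offset = 0;

// Zero-terminated name: advance to the terminator, then step past it.
while (file[offset] !== 0) offset++;
offset += 1;

// uint32 field holding an absolute offset.
const seekOffset = file.readUInt32BE(offset);
offset += 4;

// What saveOffset would record at this point.
const currentOffset = offset;

// Relative move: absolute target minus current position.
const relOffset = seekOffset - currentOffset;
offset += relOffset;

const payload = file[offset];
console.log(currentOffset, relOffset, payload);
```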
endianness(endianness)
Define what endianness to use in this parser. endianness can be either
"little" or "big". The default endianness of Parser is set to big-endian.
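const parser = new Parser()
  .endianness("little")
  // Endianness can be specified explicitly per field
  .uint16be("a")
  .uint32le("b")
  // Or omitted, in which case the parser's endianness (little here) is used
  .uint16("c")
  .int32("d");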
namely(alias)
Set an alias for this parser so that it can be referred to by name in methods
like .array, .nest and .choice, instead of requiring an instance of it.
In particular, the parser may reference itself:
const stop = new Parser();
const parser = new Parser()
  .namely("self")
  .uint8("type")
  .choice("data", {
    tag: "type",
    choices: {
      0: stop,
      1: "self",
      2: Parser.start()
        .nest("left", { type: "self" })
        .nest("right", { type: "self" }),
      3: Parser.start()
        .nest("one", { type: "self" })
        .nest("two", { type: "self" })
        .nest("three", { type: "self" })
    }
  });
const buffer = Buffer.from([
  2,
  3,
  1, 0,
  0,
  2,
  1, 0,
  0,
  1, 0
]);
parser.parse(buffer);
In most cases there is almost no difference from referencing a parser by
instance, but this method makes it possible to parse recursive trees, where
each node can reference a node of the same type from the inside.
Also, when you reference a parser by instance twice, the generated code will
contain two similar copies of its code, while with the named approach it will
contain a single named function that is called at every point of use.
Note: This style could lead to circular references and infinite recursion.
To avoid this, ensure that every possible path terminates. Also, this
recursion is not tail-optimized, so it could exhaust the call stack when it
goes too deep.
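To make the recursive structure and the termination concern concrete, here is a hand-written recursive decoder for the same tree encoding (type 0 ends a branch, type 1 wraps one child, type 2 has left and right, type 3 has one, two and three). It is a sketch of the idea, not binary-parser output, and the exact shape of the library's result may differ. MAX_DEPTH is a hypothetical guard added for this illustration.

```javascript
const MAX_DEPTH = 64; // hypothetical safety limit against runaway recursion

function parseNode(buf, state, depth = 0) {
  if (depth > MAX_DEPTH) throw new Error("Tree is nested too deeply");
  if (state.offset >= buf.length) throw new Error("Unexpected end of input");
  const type = buf[state.offset++];
  const child = () => parseNode(buf, state, depth + 1);
  let data;
  switch (type) {
    case 0: data = {}; break; // every path must eventually reach this case
    case 1: data = child(); break;
    case 2: data = { left: child(), right: child() }; break;
    case 3: data = { one: child(), two: child(), three: child() }; break;
    default: throw new Error("Unknown node type " + type);
  }
  return { type, data };
}

const treeBytes = Buffer.from([2, 3, 1, 0, 0, 2, 1, 0, 0, 1, 0]);
const state = { offset: 0 };
const tree = parseNode(treeBytes, state);
console.log(JSON.stringify(tree));
```

An input made only of type 1 bytes never reaches a terminating node on its own, which is why the sketch bounds the depth and checks for the end of the input.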
An example of referencing other parsers:
const parser = Parser.start().namely("self");
const stop = Parser.start().namely("stop");
const twoCells = Parser.start()
  .namely("twoCells")
  .nest("left", { type: "self" })
  .nest("right", { type: "stop" });
parser.uint8("type").choice("data", {
  tag: "type",
  choices: {
    0: "stop",
    1: "self",
    2: "twoCells"
  }
});
const buffer = Buffer.from([2, 1, 1, 0, 0]);
parser.parse(buffer);
wrapped(name[, options])
Read data, then transform it with a function before further parsing.
It works similarly to a buffer in that it reads a block of data, but instead
of returning the buffer it passes it on to a parser for further processing.
- wrapper - (Required) A function taking a buffer and returning a buffer
  ((x: Buffer | Uint8Array) => Buffer | Uint8Array), transforming the buffer
  into the buffer expected by type.
- type - (Required) A Parser object to parse the result of wrapper.
- length - (either length or readUntil is required) Length of the
  buffer. Can be a number, string or a function. Use a number for statically
  sized buffers, a string to reference another variable and a function to do
  some calculation.
- readUntil - (either length or readUntil is required) If "eof", then
  this parser will read until it reaches the end of the Buffer/Uint8Array
  object. If it is a function, this parser will read the buffer until the
  function returns true.
const zlib = require("zlib");
const textParser = Parser.start()
  .string("text", {
    zeroTerminated: true,
  });
const mainParser = Parser.start()
  .uint32le("length")
  .wrapped("wrappedData", {
    length: "length",
    wrapper: function (buffer) {
      return zlib.inflateRawSync(buffer);
    },
    type: textParser,
  });
mainParser.parse(buffer);
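The example above does not show where its input comes from. As a sketch, the snippet below builds an input with the layout that mainParser expects (a little-endian uint32 length followed by raw-deflated, zero-terminated text) and then performs the read, unwrap and parse steps by hand with Node's zlib. The sample text is made up, and the manual steps only approximate what wrapped does.

```javascript
const zlib = require("zlib");

// Build an input: uint32le length prefix + raw-deflated "hello\0".
const compressed = zlib.deflateRawSync(Buffer.from("hello\0", "utf8"));
const lengthPrefix = Buffer.alloc(4);
lengthPrefix.writeUInt32LE(compressed.length, 0);
const buffer = Buffer.concat([lengthPrefix, compressed]);

// Step 1: read the length field (uint32le("length")).
const length = buffer.readUInt32LE(0);

// Step 2: take that many bytes and pass them through the wrapper function.
const wrappedBytes = buffer.subarray(4, 4 + length);
const inflated = zlib.inflateRawSync(wrappedBytes);

// Step 3: parse the unwrapped bytes (here, a zero-terminated string).
const text = inflated.subarray(0, inflated.indexOf(0)).toString("utf8");

console.log(text);
```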
sizeOf()
Returns how many bytes this parser consumes. If the size of the parser cannot
be statically determined, NaN is returned.
compile()
Compile this parser on-the-fly and cache its result. Usually, there is no need
to call this method directly, since it's called when parse(buffer) is
executed for the first time.
getCode()
Dynamically generates the code for this parser and returns it as a string.
Useful for debugging the generated code.
Common options
These options can be used in all parsers.
- formatter - Function that transforms the parsed value into a more desired
  form.
const parser = new Parser().array("ipv4", {
  type: "uint8",
  length: 4,
  formatter: function(arr) {
    return arr.join(".");
  }
});
- assert - Perform an assertion on the parsed result (useful for checking
  magic numbers and so on). If assert is a string or number, the actual
  parsed result will be compared with it using === (strict equality check),
  and an exception is thrown if they mismatch. On the other hand, if assert
  is a function, that function is executed with one argument (the parsed
  result), and if it returns false, an exception is thrown.
const ClassFile = Parser.start()
  .endianness("big")
  .uint32("magic", { assert: 0xcafebabe });
const parser = new Parser()
  .int16le("a")
  .int16le("b")
  .int16le("c", {
    assert: function(x) {
      return this.a + this.b === x;
    }
  });
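The two forms of assert follow slightly different rules, which the plain JavaScript sketch below spells out. The helper applyAssert is hypothetical and only restates the description above (strict equality for strings and numbers, a predicate called with the already-parsed fields as this); the error message is invented and is not the one binary-parser produces.

```javascript
// Apply an assert option to a parsed value. `fields` stands in for the
// fields parsed so far, which a function assertion can reach through `this`.
function applyAssert(assertion, parsed, fields) {
  const ok =
    typeof assertion === "function"
      ? assertion.call(fields, parsed) // predicate form (falsy result fails)
      : parsed === assertion;          // string/number form: strict equality
  if (!ok) throw new Error("Assertion failed for value " + parsed);
  return parsed;
}

// Magic-number check, as in the ClassFile example.
const magic = applyAssert(0xcafebabe, 0xcafebabe, {});

// Predicate check, as in the a + b === c example.
const sumCheck = function (x) {
  return this.a + this.b === x;
};
const c = applyAssert(sumCheck, 5, { a: 2, b: 3 });

// A mismatch throws instead of returning a value.
let mismatchThrew = false;
try {
  applyAssert(sumCheck, 6, { a: 2, b: 3 });
} catch (err) {
  mismatchThrew = true;
}

console.log(magic, c, mismatchThrew);
```

Because the comparison is strict, a numeric field asserted against the string "5" would fail even though the values look alike.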
Context variables
You can use some special fields while parsing to traverse your structure.
These context variables will be removed after the parsing process.
Note that this feature is turned off by default for performance reasons, and
you need to call .useContextVars() to enable it.
- $parent - This field references the parent structure. This variable will be
  null while parsing the root structure.
const parser = new Parser()
  .useContextVars()
  .nest("header", {
    type: new Parser().uint32("length"),
  })
  .array("data", {
    type: "int32",
    length: function() {
      return this.$parent.header.length;
    }
  });
- $root - This field references the root structure.
const parser = new Parser()
  .useContextVars()
  .nest("header", {
    type: new Parser().uint32("length"),
  })
  .nest("data", {
    type: new Parser()
      .uint32("value")
      .array("data", {
        type: "int32",
        length: function() {
          return this.$root.header.length;
        }
      }),
  });
- $index - This field references the current index during array parsing. This
  variable is available only when using the length mode for arrays.
const parser = new Parser()
  .useContextVars()
  .nest("header", {
    type: new Parser().uint32("length"),
  })
  .nest("data", {
    type: new Parser()
      .uint32("value")
      .array("data", {
        type: new Parser().nest({
          type: new Parser().uint8("_tmp"),
          formatter: function(item) {
            return this.$index % 2 === 0 ? item._tmp : String.fromCharCode(item._tmp);
          }
        }),
        length: "$root.header.length"
      }),
  });
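The formatter in the $index example alternates between two representations depending on the element's position. In plain JavaScript, the same per-index rule looks roughly like the sketch below; the sample bytes are made up, and Array.prototype.map's index argument stands in for $index.

```javascript
// Raw uint8 elements as they might appear in the array.
const rawItems = [72, 105, 33, 63];

// Even positions keep the number, odd positions become a character.
const formatted = rawItems.map((value, index) =>
  index % 2 === 0 ? value : String.fromCharCode(value)
);

console.log(formatted);
```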
Examples
See example/ for real-world examples.
Benchmarks
A benchmark script to compare the parsing performance with binparse, structron
and destruct.js is available under benchmark/.
Contributing
Please report issues to the issue tracker if you have any difficulties using
this module, find a bug, or want to request a new feature.
Pull requests are welcome.