Binary-parser-encoder
Note: This is a fork of binary-parser
library. It is currently being proposed as a Pull-Request in that project.
Until the encoding feature is merged in baseline of original project,
this branch is published under the name: binary-parser-encoder in npm.

Binary-parser is a binary parser/encoder builder for node that
enables you to write efficient parsers/encoders in a simple and declarative manner.
It supports all common data types required to analyze a structured binary
data. Binary-parser dynamically generates and compiles the parser and encoder code
on-the-fly, which runs as fast as a hand-written parser/encoder (which takes much more
time and effort to write). Supported data types are:
- Integers (supports 8, 16, 32 bit signed- and unsigned integers)
- Floating point numbers (supports 32 and 64 bit floating point values)
- Bit fields (supports bit fields with length from 1 to 32 bits)
- Strings (supports various encodings, fixed-length and variable-length, zero
terminated string)
- Arrays (supports user-defined element type, fixed-length and variable-length)
- Choices
- User defined types
This library's features are inspired by BinData
, its syntax by binary.
Installation
Binary-parser can be installed with npm:
$ npm install binary-parser
Quick Start
- Create an empty Parser object with
new Parser()
.
- Chain builder methods to build the desired parser and/or encoder. (See
API for detailed document of
each methods)
- Call
Parser.prototype.parse
with an Buffer
object passed as argument.
- Parsed result will be returned as an object.
- Or call
Parser.prototype.encode
with an object passed as argument.
- Encoded result will be returned as a
Buffer
object.
var Parser = require("binary-parser").Parser;
var ipHeader = new Parser()
.endianess("big")
.bit4("version")
.bit4("headerLength")
.uint8("tos")
.uint16("packetLength")
.uint16("id")
.bit3("offset")
.bit13("fragOffset")
.uint8("ttl")
.uint8("protocol")
.uint16("checksum")
.array("src", {
type: "uint8",
length: 4
})
.array("dst", {
type: "uint8",
length: 4
});
var buf = Buffer.from("450002c5939900002c06ef98adc24f6c850186d1", "hex");
console.log(ipHeader.parse(buf));
var anIpHeader = {
version: 4,
headerLength: 5,
tos: 0,
packetLength: 709,
id: 37785,
offset: 0,
fragOffset: 0,
ttl: 44,
protocol: 6,
checksum: 61336,
src: [ 173, 194, 79, 108 ],
dst: [ 133, 1, 134, 209 ] };
console.log(ipHeader.encode(anIpHeader).toString("hex"));
API
new Parser([options])
Constructs a Parser object. Returned object represents a parser which parses
nothing. options
is an optional object to pass options to this declarative
parser.
smartBufferSize
The chunk size of the encoding (smart)buffer (when encoding is used) (default is 256 bytes).
parse(buffer)
Parse a Buffer
object buffer
with this parser and return the resulting
object. When parse(buffer)
is called for the first time, parser code is
compiled on-the-fly and internally cached.
encode(obj)
Encode an Object
object obj
with this parser and return the resulting
Buffer
. When encode(obj)
is called for the first time, encoder code is
compiled on-the-fly and internally cached.
create(constructorFunction)
Set the constructor function that should be called to create the object
returned from the parse
method.
[u]int{8, 16, 32}{le, be}(name[, options])
Parse bytes as an integer and store it in a variable named name
. name
should consist only of alphanumeric characters and start with an alphabet.
Number of bits can be chosen from 8, 16 and 32. Byte-ordering can be either
l
for little endian or b
for big endian. With no prefix, it parses as a
signed number, with u
prefixed as an unsigned number.
var parser = new Parser()
.int32le("a")
.uint8("b")
.int16be("c");
bit[1-32](name[, options])
Parse bytes as a bit field and store it in variable name
. There are 32
methods from bit1
to bit32
each corresponding to 1-bit-length to
32-bits-length bit field.
{float, double}{le, be}(name[, options])
Parse bytes as an floating-point value and store it in a variable named
name
. name
should consist only of alphanumeric characters and start with
an alphabet.
var parser = new Parser()
.floatbe("a")
.doublele("b");
string(name[, options])
Parse bytes as a string. name
should consist only of alpha numeric
characters and start with an alphabet. options
is an object which can have
the following keys:
encoding
- (Optional, defaults to utf8
) Specify which encoding to use.
"utf8"
, "ascii"
, "hex"
and else are valid. See
Buffer.toString
for more info.
length
- (Optional) (Bytes)Length of the string. Can be a number, string or a
function. Use number for statically sized arrays, string to reference
another variable and function to do some calculation.
Note: when encoding the string is padded with spaces (0x20) at end to fit the length requirement.
zeroTerminated
- (Optional, defaults to false
) If true, then this parser
reads until it reaches zero.
greedy
- (Optional, defaults to false
) If true, then this parser reads
until it reaches the end of the buffer. Will consume zero-bytes. (Note: has
no effect on encoding function)
stripNull
- (Optional, must be used with length
) If true, then strip
null characters from end of the string. (Note: has no effect on encoding, but
when used, then the parse() and encode() functions are not the exact opposite)
trim
- (Optional, default to false
) If true, then trim() (remove leading and trailing spaces)
the parsed string.
padding
- (Optional, Only used for encoding, default to right
) If left
then the string
will be right aligned (padding left with padd
char or space) depending of the length
option
padd
- (Optional, Only used for encoding with length
specified) A string from which first character (1 Byte)
is used as a padding char if necessary (provided string length is less than length
option). Note: Only 'ascii'
or utf8 < 0x80 are alowed (fallback to 'space' padding else).
buffer(name[, options])
Parse bytes as a buffer. name
should consist only of alpha numeric
characters and start with an alphabet. options
is an object which can have
the following keys:
clone
- (Optional, defaults to false
) By default,
buffer(name [,options])
returns a new buffer which references the same
memory as the parser input, but offset and cropped by a certain range. If
this option is true, input buffer will be cloned and a new buffer referncing
another memory is returned.
length
- (either length
or readUntil
is required) Length of the
buffer. Can be a number, string or a function. Use number for statically
sized buffers, string to reference another variable and function to do some
calculation.
readUntil
- (either length
or readUntil
is required) If "eof"
, then
this parser will read till it reaches end of the Buffer
object. (Note: has no
effect on encoding.)
array(name, options)
Parse bytes as an array. options
is an object which can have the following
keys:
type
- (Required) Type of the array element. Can be a string or an user
defined Parser object. If it's a string, you have to choose from [u]int{8,
16, 32}{le, be}.
length
- (either length
, lengthInBytes
, or readUntil
is required)
Length of the array. Can be a number, string or a function. Use number for
statically sized arrays.
lengthInBytes
- (either length
, lengthInBytes
, or readUntil
is
required) Length of the array expressed in bytes. Can be a number, string or
a function. Use number for statically sized arrays.
readUntil
- (either length
, lengthInBytes
, or readUntil
is required)
If "eof"
, then this parser reads until the end of Buffer
object. If
function it reads until the function returns true. Note: When encoding,
the buffer
second parameter of readUntil
function is the buffer already encoded
before this array. So no read-ahead is possible.
encodeUntil
- a function (item, object), only used when encoding, that replaces
the readUntil
function when present and allow limit the number of encoded items
by returning true based on item values or other object properies.
var parser = new Parser()
.array("data", {
type: "int32",
length: 8
})
.uint8("dataLength")
.array("data2", {
type: "int32",
length: "dataLength"
})
.array("data3", {
type: "int32",
length: function() {
return this.dataLength - 1;
}
})
.array("data4", {
type: "int32",
lengthInBytes: 16
})
.uint8("dataLengthInBytes")
.array("data5", {
type: "int32",
lengthInBytes: "dataLengthInBytes"
})
.array("data6", {
type: "int32",
lengthInBytes: function() {
return this.dataLengthInBytes - 4;
}
})
.array("data7", {
type: "int32",
readUntil: function(item, buffer) {
return item === 42;
}
})
.array("data8", {
type: userDefinedParser,
length: "dataLength"
});
choice([name,] options)
Choose one parser from multiple parsers according to a field value and store
its parsed result to key name
. If name
is null or omitted, the result of
the chosen parser is directly embedded into the current object. options
is
an object which can have the following keys:
tag
- (Required) The value used to determine which parser to use from the
choices
Can be a string pointing to another field or a function.
choices
- (Required) An object which key is an integer and value is the
parser which is executed when tag
equals the key value.
defaultChoice
- (Optional) In case of the tag value doesn't match any of
choices
, this parser is used.
var parser1 = ...;
var parser2 = ...;
var parser3 = ...;
var parser = new Parser().uint8("tagValue").choice("data", {
tag: "tagValue",
choices: {
1: parser1,
4: parser2,
5: parser3
}
});
Combining choice
with array
is an idiom to parse
TLV-based formats.
nest([name,] options)
Execute an inner parser and store its result to key name
. If name
is null
or omitted, the result of the inner parser is directly embedded into the
current object. options
is an object which can have the following keys:
type
- (Required) A Parser
object.
skip(length)
Skip parsing for length
bytes. (Note: when encoding, the skipped bytes will be filled
with zeros)
endianess(endianess)
Define what endianess to use in this parser. endianess
can be either
"little"
or "big"
. The default endianess of Parser
is set to big-endian.
var parser = new Parser()
.endianess("little")
.uint16be("a")
.uint32le("a")
.uint16("b")
.int32("c");
namely(alias)
Set an alias to this parser, so there will be an opportunity to refer to it by
name in methods like .array
, .nest
and .choice
, instead of requirement
to have an instance of it.
Especially, the parser may reference itself:
var stop = new Parser();
var parser = new Parser()
.namely("self")
.uint8("type")
.choice("data", {
tag: "type",
choices: {
0: stop,
1: "self",
2: Parser.start()
.nest("left", { type: "self" })
.nest("right", { type: "self" }),
3: Parser.start()
.nest("one", { type: "self" })
.nest("two", { type: "self" })
.nest("three", { type: "self" })
}
});
var buffer = Buffer.from([
2,
3,
1, 0,
0,
2,
1, 0,
0,
1, 0
]);
parser.parse(buffer);
For most of the cases there is almost no difference to the instance-way of
referencing, but this method provides the way to parse recursive trees, where
each node could reference the node of the same type from the inside.
Also, when you reference a parser using its instance twice, the generated code
will contain two similar parts of the code included, while with the named
approach, it will include a function with a name, and will just call this
function for every case of usage.
NB: This style could lead to circular references and infinite recursion, to
avoid this, ensure that every possible path has its end. Also, this recursion
is not tail-optimized, so could lead to memory leaks when it goes too deep.
An example of referencing other patches:
var parser = Parser.start().namely("self");
var stop = Parser.start().namely("stop");
var twoCells = Parser.start()
.namely("twoCells")
.nest("left", { type: "self" })
.nest("right", { type: "stop" });
parser.uint8("type").choice("data", {
tag: "type",
choices: {
0: "stop",
1: "self",
2: "twoCells"
}
});
var buffer = Buffer.from([2, 1, 1, 0, 0]);
parser.parse(buffer);
compile() and compileEncode()
Compile this parser/encoder on-the-fly and cache its result. Usually, there is no need
to call this method directly, since it's called when parse(buffer)
or encode(obj)
is
executed for the first time.
getCode() and getCodeEncode()
Dynamically generates the code for this parser/encoder and returns it as a string.
Usually used for debugging.
Common options
These are common options that can be specified in all parsers.
-
formatter
- Function that transforms the parsed value into a more desired
form. formatter(value, obj, buffer, offset) → new value
where value
is the value to be formatted, obj
is the current object being generated, buffer
is the buffer currently beeing parsed and offset
is the current offset in that buffer.
var parser = new Parser().array("ipv4", {
type: uint8,
length: "4",
formatter: function(arr, obj, buffer, offset) {
return arr.join(".");
}
});
-
encoder
- Function that transforms an object property into a more desired
form for encoding. This is the opposite of the above formatter
function.
encoder(value) → new value
where value
is the value to be encoded (de-formatted) and obj
is the object currently being encoded.
var parser = new Parser().array("ipv4", {
type: uint8,
length: "4",
formatter: function(arr, obj, buffer, offset) {
return arr.join(".");
},
encoder: function(str, obj) {
return str.split(".");
}
});
-
assert
- Do assertion on the parsed result (useful for checking magic
numbers and so on). If assert
is a string
or number
, the actual parsed
result will be compared with it with ===
(strict equality check), and an
exception is thrown if they mismatch. On the other hand, if assert
is a
function, that function is executed with one argument (parsed result) and if
it returns false, an exception is thrown.
var ClassFile = Parser.start()
.endianess("big")
.uint32("magic", { assert: 0xcafebabe });
var parser = new Parser()
.int16le("a")
.int16le("b")
.int16le("c", {
assert: function(x) {
return this.a + this.b === x;
}
});
Examples
See example
for more complex examples.
Support
Please report issues to the
issue tracker if you have
any difficulties using this module, found a bug, or request a new feature.
Pull requests with fixes and improvements are welcomed!