seek-bzip - npm Package Compare versions

Comparing version 0.0.3 to 1.0.1

.npmignore

7

package.json
{
"name": "seek-bzip",
"version": "0.0.3",
"version": "1.0.1",
"author": "Eli Skeggs, C. Scott Ananian, Kevin Kwok",

@@ -12,2 +12,6 @@ "description": "a pure-javascript Node.JS module for decoding bzip2 data",

"license": "LGPL 2.1",
"bin": {
"seek-bunzip": "./bin/seek-bunzip",
"seek-table": "./bin/seek-bzip-table"
},
"directories": {

@@ -17,2 +21,3 @@ "test": "test"

"dependencies": {
"commander": "~1.1.1"
},

@@ -19,0 +24,0 @@ "devDependencies": {

71

README.md

@@ -11,2 +11,5 @@ # seek-bzip

This package uses
[Typed Arrays](https://developer.mozilla.org/en-US/docs/JavaScript/Typed_arrays), which are present in node.js >= 0.5.5.
## Usage

@@ -38,18 +41,53 @@

-`require('seek-bzip')` returns a `Bunzip` object. It contains two static
+`require('seek-bzip')` returns a `Bunzip` object. It contains three static
 methods. The first is a function accepting one or two parameters:
-`Bunzip.decode = function(Buffer inputBuffer, [Number expectedSize])`
+`Bunzip.decode = function(input, [Number expectedSize] or [output], [boolean multistream])`
-If `expectedSize` is not present, `decodeBzip` simply decodes `inputBuffer` and returns the resulting `Buffer`.
+The `input` argument can be a "stream" object (which must implement the
+`readByte` method), or a `Buffer`.
-If `expectedSize` is present, `decodeBzip` will store the results in a `Buffer` of length `expectedSize`, and throw an error in the case that the size of the decoded data does not match `expectedSize`.
+If `expectedSize` is not present, `decodeBzip` simply decodes `input` and
+returns the resulting `Buffer`.
-The second is a function accepting two or three parameters:
+If `expectedSize` is present (and numeric), `decodeBzip` will store
+the results in a `Buffer` of length `expectedSize`, and throw an error
+in the case that the size of the decoded data does not match
+`expectedSize`.
-`Bunzip.decodeBlock = function(Buffer inputBuffer, Number blockStartBits, [Number expectedSize])`
+If you pass a non-numeric second parameter, it can either be a `Buffer`
+object (which must be of the correct length; an error will be thrown if
+the size of the decoded data does not match the buffer length) or
+a "stream" object (which must implement a `writeByte` method).
-The `inputBuffer` and `expectedSize` parameters are as above.
+The optional third `multistream` parameter, if true, attempts to continue
+reading past the end of the bzip2 file. This supports "multistream"
+bzip2 files, which are simply multiple bzip2 files concatenated together.
+If this argument is true, the input stream must have an `eof` method
+which returns true when the end of the input has been reached.
+The second exported method is a function accepting two or three parameters:
+`Bunzip.decodeBlock = function(input, Number blockStartBits, [Number expectedSize] or [output])`
+The `input` and `expectedSize`/`output` parameters are as above.
 The `blockStartBits` parameter gives the start of the desired block, in bits.
+If passing a stream as the `input` parameter, it must implement the
+`seek` method.
+The final exported method is a function accepting two or three parameters:
+`Bunzip.table = function(input, Function callback, [boolean multistream])`
+The `input` and `multistream` parameters are identical to those for the
+`decode` method.
+This function will invoke `callback(position, size)` once per bzip2 block,
+where `position` gives the starting position of the block (in *bits*), and
+`size` gives the uncompressed size of the block (in bytes).
+This can be used to construct an index allowing direct access to a particular
+block inside a bzip2 file, using the `decodeBlock` method.
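
The index mentioned above could be built roughly as follows. This is a minimal sketch, not code from the package: `createBlockIndex`, `add`, and `find` are hypothetical names, and the sample positions and sizes are made up rather than taken from a real bzip2 file.

```javascript
// Hypothetical helper: collects (position, size) pairs as a table-style
// callback would report them, and records where each block starts in the
// uncompressed data.
function createBlockIndex() {
  const blocks = [];
  let uncompressedOffset = 0;
  return {
    blocks,
    // Intended to be passed as the callback; `position` is in bits,
    // `size` is the uncompressed block size in bytes.
    add(position, size) {
      blocks.push({ position, start: uncompressedOffset, size });
      uncompressedOffset += size;
    },
    // Binary search for the block containing uncompressed byte `offset`.
    find(offset) {
      let lo = 0;
      let hi = blocks.length - 1;
      while (lo <= hi) {
        const mid = (lo + hi) >> 1;
        const block = blocks[mid];
        if (offset < block.start) hi = mid - 1;
        else if (offset >= block.start + block.size) lo = mid + 1;
        else return block;
      }
      return null; // offset is past the end of the indexed data
    },
  };
}

// Made-up sample values standing in for real callback invocations.
const index = createBlockIndex();
index.add(32, 900000);
index.add(1234567, 900000);
index.add(2345678, 123456);

const hit = index.find(950000);
console.log(hit.position, 950000 - hit.start);
```

With the real module, `index.add` would presumably be handed to `Bunzip.table` as the `callback`, and `hit.position` would then be the `blockStartBits` argument for `Bunzip.decodeBlock`.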
## Help wanted

@@ -60,16 +98,13 @@

* Streaming interface. The original `micro-bunzip2` and `seek-bzip2` codebases
contained a slightly more complicated input/output system which allowed
streaming chunks of input and output data. It wouldn't be hard to retrofit
that to this code base.
* Add compression along with decompression. See `micro-bzip` at
http://www.landley.net/code/
* Port the `bzip-table` tool from the `seek-bzip2` codebase, so that index
generation is self-contained. Again, not very hard!
## Related projects
* Add command-line binaries to the node module for `bzip-table` and
`seek-bunzip`.
* https://github.com/skeggse/node-bzip node-bzip (original upstream source)
* https://github.com/cscott/compressjs
Lots of compression/decompression algorithms from the same author as this
module.
* https://github.com/cscott/lzjb fast LZJB compression/decompression
* Add compression along with decompression. See `micro-bzip` at
http://www.landley.net/code/
## License

@@ -76,0 +111,0 @@

@@ -33,8 +33,16 @@ /*

// offset in bytes
var BitReader = function(buf, offset) {
this.buf = buf;
this.offset = offset || 0;
var BitReader = function(stream) {
this.stream = stream;
this.bitOffset = 0;
this.curByte = 0;
this.hasByte = false;
};
BitReader.prototype._ensureByte = function() {
if (!this.hasByte) {
this.curByte = this.stream.readByte();
this.hasByte = true;
}
};
// reads bits from the buffer

@@ -44,2 +52,3 @@ BitReader.prototype.read = function(bits) {

while (bits > 0) {
this._ensureByte();
var remaining = 8 - this.bitOffset;

@@ -49,3 +58,4 @@ // if we're in a byte

result <<= remaining;
result |= BITMASK[remaining] & this.buf[this.offset++];
result |= BITMASK[remaining] & this.curByte;
this.hasByte = false;
this.bitOffset = 0;

@@ -56,3 +66,3 @@ bits -= remaining;

var shift = remaining - bits;
result |= (this.buf[this.offset] & (BITMASK[bits] << shift)) >> shift;
result |= (this.curByte & (BITMASK[bits] << shift)) >> shift;
this.bitOffset += bits;

@@ -69,4 +79,5 @@ bits = 0;

var n_byte = (pos - n_bit) / 8;
this.offset = n_byte;
this.bitOffset = n_bit;
this.stream.seek(n_byte);
this.hasByte = false;
};
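
To make the stream-backed reader easier to follow, here is a self-contained sketch of the same idea: bits are consumed most-significant-first from bytes fetched on demand through `readByte`, and `seek` splits a bit position into a byte offset and a bit offset. `SimpleBitReader` and `bufferStream` are illustrative names, not part of the package, and this version reads one bit at a time instead of using the `BITMASK` table. It also assumes `n_bit` is `pos % 8`, since that line falls outside the hunk.

```javascript
// Hypothetical stand-in for a buffer-backed input stream.
function bufferStream(buf) {
  let pos = 0;
  return {
    readByte() { return buf[pos++]; },
    seek(newPos) { pos = newPos; },
  };
}

class SimpleBitReader {
  constructor(stream) {
    this.stream = stream;
    this.bitOffset = 0;   // bits already consumed from curByte
    this.curByte = 0;
    this.hasByte = false; // whether curByte holds a fetched byte
  }

  // Fetch a byte lazily, then return its next bit (MSB first).
  readBit() {
    if (!this.hasByte) {
      this.curByte = this.stream.readByte();
      this.hasByte = true;
    }
    const bit = (this.curByte >> (7 - this.bitOffset)) & 1;
    if (++this.bitOffset === 8) {
      this.bitOffset = 0;
      this.hasByte = false;
    }
    return bit;
  }

  // Read `bits` bits as an unsigned number (multiplication avoids the
  // signed 32-bit result that `<<` would give for 32-bit reads).
  read(bits) {
    let result = 0;
    for (let i = 0; i < bits; i++) result = result * 2 + this.readBit();
    return result;
  }

  // Seek to an absolute position given in bits.
  seek(pos) {
    const nBit = pos % 8;            // assumption: mirrors n_bit in the diff
    const nByte = (pos - nBit) / 8;
    this.bitOffset = nBit;
    this.stream.seek(nByte);
    this.hasByte = false;
  }
}

const reader = new SimpleBitReader(
  bufferStream(Buffer.from([0b10110010, 0b01111000])));
const firstThree = reader.read(3);   // bits 1,0,1
reader.seek(6);
const acrossBytes = reader.read(4);  // bits 1,0 from byte 0, then 0,1 from byte 1
console.log(firstThree, acrossBytes);
```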

@@ -76,9 +87,5 @@

BitReader.prototype.pi = function() {
var buf;
if (this.bitOffset === 0)
buf = this.buf.slice(this.offset, this.offset += 6);
else {
buf = new Buffer(6);
for (var i = 0; i < buf.length; i++)
buf[i] = this.read(8);
var buf = new Buffer(6), i;
for (i = 0; i < buf.length; i++) {
buf[i] = this.read(8);
}

@@ -85,0 +92,0 @@ return buf.toString('hex');

@@ -34,2 +34,5 @@ /*

var BitReader = require('./bitreader');
var Stream = require('./stream');
var CRC32 = require('./crc32');
var pjson = require('../package.json');

@@ -40,2 +43,3 @@ var MAX_HUFCODE_BITS = 20;

var SYMBOL_RUNB = 1;
var MIN_GROUPS = 2;
var MAX_GROUPS = 6;

@@ -49,6 +53,7 @@ var GROUP_SIZE = 50;

var src = array[index], i;
for (i = index; i > 0; ) {
array[i] = array[--i];
for (i = index; i > 0; i--) {
array[i] = array[i-1];
}
return array[0] = src;
array[0] = src;
return src;
};
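
The move-to-front step rewritten in this hunk can be exercised on its own. The sketch below restates the new loop under the hypothetical name `moveToFront` and runs it on a small made-up table.

```javascript
// Move the element at `index` to the front, shifting the elements before it
// one slot to the right, and return the moved element.
function moveToFront(array, index) {
  const src = array[index];
  for (let i = index; i > 0; i--) {
    array[i] = array[i - 1];
  }
  array[0] = src;
  return src;
}

const mtfTable = [10, 20, 30, 40];
const picked = moveToFront(mtfTable, 2);
console.log(picked, mtfTable); // 30 [ 30, 10, 20, 40 ]
```

Because the loop only touches indices `0..index`, it works equally well on a plain array or on a fixed-size `Buffer`, which is what lets the later hunk replace the `splice` calls.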

@@ -77,3 +82,3 @@

var _throw = function(status, optDetail) {
var msg = ErrorMessage[status] || 'unknown error';
var msg = ErrorMessages[status] || 'unknown error';
if (optDetail) { msg += ': '+optDetail; }

@@ -85,6 +90,6 @@ var e = new TypeError(msg);

var Bunzip = function(inputbuffer, outputsize) {
var Bunzip = function(inputStream, outputStream) {
this.writePos = this.writeCurrent = this.writeCount = 0;
this._start_bunzip(inputbuffer, outputsize);
this._start_bunzip(inputStream, outputStream);
};

@@ -97,15 +102,18 @@ Bunzip.prototype._init_block = function() {

}
this.writeCRC = 0xffffffff;
this.blockCRC = new CRC32();
return true;
};
/* XXX micro-bunzip uses (inputStream, inputBuffer, len) as arguments */
Bunzip.prototype._start_bunzip = function(inputbuffer, outputsize) {
Bunzip.prototype._start_bunzip = function(inputStream, outputStream) {
/* Ensure that file starts with "BZh['1'-'9']." */
if (inputbuffer.toString(null, 0, 3) !== 'BZh')
var buf = new Buffer(4);
if (inputStream.read(buf, 0, 4) !== 4 ||
String.fromCharCode(buf[0], buf[1], buf[2]) !== 'BZh')
_throw(Err.NOT_BZIP_DATA, 'bad magic');
var level = inputbuffer[3] - 0x30;
var level = buf[3] - 0x30;
if (level < 1 || level > 9)
_throw(Err.NOT_BZIP_DATA, 'level out of range');
this.reader = new BitReader(inputbuffer, 4);
this.reader = new BitReader(inputStream);

@@ -115,7 +123,8 @@ /* Fourth byte (ascii '1'-'9'), indicates block size in units of 100k of

this.dbufSize = 100000 * level;
this.output = outputsize ? new Buffer(outputsize) : '';
this.nextoutput = 0;
this.outputsize = outputsize;
this.outputStream = outputStream;
this.streamCRC = 0;
};
Bunzip.prototype._get_next_block = function() {
var i, j, k;
var reader = this.reader;

@@ -131,3 +140,5 @@ // this is get_next_block() function from micro-bunzip:

_throw(Err.NOT_BZIP_DATA);
reader.read(32); // ignoring CRC codes; is this wise?
this.targetBlockCRC = reader.read(32) >>> 0; // (convert to unsigned)
this.streamCRC = (this.targetBlockCRC ^
((this.streamCRC << 1) | (this.streamCRC>>>31))) >>> 0;
/* We can add support for blockRandomised if anybody complains. There was
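
The stream CRC update added above rotates the running value left by one bit and XORs in the CRC of the block just read. A minimal sketch of that combination follows; `combineStreamCRC` is a hypothetical name and the block CRC values are arbitrary test numbers, not checksums from a real file.

```javascript
// Fold one block CRC into the running stream CRC:
// rotate the running value left by 1 bit, XOR with the block CRC,
// and force the result back to an unsigned 32-bit number.
function combineStreamCRC(streamCRC, blockCRC) {
  return (blockCRC ^ ((streamCRC << 1) | (streamCRC >>> 31))) >>> 0;
}

let runningCRC = 0;
for (const blockCRC of [0x12345678, 0x9abcdef0]) {
  runningCRC = combineStreamCRC(runningCRC, blockCRC);
}
console.log(runningCRC.toString(16)); // bed47200
```

The trailing `>>> 0` matters because JavaScript bitwise operators return signed 32-bit integers; without it, a comparison against an unsigned value read from the stream could fail even when the checksums agree.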

@@ -148,6 +159,7 @@ some code for this in busybox 1.0.0-pre3, but nobody ever noticed that

var symToByte = new Buffer(256), symTotal = 0;
for (var i = 0; i < 16; i++) {
for (i = 0; i < 16; i++) {
if (t & (1 << (0xF - i))) {
var k = reader.read(16), o = i * 16;
for (var j = 0; j < 16; j++)
var o = i * 16;
k = reader.read(16);
for (j = 0; j < 16; j++)
if (k & (1 << (0xF - j)))

@@ -160,3 +172,3 @@ symToByte[symTotal++] = o + j;

var groupCount = reader.read(3);
if (groupCount < 2 || groupCount > MAX_GROUPS)
if (groupCount < MIN_GROUPS || groupCount > MAX_GROUPS)
_throw(Err.DATA_ERROR);

@@ -171,4 +183,4 @@ /* nSelectors: Every GROUP_SIZE many symbols we select a new huffman coding

var mtfSymbol = []; // TODO: possibly replace with buffer?
for (var i = 0; i < groupCount; i++)
var mtfSymbol = new Buffer(256);
for (i = 0; i < groupCount; i++)
mtfSymbol[i] = i;

@@ -178,5 +190,5 @@

for (var i = 0; i < nSelectors; i++) {
for (i = 0; i < nSelectors; i++) {
/* Get next value */
for (var j = 0; reader.read(1); j++)
for (j = 0; reader.read(1); j++)
if (j >= groupCount) _throw(Err.DATA_ERROR);

@@ -190,4 +202,4 @@ /* Decode MTF to get the next selector */

var symCount = symTotal + 2;
var groups = [];
for (var j = 0; j < groupCount; j++) {
var groups = [], hufGroup;
for (j = 0; j < groupCount; j++) {
var length = new Buffer(symCount), temp = new Buffer(MAX_HUFCODE_BITS + 1);

@@ -198,3 +210,3 @@ /* Read huffman code lengths for each symbol. They're stored in

t = reader.read(5); // lengths
for (var i = 0; i < symCount; i++) {
for (i = 0; i < symCount; i++) {
for (;;) {

@@ -217,3 +229,3 @@ if (t < 1 || t > MAX_HUFCODE_BITS) _throw(Err.DATA_ERROR);

minLen = maxLen = length[0];
for (var i = 1; i < symCount; i++) {
for (i = 1; i < symCount; i++) {
if (length[i] > maxLen)

@@ -235,11 +247,11 @@ maxLen = length[i];

*/
var hufGroup = {};
hufGroup = {};
groups.push(hufGroup);
hufGroup.permute = new Array(MAX_SYMBOLS); // UInt32Array
hufGroup.limit = new Array(MAX_HUFCODE_BITS + 2); // UInt32Array
hufGroup.base = new Array(MAX_HUFCODE_BITS + 1); // UInt32Array
hufGroup.permute = new Uint16Array(MAX_SYMBOLS);
hufGroup.limit = new Uint32Array(MAX_HUFCODE_BITS + 2);
hufGroup.base = new Uint32Array(MAX_HUFCODE_BITS + 1);
hufGroup.minLen = minLen;
hufGroup.maxLen = maxLen;
/* Calculate permute[]. Concurently, initialize temp[] and limit[]. */
var pp = 0, i;
var pp = 0;
for (i = minLen; i <= maxLen; i++) {

@@ -267,10 +279,10 @@ temp[i] = hufGroup.limit[i] = 0;

don't affect the value>limit[length] comparison. */
hufGroup.limit[i + 1] = pp - 1;
hufGroup.limit[i] = pp - 1;
pp <<= 1;
t += temp[i];
hufGroup.base[i + 2] = pp - t;
hufGroup.base[i + 1] = pp - t;
}
hufGroup.limit[maxLen + 2] = Number.MAX_VALUE; /* Sentinal value for reading next sym. */
hufGroup.limit[maxLen + 1] = pp + temp[maxLen] - 1;
hufGroup.base[minLen + 1] = 0;
hufGroup.limit[maxLen + 1] = Number.MAX_VALUE; /* Sentinal value for reading next sym. */
hufGroup.limit[maxLen] = pp + temp[maxLen] - 1;
hufGroup.base[minLen] = 0;
}

@@ -282,8 +294,9 @@ /* We've finished reading and digesting the block header. Now read this

/* Initialize symbol occurrence counters and symbol Move To Front table */
var byteCount = new Uint32Array(256); // Uint32Array
for (var i = 0; i < 256; i++)
var byteCount = new Uint32Array(256);
for (i = 0; i < 256; i++)
mtfSymbol[i] = i;
/* Loop through compressed symbols. */
var runPos = 0, dbufCount = 0, symCount = 0, selector = 0, uc;
var dbuf = this.dbuf = new Array(this.dbufSize); // Uint32Array
var runPos = 0, dbufCount = 0, selector = 0, uc;
var dbuf = this.dbuf = new Uint32Array(this.dbufSize);
symCount = 0;
for (;;) {

@@ -297,7 +310,7 @@ /* Determine which huffman coding group to use. */

/* Read next huffman-coded symbol. */
i = hufGroup.minLen
i = hufGroup.minLen;
j = reader.read(i);
for (;;i++) {
if (i > hufGroup.maxLen) { _throw(Err.DATA_ERROR); }
if (j <= hufGroup.limit[i + 1])
if (j <= hufGroup.limit[i])
break;

@@ -307,3 +320,3 @@ j = (j << 1) | reader.read(1);

/* Huffman decode value to get nextSym (with bounds checking) */
j -= hufGroup.base[i + 1];
j -= hufGroup.base[i];
if (j < 0 || j >= MAX_SYMBOLS) { _throw(Err.DATA_ERROR); }

@@ -359,9 +372,3 @@ var nextSym = hufGroup.permute[j];

i = nextSym - 1;
uc = mtfSymbol[i];
/* Adjust the MTF array. Since we typically expect to move only a
* small number of symbols, and are bound by 256 in any case, using
* memmove here would typically be bigger and slower due to function
* call overhead and other assorted setup costs. */
mtfSymbol.splice(i, 1);
mtfSymbol.splice(0, 0, uc);
uc = mtf(mtfSymbol, i);
uc = symToByte[uc];

@@ -380,4 +387,4 @@ /* We have our literal byte. Save it into dbuf. */

/* Turn byteCount into cumulative occurrence counts of 0 to n-1. */
var j = 0;
for (var i = 0; i < 256; i++) {
j = 0;
for (i = 0; i < 256; i++) {
k = j + byteCount[i];

@@ -388,3 +395,3 @@ byteCount[i] = j;

/* Figure out what order dbuf would be in if we sorted it. */
for (var i = 0; i < dbufCount; i++) {
for (i = 0; i < dbufCount; i++) {
uc = dbuf[i] & 0xff;

@@ -444,8 +451,7 @@ dbuf[byteCount[uc]] |= (i << 8);

}
if (outputsize)
while (copies--)
this.output[this.nextoutput++] = outbyte;
else
while (copies--)
this.output += String.fromCharCode(outbyte);
this.blockCRC.updateCRCRun(outbyte, copies);
while (copies--) {
this.outputStream.writeByte(outbyte);
this.nextoutput++;
}
if (current != previous)

@@ -455,18 +461,97 @@ run = 0;

this.writeCount = dbufCount;
// check CRC
if (this.blockCRC.getCRC() !== this.targetBlockCRC) {
_throw(Err.DATA_ERROR, "Bad block CRC "+
"(got "+this.blockCRC.getCRC().toString(16)+
" expected "+this.targetBlockCRC.toString(16)+")");
}
return this.nextoutput;
};
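
The `./crc32` module used for the block check is not part of this diff. As an assumption, the sketch below shows the checksum the bzip2 format is generally described as using: CRC-32 with polynomial `0x04c11db7`, processed most-significant-bit first, initial value `0xffffffff`, and a final bitwise inversion. `bzip2CRC` is an illustrative, bit-at-a-time function, not the package's table-driven implementation.

```javascript
// Bitwise CRC-32 in the bzip2 variant (MSB first, no bit reflection).
function bzip2CRC(bytes) {
  let crc = 0xffffffff;
  for (const byte of bytes) {
    crc ^= byte << 24; // feed the byte into the top 8 bits
    for (let k = 0; k < 8; k++) {
      // Shift left; if the bit shifted out was set, apply the polynomial.
      crc = (crc & 0x80000000) ? ((crc << 1) ^ 0x04c11db7) : (crc << 1);
    }
  }
  return (~crc) >>> 0; // final inversion, converted to unsigned
}

// "123456789" is the conventional CRC test input.
const checkValue = bzip2CRC(Buffer.from('123456789', 'ascii'));
console.log(checkValue.toString(16)); // fc891918
```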
var coerceInputStream = function(input) {
if ('readByte' in input) { return input; }
var inputStream = new Stream();
inputStream.pos = 0;
inputStream.readByte = function() { return input[this.pos++]; };
inputStream.seek = function(pos) { this.pos = pos; };
inputStream.eof = function() { return this.pos >= input.length; };
return inputStream;
};
var coerceOutputStream = function(output) {
var outputStream = new Stream();
var resizeOk = true;
if (output) {
if (typeof(output)==='number') {
outputStream.buffer = new Buffer(output);
resizeOk = false;
} else if ('writeByte' in output) {
return output;
} else {
outputStream.buffer = output;
resizeOk = false;
}
} else {
outputStream.buffer = new Buffer(16384);
}
outputStream.pos = 0;
outputStream.writeByte = function(_byte) {
if (resizeOk && this.pos >= this.buffer.length) {
var newBuffer = new Buffer(this.buffer.length*2);
this.buffer.copy(newBuffer);
this.buffer = newBuffer;
}
this.buffer[this.pos++] = _byte;
};
outputStream.getBuffer = function() {
// trim buffer
if (this.pos !== this.buffer.length) {
if (!resizeOk)
throw new TypeError('outputsize does not match decoded input');
var newBuffer = new Buffer(this.pos);
this.buffer.copy(newBuffer, 0, 0, this.pos);
this.buffer = newBuffer;
}
return this.buffer;
};
outputStream._coerced = true;
return outputStream;
};
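
The resizable branch of `coerceOutputStream` boils down to a byte sink that doubles its buffer when full and trims it on retrieval. Here is a self-contained sketch of that pattern; `GrowableSink` is a hypothetical name, and it uses `Buffer.alloc` and `subarray` rather than the `new Buffer(...)` constructor seen in the diff, which current Node.js releases deprecate.

```javascript
// A minimal growable byte sink exposing the writeByte/getBuffer shape.
class GrowableSink {
  constructor(initialSize = 16) {
    // Guard against a zero-length start, since doubling 0 never grows.
    this.buffer = Buffer.alloc(Math.max(1, initialSize));
    this.pos = 0;
  }

  writeByte(byte) {
    if (this.pos >= this.buffer.length) {
      // Full: allocate twice the space and copy the existing bytes over.
      const bigger = Buffer.alloc(this.buffer.length * 2);
      this.buffer.copy(bigger);
      this.buffer = bigger;
    }
    this.buffer[this.pos++] = byte;
  }

  // Return only the bytes written so far (a view, not a copy).
  getBuffer() {
    return this.buffer.subarray(0, this.pos);
  }
}

const sink = new GrowableSink(2);
for (const byte of Buffer.from('hello', 'ascii')) {
  sink.writeByte(byte);
}
console.log(sink.getBuffer().toString('ascii')); // hello
```

Doubling keeps the total copying cost proportional to the final output size, which is presumably why the package takes the same approach when no expected size is given.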
/* Static helper functions */
Bunzip.Err = Err;
Bunzip.decode = function(inputbuffer, outputsize) {
var bz = new Bunzip(inputbuffer, outputsize);
while (bz._init_block()) {
bz._read_bunzip();
// 'input' can be a stream or a buffer
// 'output' can be a stream or a buffer or a number (buffer size)
Bunzip.decode = function(input, output, multistream) {
// make a stream from a buffer, if necessary
var inputStream = coerceInputStream(input);
var outputStream = coerceOutputStream(output);
var bz = new Bunzip(inputStream, outputStream);
while (true) {
if ('eof' in inputStream && inputStream.eof()) break;
if (bz._init_block()) {
bz._read_bunzip();
} else {
var targetStreamCRC = bz.reader.read(32) >>> 0; // (convert to unsigned)
if (targetStreamCRC !== bz.streamCRC) {
_throw(Err.DATA_ERROR, "Bad stream CRC "+
"(got "+bz.streamCRC.toString(16)+
" expected "+targetStreamCRC.toString(16)+")");
}
if (multistream &&
'eof' in inputStream &&
!inputStream.eof()) {
// note that start_bunzip will also resync the bit reader to next byte
bz._start_bunzip(inputStream, outputStream);
} else break;
}
}
if (bz.outputsize && bz.nextoutput !== bz.outputsize)
throw new TypeError('outputsize does not match decoded input');
return bz.outputsize ? bz.output : new Buffer(bz.output, 'ascii');
if ('getBuffer' in outputStream)
return outputStream.getBuffer();
};
Bunzip.decodeBlock = function(inputbuffer, pos, outputsize) {
var bz = new Bunzip(inputbuffer, outputsize);
Bunzip.decodeBlock = function(input, pos, output) {
// make a stream from a buffer, if necessary
var inputStream = coerceInputStream(input);
var outputStream = coerceOutputStream(output);
var bz = new Bunzip(inputStream, outputStream);
bz.reader.seek(pos);

@@ -477,6 +562,6 @@ /* Fill the decode buffer for the block */

/* Init the CRC for writing */
this.writeCRC = 0xffffffff;
bz.blockCRC = new CRC32();
/* Zero this so the current byte from before the seek is not written */
this.writeCopies = 0;
bz.writeCopies = 0;

@@ -487,5 +572,56 @@ /* Decompress the block and write to stdout */

}
return bz.outputsize ? bz.output : new Buffer(bz.output, 'ascii');
}
if ('getBuffer' in outputStream)
return outputStream.getBuffer();
};
/* Reads bzip2 file from stream or buffer `input`, and invoke
* `callback(position, size)` once for each bzip2 block,
* where position gives the starting position (in *bits*)
* and size gives uncompressed size of the block (in *bytes*). */
Bunzip.table = function(input, callback, multistream) {
// make a stream from a buffer, if necessary
var inputStream = new Stream();
inputStream.delegate = coerceInputStream(input);
inputStream.pos = 0;
inputStream.readByte = function() {
this.pos++;
return this.delegate.readByte();
};
if (inputStream.delegate.eof) {
inputStream.eof = inputStream.delegate.eof.bind(inputStream.delegate);
}
var outputStream = new Stream();
outputStream.pos = 0;
outputStream.writeByte = function() { this.pos++; };
var bz = new Bunzip(inputStream, outputStream);
var blockSize = bz.dbufSize;
while (true) {
if ('eof' in inputStream && inputStream.eof()) break;
var position = inputStream.pos*8 + bz.reader.bitOffset;
if (bz.reader.hasByte) { position -= 8; }
if (bz._init_block()) {
var start = outputStream.pos;
bz._read_bunzip();
callback(position, outputStream.pos - start);
} else {
var crc = bz.reader.read(32); // (but we ignore the crc)
if (multistream &&
'eof' in inputStream &&
!inputStream.eof()) {
// note that start_bunzip will also resync the bit reader to next byte
bz._start_bunzip(inputStream, outputStream);
console.assert(bz.dbufSize === blockSize,
"shouldn't change block size within multistream file");
} else break;
}
}
};
Bunzip.Stream = Stream;
Bunzip.version = pjson.version;
Bunzip.license = pjson.license;
module.exports = Bunzip;