yauzl
yet another unzip library for node. For zipping, see
yazl.
Design principles:
- Follow the spec.
Don't scan for local file headers.
Read the central directory for file metadata.
(see No Streaming Unzip API).
- Don't block the JavaScript thread.
Use and provide async APIs.
- Keep memory usage under control.
Don't attempt to buffer entire files in RAM at once.
- Never crash (if used properly).
Don't let malformed zip files bring down client applications that are trying to catch errors.
- Catch unsafe file names.
yauzl reports an error for a zip file entry whose file name starts with "/" or /[A-Za-z]:\//, or contains ".." path segments or "\\" (per the spec).
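The file name rules above can be sketched as a small standalone check. This is an illustration only, not yauzl's actual code; the function name `checkFileName` and its error strings are assumptions made for this example.

```javascript
// Hypothetical helper: returns null for an acceptable file name,
// or a string describing why the name is considered unsafe.
function checkFileName(fileName) {
  // Backslashes are rejected outright.
  if (fileName.indexOf("\\") !== -1) {
    return "invalid characters in fileName: " + fileName;
  }
  // Absolute paths: a leading "/" or a drive prefix such as "C:/".
  if (/^\//.test(fileName) || /^[A-Za-z]:\//.test(fileName)) {
    return "absolute path: " + fileName;
  }
  // ".." is only unsafe as a whole path segment, so "a..b" is accepted.
  if (fileName.split("/").indexOf("..") !== -1) {
    return "invalid relative path: " + fileName;
  }
  return null;
}
```

Returning a message instead of throwing keeps the check in line with the "never crash" principle: the caller decides how to report the problem.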
Usage
var yauzl = require("yauzl");
var fs = require("fs");
var path = require("path");
var mkdirp = require("mkdirp");

yauzl.open("path/to/file.zip", {lazyEntries: true}, function(err, zipfile) {
  if (err) throw err;
  zipfile.readEntry();
  zipfile.on("entry", function(entry) {
    if (/\/$/.test(entry.fileName)) {
      // directory entry: file names of directories end with "/"
      mkdirp(entry.fileName, function(err) {
        if (err) throw err;
        zipfile.readEntry();
      });
    } else {
      // file entry
      zipfile.openReadStream(entry, function(err, readStream) {
        if (err) throw err;
        // ensure the parent directory exists before writing
        mkdirp(path.dirname(entry.fileName), function(err) {
          if (err) throw err;
          readStream.pipe(fs.createWriteStream(entry.fileName));
          // ask for the next entry only after this one has been read
          readStream.on("end", function() {
            zipfile.readEntry();
          });
        });
      });
    }
  });
});
API
The default for every optional callback parameter is:
function defaultCallback(err) {
  if (err) throw err;
}
open(path, [options], [callback])
Calls fs.open(path, "r") and gives the fd, options, and callback to fromFd() below.
options may be omitted or null. The defaults are {autoClose: true, lazyEntries: false}.
autoClose is effectively equivalent to:
zipfile.once("end", function() {
  zipfile.close();
});
lazyEntries indicates that entries should be read only when readEntry() is called.
If lazyEntries is false, entry events will be emitted as fast as possible to allow pipe()ing
file data from all entries in parallel.
This is not recommended, as it can lead to out of control memory usage for zip files with many entries.
See issue #22.
If lazyEntries is true, an entry or end event will be emitted in response to each call to readEntry().
This allows processing of one entry at a time, and will keep memory usage under control for zip files with many entries.
fromFd(fd, [options], [callback])
Reads from the fd, which is presumed to be an open .zip file.
Note that random access is required by the zip file specification,
so the fd cannot be an open socket or any other fd that does not support random access.
The callback is given the arguments (err, zipfile).
An err is provided if the End of Central Directory Record Signature cannot be found in the file,
which indicates that the fd is not a zip file.
zipfile is an instance of ZipFile.
options may be omitted or null. The defaults are {autoClose: false, lazyEntries: false}.
See open() for the meaning of the options.
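To illustrate what "finding the End of Central Directory Record Signature" involves, here is a minimal sketch that scans an in-memory Buffer backwards for the signature. It is not yauzl's implementation: the function name `findEndOfCentralDirectory` is hypothetical, and the extra comment-length sanity check is an assumption of this example. The signature value 0x06054b50 and the 22-byte minimum record size come from the zip file specification.

```javascript
var EOCDR_SIGNATURE = 0x06054b50; // stored little-endian in the file
var EOCDR_MIN_SIZE = 22;          // size of the record with an empty comment

// Hypothetical helper: returns the offset of the record, or -1 if none is found.
function findEndOfCentralDirectory(buffer) {
  // The record sits at the very end of the file, followed only by its comment,
  // so search from the last possible position towards the start.
  for (var i = buffer.length - EOCDR_MIN_SIZE; i >= 0; i--) {
    if (buffer.readUInt32LE(i) !== EOCDR_SIGNATURE) continue;
    // Assumption of this sketch: accept the match only if the declared
    // comment length accounts for every byte that follows the record.
    var commentLength = buffer.readUInt16LE(i + 20);
    if (i + EOCDR_MIN_SIZE + commentLength === buffer.length) return i;
  }
  return -1;
}
```

A result of -1 corresponds to the case described above, where the data is not a zip file and the callback receives an err.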
fromBuffer(buffer, [options], [callback])
Like fromFd(), but reads from a RAM buffer instead of an open file.
buffer is a Buffer.
callback is effectively passed directly to fromFd().
If a ZipFile is acquired from this method,
it will never emit the close event,
and calling close() is not necessary.
options may be omitted or null. The defaults are {lazyEntries: false}.
See open() for the meaning of the options.
The autoClose option is ignored for this method.
fromRandomAccessReader(reader, totalSize, [options], [callback])
This method of creating a zip file allows clients to implement their own back-end file system.
For example, a client might translate read calls into network requests.
The reader parameter must be an instance of a subclass of RandomAccessReader that implements the required methods.
The totalSize is a Number and indicates the total file size of the zip file.
options may be omitted or null. The defaults are {autoClose: true, lazyEntries: false}.
See open() for the meaning of the options.
dosDateTimeToDate(date, time)
Converts MS-DOS date and time data into a JavaScript Date object.
Each parameter is a Number treated as an unsigned 16-bit integer.
Note that this format does not support timezones,
so the returned object will use the local timezone.
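For readers unfamiliar with the MS-DOS format, the following sketch shows how the two 16-bit values are usually unpacked. It is an illustration under the standard MS-DOS bit layout, not a copy of yauzl's code, and the name `dosDateTimeToDateSketch` is hypothetical.

```javascript
// Hypothetical re-implementation for illustration only.
function dosDateTimeToDateSketch(date, time) {
  var day = date & 0x1f;                   // bits 0-4: day of month (1-31)
  var month = ((date >> 5) & 0xf) - 1;     // bits 5-8: month (1-12); JS months are 0-based
  var year = ((date >> 9) & 0x7f) + 1980;  // bits 9-15: years since 1980

  var second = (time & 0x1f) * 2;          // bits 0-4: seconds divided by 2
  var minute = (time >> 5) & 0x3f;         // bits 5-10: minute
  var hour = (time >> 11) & 0x1f;          // bits 11-15: hour

  // No timezone is stored, so the fields are interpreted as local time.
  return new Date(year, month, day, hour, minute, second, 0);
}
```

Because seconds are stored divided by two, timestamps in this format only have two-second resolution.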
Class: ZipFile
The constructor for the class is not part of the public API.
Use open(), fromFd(), fromBuffer(), or fromRandomAccessReader() instead.
Event: "entry"
Callback gets (entry), which is an Entry.
See open() and readEntry() for when this event is emitted.
Event: "end"
Emitted after the last entry event has been emitted.
See open() and readEntry() for more info on when this event is emitted.
Event: "close"
Emitted after the fd is actually closed.
This is after calling close() (or after the end event when autoClose is true),
and after all stream pipelines created from openReadStream() have finished reading data from the fd.
If this ZipFile was acquired from fromRandomAccessReader(),
the "fd" in the previous paragraph refers to the RandomAccessReader implemented by the client.
If this ZipFile was acquired from fromBuffer(), this event is never emitted.
Event: "error"
Emitted in the case of errors with reading the zip file.
(Note that other errors can be emitted from the streams created from openReadStream() as well.)
After this event has been emitted, no further entry, end, or error events will be emitted,
but the close event may still be emitted.
readEntry()
Causes this ZipFile to emit an entry or end event (or an error event).
This method must only be called when this ZipFile was created with the lazyEntries option set to true (see open()).
When this ZipFile was created with the lazyEntries option set to true,
entry and end events are only ever emitted in response to this method call.
The event that is emitted in response to this method will not be emitted until after this method has returned,
so it is safe to call this method before attaching event listeners.
After calling this method, calling this method again before the response event has been emitted will cause undefined behavior.
Calling this method after the end event has been emitted will cause undefined behavior.
Calling this method after calling close() will cause undefined behavior.
openReadStream(entry, callback)
entry must be an Entry object from this ZipFile.
callback gets (err, readStream), where readStream is a Readable Stream.
If the entry is compressed (with a supported compression method),
the read stream provides the decompressed data.
If this zipfile is already closed (see close()), the callback will receive an err.
It's possible for the readStream to emit errors for several reasons.
For example, if zlib cannot decompress the data, the zlib error will be emitted from the readStream.
Two more error cases are if the decompressed data has too many or too few actual bytes
compared to the reported byte count from the entry's uncompressedSize field.
yauzl notices this false information and emits an error from the readStream
after some number of bytes have already been piped through the stream.
Because of this check, clients can always trust the uncompressedSize field in Entry objects.
Guarding against zip bomb attacks can be accomplished by
doing some heuristic checks on the size metadata and then watching out for the above errors.
Such heuristics are outside the scope of this library,
but enforcing the uncompressedSize is implemented here as a security feature.
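As an example of the kind of heuristic a client might apply before opening a read stream, the sketch below checks an entry's size metadata against caller-chosen limits. Everything here is an assumption for illustration: the function name `checkEntrySize`, the option names `maxUncompressedSize` and `maxCompressionRatio`, and the thresholds themselves are not part of yauzl.

```javascript
// Hypothetical client-side heuristic; yauzl itself does not provide this.
// Returns null if the entry looks acceptable, or a reason string otherwise.
function checkEntrySize(entry, limits) {
  if (entry.uncompressedSize > limits.maxUncompressedSize) {
    return "entry is larger than the configured limit";
  }
  // Compare by multiplication so a compressedSize of 0 cannot divide by zero.
  if (entry.uncompressedSize > entry.compressedSize * limits.maxCompressionRatio) {
    return "compression ratio is suspiciously high";
  }
  return null;
}
```

Since the reported uncompressedSize is enforced while streaming, passing a check like this and then handling errors on the readStream bounds how much data a single entry can produce.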
It is possible to destroy the readStream before it has piped all of its data.
To do this, call readStream.destroy().
You must unpipe() the readStream from any destination before calling readStream.destroy().
If this zipfile was created using fromRandomAccessReader(), the RandomAccessReader implementation
must provide readable streams that implement a .destroy() method (see randomAccessReader._readStreamForRange())
in order for calls to readStream.destroy() to work in this context.
close()
Causes all future calls to openReadStream() to fail,
and closes the fd after all streams created by openReadStream() have emitted their end events.
If the autoClose option is set to true (see open()),
this function will effectively be called automatically in response to this object's end event.
If the lazyEntries option is set to false (see open()) and this object's end event has not been emitted yet,
this function causes undefined behavior.
If the lazyEntries option is set to true,
you can call this function instead of calling readEntry() to abort reading the entries of a zipfile.
It is safe to call this function multiple times; after the first call, successive calls have no effect.
This includes situations where the autoClose option effectively calls this function for you.
isOpen
Boolean. true until close() is called; then it's false.
entryCount
Number. Total number of central directory records.
comment
String. Always decoded with CP437 per the spec.
Class: Entry
Objects of this class represent Central Directory Records.
Refer to the zipfile specification for more details about these fields.
These fields are of type Number:
- versionMadeBy
- versionNeededToExtract
- generalPurposeBitFlag
- compressionMethod
- lastModFileTime (MS-DOS format, see getLastModDate())
- lastModFileDate (MS-DOS format, see getLastModDate())
- crc32
- compressedSize
- uncompressedSize
- fileNameLength (bytes)
- extraFieldLength (bytes)
- fileCommentLength (bytes)
- internalFileAttributes
- externalFileAttributes
- relativeOffsetOfLocalHeader
fileName
String.
Following the spec, the bytes for the file name are decoded with UTF-8 if generalPurposeBitFlag & 0x800, otherwise with CP437.
If fileName would contain unsafe characters, such as an absolute path or
a relative directory, yauzl emits an error instead of an entry.
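The charset decision described above comes down to a single bit test. The sketch below only shows that test; the function name `fileNameCharset` is hypothetical, and the returned strings are labels for this example rather than encodings accepted by any particular decoder (Node's Buffer API does not list a CP437 encoding).

```javascript
// Hypothetical helper: names the charset used for an entry's file name.
function fileNameCharset(generalPurposeBitFlag) {
  // Bit 11 (0x800) set means UTF-8; otherwise the bytes are CP437.
  return (generalPurposeBitFlag & 0x800) !== 0 ? "utf8" : "cp437";
}
```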
extraFields
Array with each entry in the form {id: id, data: data},
where id is a Number and data is a Buffer.
This library looks for and reads the ZIP64 Extended Information Extra Field (0x0001)
in order to support ZIP64 format zip files.
None of the other fields are considered significant by this library.
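To make the {id, data} shape concrete, here is a sketch of how an extra field buffer can be split into such entries. It assumes the standard zip layout of a 2-byte header ID and a 2-byte data size (both little-endian) followed by the data. The function name `parseExtraFields` and the error message are assumptions of this example, not yauzl's code.

```javascript
// Hypothetical parser for illustration: splits a raw extra field buffer
// into an array of {id, data} entries.
function parseExtraFields(buffer) {
  var fields = [];
  var i = 0;
  // Each field needs at least a 4-byte header; trailing bytes shorter
  // than a header are ignored in this sketch.
  while (i + 4 <= buffer.length) {
    var id = buffer.readUInt16LE(i);
    var size = buffer.readUInt16LE(i + 2);
    var start = i + 4;
    var end = start + size;
    // A declared size that runs past the buffer means the data is malformed.
    if (end > buffer.length) {
      throw new Error("extra field length exceeds extra field buffer size");
    }
    fields.push({id: id, data: buffer.subarray(start, end)});
    i = end;
  }
  return fields;
}
```

With the fields parsed, the ZIP64 Extended Information Extra Field is simply the entry whose id is 0x0001.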
fileComment
String decoded with the same charset as used for fileName.
getLastModDate()
Effectively implemented as:
return dosDateTimeToDate(this.lastModFileDate, this.lastModFileTime);
Class: RandomAccessReader
This class is meant to be subclassed by clients and instantiated for the fromRandomAccessReader() function.
An example implementation can be found in test/test.js.
randomAccessReader._readStreamForRange(start, end)
Subclasses must implement this method.
start and end are Numbers and indicate byte offsets from the start of the file.
end is exclusive, so _readStreamForRange(0x1000, 0x2000) would indicate to read 0x1000 bytes.
end - start will always be at least 1.
This method should return a readable stream which will be pipe()ed into another stream.
It is expected that the readable stream will provide data in several chunks if necessary.
If the readable stream provides too many or too few bytes, an error will be emitted.
Any errors emitted on the readable stream will be handled and re-emitted on the client-visible stream
(returned from zipfile.openReadStream()) or provided as the err argument to the appropriate callback
(for example, for fromRandomAccessReader()).
The returned stream must implement a method .destroy() if you call readStream.destroy() on streams you get from openReadStream().
If you never call readStream.destroy(), then streams returned from this method do not need to implement a method .destroy().
.destroy() should abort any streaming that is in progress and clean up any associated resources.
.destroy() will only be called after the stream has been unpipe()d from its destination.
Note that the stream returned from this method might not be the same object that is provided by openReadStream().
The stream returned from this method might be pipe()d through one or more filter streams (for example, a zlib inflate stream).
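The range semantics can be sketched with an in-memory Buffer standing in for the file. The standalone function `readStreamForRange` below is hypothetical; in a real subclass the same body would live in _readStreamForRange(start, end), with the buffer stored on the instance.

```javascript
var stream = require("stream");

// Hypothetical stand-in for a subclass's _readStreamForRange():
// returns a readable stream over bytes [start, end) of the given buffer.
function readStreamForRange(buffer, start, end) {
  var result = new stream.PassThrough();
  // subarray() also treats end as exclusive, so this yields end - start bytes.
  result.end(buffer.subarray(start, end));
  return result;
}
```

For example, readStreamForRange(buffer, 2, 5) provides exactly three bytes: those at offsets 2, 3, and 4.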
randomAccessReader.read(buffer, offset, length, position, callback)
Subclasses may implement this method.
The default implementation uses createReadStream() to fill the buffer.
This method should behave like fs.read().
randomAccessReader.close(callback)
Subclasses may implement this method.
The default implementation is effectively setImmediate(callback);.
callback takes parameters (err).
This method is called once all the streams returned from _readStreamForRange() have ended,
and no more _readStreamForRange() or read() requests will be issued to this object.
How to Avoid Crashing
When a malformed zipfile is encountered, the default behavior is to crash (throw an exception).
If you want to handle errors more gracefully than this,
be sure to do the following:
- Provide callback parameters where they are allowed, and check the err parameter.
- Attach a listener for the error event on any ZipFile object you get from open(), fromFd(), fromBuffer(), or fromRandomAccessReader().
- Attach a listener for the error event on any stream you get from openReadStream().
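The checklist above might look like this in practice for the entry-listing part of a program. This is a sketch: the function name `readAllEntryNames` is hypothetical, and it assumes the ZipFile was created with {lazyEntries: true}. It takes the ZipFile as a parameter, so any object with the same on() and readEntry() methods will work.

```javascript
// Hypothetical helper: collects every entry's fileName and reports the
// outcome through a callback instead of throwing.
function readAllEntryNames(zipfile, callback) {
  var names = [];
  var done = false;
  function finish(err) {
    if (done) return; // report the outcome only once
    done = true;
    callback(err, err ? null : names);
  }
  // Without this listener, an error event would crash the process.
  zipfile.on("error", finish);
  zipfile.on("entry", function(entry) {
    names.push(entry.fileName);
    zipfile.readEntry(); // request the next entry
  });
  zipfile.on("end", function() {
    finish(null);
  });
  zipfile.readEntry();
}

// Possible usage, with err checked rather than thrown:
// yauzl.open("path/to/file.zip", {lazyEntries: true}, function(err, zipfile) {
//   if (err) return console.error("cannot open zip:", err.message);
//   readAllEntryNames(zipfile, function(err, names) {
//     if (err) return console.error("bad zip:", err.message);
//     console.log(names);
//   });
// });
```

Streams obtained from openReadStream() need their own error listener in the same way; this sketch does not open any.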
Limitations
No Streaming Unzip API
Due to the design of the .zip file format, it's impossible to interpret a .zip file from start to finish
(such as from a readable stream) without sacrificing correctness.
The Central Directory, which is the authority on the contents of the .zip file, is at the end of a .zip file, not the beginning.
A streaming API would need to either buffer the entire .zip file to get to the Central Directory before interpreting anything
(defeating the purpose of a streaming interface), or rely on the Local File Headers which are interspersed through the .zip file.
However, the Local File Headers are explicitly denounced in the spec as being unreliable copies of the Central Directory,
so trusting them would be a violation of the spec.
Any library that offers a streaming unzip API must make one of the above two compromises,
which makes the library either dishonest or nonconformant (usually the latter).
This library insists on correctness and adherence to the spec, and so does not offer a streaming API.
Limited ZIP64 Support
For ZIP64, only zip files smaller than 8PiB are supported,
not the full 16EiB range that a 64-bit integer should be able to index.
This is due to the JavaScript Number type being an IEEE 754 double precision float.
The Node.js fs module probably has this same limitation.
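The 8PiB figure follows from the largest integer a double can represent exactly: Number.MAX_SAFE_INTEGER is 2^53 - 1, and 2^53 bytes is 8PiB. The sketch below reads a 64-bit little-endian value as a Number and refuses values beyond that range. The function name `readUInt64LEAsNumber` and the decision to throw are assumptions of this example, not a description of yauzl's behavior.

```javascript
// Hypothetical helper: reads an unsigned 64-bit little-endian integer
// from a Buffer as a JavaScript Number.
function readUInt64LEAsNumber(buffer, offset) {
  var lower32 = buffer.readUInt32LE(offset);
  var upper32 = buffer.readUInt32LE(offset + 4);
  // Multiply instead of shifting: bitwise operators truncate to 32 bits.
  var value = upper32 * 0x100000000 + lower32;
  // Beyond 2^53 - 1, distinct integers collapse into the same double.
  if (value > Number.MAX_SAFE_INTEGER) {
    throw new Error("64-bit value cannot be represented exactly as a Number");
  }
  return value;
}
```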
ZIP64 Extensible Data Sector Is Ignored
The spec does not allow zip file creators to put arbitrary data here,
but rather reserves its use for PKWARE and mentions something about Z390.
This doesn't seem useful to expose in this library, so it is ignored.
No Multi-Disk Archive Support
This library does not support multi-disk zip files.
The multi-disk fields in the zipfile spec were intended for a zip file to span multiple floppy disks,
which probably never happens now.
If the "number of this disk" field in the End of Central Directory Record is not 0,
the open(), fromFd(), fromBuffer(), or fromRandomAccessReader() callback will receive an err.
By extension the following zip file fields are ignored by this library and not provided to clients:
- Disk where central directory starts
- Number of central directory records on this disk
- Disk number where file starts
No Encryption Support
Currently, the presence of encryption is not even checked,
and encrypted zip files will cause undefined behavior.
Local File Headers Are Ignored
Many unzip libraries mistakenly read the Local File Header data in zip files.
This data is officially defined to be redundant with the Central Directory information,
and is not to be trusted.
Aside from checking the signature, yauzl ignores the content of the Local File Header.
No CRC-32 Checking
This library provides the crc32 field of Entry objects read from the Central Directory.
However, this field is not used for anything in this library.
versionNeededToExtract Is Ignored
The field versionNeededToExtract is ignored,
because this library doesn't support the complete zip file spec at any version.
No Support For Obscure Compression Methods
Regarding the compressionMethod field of Entry objects,
only method 0 (stored with no compression) and method 8 (deflated) are supported.
Any of the other 15 official methods will cause the openReadStream() callback to receive an err.
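As a rough illustration of the two supported methods, the sketch below handles a complete in-memory entry synchronously. yauzl itself works with streams, so this is not how the library is implemented; the function name `decompressSync` and its error message are assumptions. Deflated zip data is a raw deflate stream without a zlib header, which is why the raw inflate variant is used.

```javascript
var zlib = require("zlib");

// Hypothetical helper: returns the file contents for the two supported
// compression methods, and throws for any other method.
function decompressSync(compressionMethod, data) {
  if (compressionMethod === 0) {
    return data; // stored: the bytes are the file contents as-is
  }
  if (compressionMethod === 8) {
    return zlib.inflateRawSync(data); // deflated: raw deflate stream
  }
  throw new Error("unsupported compression method: " + compressionMethod);
}
```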
Data Descriptors Are Ignored
There may or may not be Data Descriptor sections in a zip file.
This library provides no support for finding or interpreting them.
Archive Extra Data Record Is Ignored
There may or may not be an Archive Extra Data Record section in a zip file.
This library provides no support for finding or interpreting it.
No Language Encoding Flag Support
Zip files officially support charset encodings other than CP437 and UTF-8,
but the zip file spec does not specify how it works.
This library makes no attempt to interpret the Language Encoding Flag.
Change History
- 2.4.3
- Fix crash when parsing malformed Extra Field buffers. issue #31
- 2.4.2
- Remove .npmignore and .travis.yml from npm package.
- 2.4.1
- 2.4.0
- 2.3.1
- 2.3.0
- Check that uncompressedSize is correct, or else emit an error. issue #13
- 2.2.1
- 2.2.0
- 2.1.0
- Remove dependency on iconv.
- 2.0.3
- Fix crash when trying to read a 0-byte file.
- 2.0.2
- Fix event behavior after errors.
- 2.0.1
- Fix bug with using iconv.
- 2.0.0