yazl
yet another zip library for node. For unzipping, see
yauzl.
Design principles:
- Don't block the JavaScript thread.
Use and provide async APIs.
- Keep memory usage under control.
Don't attempt to buffer entire files in RAM at once.
- Prefer to open input files one at a time than all at once.
This is slightly suboptimal for time performance,
but avoids OS-imposed limits on the number of simultaneously open file handles.
Usage
var yazl = require("yazl");
var zipfile = new yazl.ZipFile();
zipfile.addFile("file1.txt", "file1.txt");
zipfile.addFile("path/to/file.txt", "path/in/zipfile.txt");
zipfile.outputStream.pipe(fs.createWriteStream("output.zip")).on("close", function() {
console.log("done");
});
zipfile.addBuffer(Buffer.from("hello"), "hello.txt");
zipfile.addReadStreamLazy("stdin.txt", cb => cb(null, process.stdin));
zipfile.end();
API
Class: ZipFile
new ZipFile()
No parameters.
Nothing can go wrong.
addFile(realPath, metadataPath, [options])
Adds a file from the file system at realPath
into the zipfile as metadataPath
.
Typically metadataPath
would be calculated as path.relative(root, realPath)
.
Unzip programs would extract the file from the zipfile as metadataPath
.
realPath
is not stored in the zipfile.
A valid metadataPath
must not be blank.
If a metadataPath
contains "\\"
characters, they will be replaced by "/"
characters.
After this substitution, a valid metadataPath
must not start with "/"
or /[A-Za-z]:\//
,
and must not contain ".."
path segments.
File paths must not end with "/"
, but see addEmptyDirectory()
.
After UTF-8 encoding, metadataPath
must be at most 0xffff
bytes in length.
options
may be omitted or null and has the following structure and default values:
{
mtime: stats.mtime,
mode: stats.mode,
compress: true,
compressionLevel: 6,
forceZip64Format: false,
forceDosTimestamp: false,
fileComment: "",
}
Use mtime
and/or mode
to override the values
that would normally be obtained by the fs.Stats
for the realPath
.
The mode is the unix permission bits and file type.
The mtime and mode are stored in the zip file in the fields "last mod file time",
"last mod file date", and "external file attributes".
yazl does not store group and user ids in the zip file.
If compress
is true
, the file data will be deflated (compression method 8).
If compress
is false
, the file data will be stored (compression method 0).
If compressionLevel
is specified, it will be passed to zlib
.
Specifying compressionLevel: 0
is equivalent to compress: false
.
If both compress
and compressionLevel
are given, asserts that they do not conflict, i.e. !!compress === !!compressionLevel
.
If forceZip64Format
is true
, yazl will use ZIP64 format in this entry's Data Descriptor
and Central Directory Record even if not needed (this may be useful for testing.).
Otherwise, yazl will use ZIP64 format where necessary.
Since yazl version 3.3.0, yazl includes the Info-ZIP "universal timestamp" extended field (0x5455
aka "UT"
) to encode the mtime
.
The Info-ZIP timestamp is a more modern encoding for the mtime and is generally recommended.
Set forceDosTimestamp
to true
to revert to the pre-3.3.0 behvior, disabling this extended field.
The DOS encoding is always included regardless of this option, because it is required in the fixed-size metadata of every archive entry.
The benefits of the Info-ZIP encoding include: timezone is specified as always UTC, which is better for cloud environments and any teams working in multiple timezones; capable of encoding "time 0", the unix epoch in 1970, which is better for some package managers; the precision is 1-second accurate rather than rounded to the nearest even second. The disadvantages of including this field are: it requires an extra 9 bytes of metadata per entry added to the archive.
When attempting to encode an mtime
outside the supported range for either format, such as the year 1970 in the DOS format or the year 2039 for the modern format, the time will clamped to the closest supported time.
If fileComment
is a string
, it will be encoded with UTF-8.
If fileComment
is a Buffer
, it should be a UTF-8 encoded string.
In UTF-8, fileComment
must be at most 0xffff
bytes in length.
This becomes the "file comment" field in this entry's central directory file header.
Internally, fs.stat()
is called immediately in the addFile
function,
and fs.createReadStream()
is used later when the file data is actually required.
Throughout adding and encoding n
files with addFile()
,
the number of simultaneous open files is O(1)
, probably just 1 at a time.
addReadStream(readStream, metadataPath, [options])
Adds a file to the zip file whose content is read from readStream
.
This method is effectively implemented as this.addReadStreamLazy(metadataPath, options, cb => cb(null, readStream))
.
In general, it is recommended to use addReadStreamLazy
instead of this method
to avoid holding a large number of system resources open for a long time.
This method is provided for backward compatibility,
and for convenience in cases where the readStream
doesn't require meaningful resources to hold open and waiting.
addReadStreamLazy(metadataPath[, options], getReadStreamFunction)
Adds a file to the zip file whose content is read from a read stream obtained by calling getReadStreamFunction(cb)
.
getReadStreamFunction(cb)
is called with a single callback function.
Your implementation of getReadStreamFunction
should eventually call cb(err, readStream)
and give the readStream
that provides the contents of the file to add to the zip file.
If err
is given (if it is truthy), it will be emitted from this ZipFile
object.
The return value from cb
is unspecified.
See addFile()
for the meaning of the metadataPath
parameter.
typeof getReadStreamFunction
must be 'function'
, which is used to determine when options
has been omitted.
options
may be omitted or null and has the following structure and default values:
{
mtime: new Date(),
mode: 0o100664,
compress: true,
compressionLevel: 6,
forceZip64Format: false,
forceDosTimestamp: false,
fileComment: "",
size: 12345,
}
See addFile()
for the meaning of mtime
, mode
, compress
, compressionLevel
, forceZip64Format
, forceDosTimestamp
, and fileComment
.
If size
is given, it will be checked against the actual number of bytes in the readStream
,
and an error will be emitted if there is a mismatch.
See the documentation on calculatedTotalSizeCallback
for why the size
option exists.
Note that yazl will .pipe()
data from readStream
, so be careful using .on('data')
.
In certain versions of node, .on('data')
makes .pipe()
behave incorrectly.
Here's an example call to this method to illustrate the function callbacks:
zipfile.addReadStreamLazy("path/in/archive.txt", function(cb) {
var readStream = getTheReadStreamSomehow();
cb(readStream);
});
addBuffer(buffer, metadataPath, [options])
Adds a file to the zip file whose content is buffer
.
See below for info on the limitations on the size of buffer
.
See addFile()
for info about the metadataPath
parameter.
options
may be omitted or null and has the following structure and default values:
{
mtime: new Date(),
mode: 0o100664,
compress: true,
compressionLevel: 6,
forceZip64Format: false,
forceDosTimestamp: false,
fileComment: "",
}
See addFile()
for the meaning of mtime
, mode
, compress
, compressionLevel
, forceZip64Format
, forceDosTimestamp
, and fileComment
.
This method has the unique property that General Purpose Bit 3
will not be used in the Local File Header.
This doesn't matter for unzip implementations that conform to the Zip File Spec.
However, 7-Zip 9.20 has a known bug where General Purpose Bit 3
is declared an unsupported compression method
(note that it really has nothing to do with the compression method.).
See issue #11.
If you would like to create zip files that 7-Zip 9.20 can understand,
you must use addBuffer()
instead of addFile()
, addReadStream()
, or addReadStreamLazy()
for all entries in the zip file
(and addEmptyDirectory()
is fine too).
Note that even when yazl provides the file sizes in the Local File Header,
yazl never uses ZIP64 format for Local File Headers due to the size limit on buffer
(see below).
Size limitation on buffer
In order to require the ZIP64 format for a local file header,
the provided buffer
parameter would need to exceed 0xfffffffe
in length.
Alternatively, the buffer
parameter might not exceed 0xfffffffe
in length,
but zlib compression fails to compress the buffer and actually inflates the data to more than 0xfffffffe
in length.
Both of these scenarios are not allowed by yazl, and those are enforced by a size limit on the buffer
parameter.
According to this zlib documentation,
the worst case compression results in "an expansion of at most 13.5%, plus eleven bytes".
Furthermore, some configurations of Node.js impose a size limit of 0x3fffffff
on every Buffer
object.
Running this size through the worst case compression of zlib still produces a size less than 0xfffffffe
bytes,
Therefore, yazl enforces that the provided buffer
parameter must be at most 0x3fffffff
bytes long.
addEmptyDirectory(metadataPath, [options])
Adds an entry to the zip file that indicates a directory should be created,
even if no other items in the zip file are contained in the directory.
This method is only required if the zip file is intended to contain an empty directory.
See addFile()
for info about the metadataPath
parameter.
If metadataPath
does not end with a "/"
, a "/"
will be appended.
options
may be omitted or null and has the following structure and default values:
{
mtime: new Date(),
mode: 040775,
forceDosTimestamp: false,
}
See addFile()
for the meaning of mtime
, mode
, and forceDosTimestamp
.
end([options], [calculatedTotalSizeCallback])
Indicates that no more files will be added via addFile()
, addReadStream()
, addReadStreamLazy()
, addBuffer()
, or addEmptyDirectory()
,
and causes the eventual close of outputStream
.
options
may be omitted or null and has the following structure and default values:
{
forceZip64Format: false,
comment: "",
}
If forceZip64Format
is true
, yazl will include the ZIP64 End of Central Directory Locator
and ZIP64 End of Central Directory Record even if not needed (this may be useful for testing.).
Otherwise, yazl will include these structures if necessary.
If comment
is a string
, it will be encoded with CP437.
If comment
is a Buffer
, it should be a CP437 encoded string.
comment
must be at most 0xffff
bytes in length and must not include the byte sequence [0x50,0x4b,0x05,0x06]
.
This becomes the ".ZIP file comment" field in the end of central directory record.
Note that in practice, most zipfile readers interpret this field in UTF-8 instead of CP437.
If your string uses only codepoints in the range 0x20...0x7e
(printable ASCII, no whitespace except for single space ' '
),
then UTF-8 and CP437 (and ASCII) encodings are all identical.
This restriction is recommended for maxium compatibility.
To use UTF-8 encoding at your own risk, pass a Buffer
into this function; it will not be validated.
If specified and non-null, calculatedTotalSizeCallback
is given the parameters (calculatedTotalSize)
sometime during or after the call to end()
.
calculatedTotalSize
is of type Number
and can either be -1
or the guaranteed eventual size in bytes of the output data that can be read from outputStream
.
Note that calculatedTotalSizeCallback
is usually called well before outputStream
has piped all its data;
this callback does not mean that the stream is done.
(The calculatedTotalSizeCallback
feature was added to this library to support the use case of a web server constructing a zip file dynamically
and serving it without buffering the contents on disk or in ram.
calculatedTotalSize
can become the Content-Length
header before piping the outputStream
as the response body.)
If calculatedTotalSize
is -1
, it means means the total size is too hard to guess before processing the input file data.
To ensure the final size is known, disable compression (set compress: false
or compressionLevel: 0
)
in every call to addFile()
, addReadStream()
, addReadStreamLazy()
, and addBuffer()
,
and additionally specify the optional size
option in every call to addReadStream()
and addReadStreamLazy()
.
The call to calculatedTotalSizeCallback
might be delayed if yazl is still waiting for fs.Stats
for an addFile()
entry.
If addFile()
was never called, calculatedTotalSizeCallback
will be called during the call to end()
.
It is not required to start piping data from outputStream
before calculatedTotalSizeCallback
is called.
calculatedTotalSizeCallback
will be called only once, and only if this is the first call to end()
.
outputStream
A readable stream that will produce the contents of the zip file.
It is typical to pipe this stream to a writable stream created from fs.createWriteStream()
.
Internally, large amounts of file data are piped to outputStream
using pipe()
,
which means throttling happens appropriately when this stream is piped to a slow destination.
Data becomes available in this stream soon after calling one of addFile()
, addReadStream()
, addReadStreamLazy()
, addBuffer()
, or addEmptyDirectory()
.
Clients can call pipe()
on this stream at any time,
such as immediately after getting a new ZipFile
instance, or long after calling end()
.
This stream will remain open while you add entries until you end()
the zip file.
As a reminder, be careful using both .on('data')
and .pipe()
with this stream.
In certain versions of node, you cannot use both .on('data')
and .pipe()
successfully.
dateToDosDateTime(jsDate)
Deprecated since yazl 3.3.0.
This function only remains exported in order to maintain compatibility with older versions of yazl.
It will be removed in yazl 4.0.0 unless someone asks for it to remain supported.
If you ever have a use case for calling this function directly please
open an issue against yazl
requesting that this function be properly supported again.
Regarding ZIP64 Support
yazl automatically uses ZIP64 format to support files and archives over 2^32 - 2
bytes (~4GB) in size
and to support archives with more than 2^16 - 2
(65534) files.
(See the forceZip64Format
option in the API above for more control over this behavior.)
ZIP64 format is necessary to exceed the limits inherent in the original zip file format.
ZIP64 format is supported by most popular zipfile readers, but not by all of them.
Notably, the Mac Archive Utility does not understand ZIP64 format (as of writing this),
and will behave very strangely when presented with such an archive.
Output Structure
The Zip File Spec leaves a lot of flexibility up to the zip file creator.
This section explains and justifies yazl's interpretation and decisions regarding this flexibility.
This section is probably not useful to yazl clients,
but may be interesting to unzip implementors and zip file enthusiasts.
Disk Numbers
All values related to disk numbers are 0
,
because yazl has no multi-disk archive support.
(The exception being the Total Number of Disks field in
the ZIP64 End of Central Directory Locator, which is always 1
.)
Version Made By
Always 0x033f == (3 << 8) | 63
, which means UNIX (3)
and made from the spec version 6.3 (63).
Note that the "UNIX" has implications in the External File Attributes.
Usually 20
, meaning 2.0. This allows filenames and file comments to be UTF-8 encoded.
When ZIP64 format is used, some of the Version Needed to Extract values will be 45
, meaning 4.5.
When this happens, there may be a mix of 20
and 45
values throughout the zipfile.
General Purpose Bit Flag
Bit 11
is always set.
Filenames (and file comments) are always encoded in UTF-8, even if the result is indistinguishable from ascii.
Bit 3
is usually set in the Local File Header.
To support both a streaming input and streaming output api,
it is impossible to know the crc32 before processing the file data.
When bit 3
is set, data Descriptors are given after each file data with this information, as per the spec.
But remember a complete metadata listing is still always available in the central directory record,
so if unzip implementations are relying on that, like they should,
none of this paragraph will matter.
Even so, some popular unzip implementations do not follow the spec.
The Mac Archive Utility requires Data Descriptors to include the optional signature,
so yazl includes the optional data descriptor signature.
When bit 3
is not used, the Mac Archive Utility requires there to be no data descriptor, so yazl skips it in that case.
Additionally, 7-Zip 9.20 does not seem to support bit 3
at all
(see issue #11).
All other bits are unset.
Internal File Attributes
Always 0
.
The "apparently an ASCII or text file" bit is always unset meaning "apparently binary".
This kind of determination is outside the scope of yazl,
and is probably not significant in any modern unzip implementation.
External File Attributes
Always stats.mode << 16
.
This is apparently the convention for "version made by" = 0x03xx
(UNIX).
Note that for directory entries (see addEmptyDirectory()
),
it is conventional to use the lower 8 bits for the MS-DOS directory attribute byte.
However, the spec says this is only required if the Version Made By is DOS,
so this library does not do that.
Directory Entries
When adding a metadataPath
such as "parent/file.txt"
, yazl does not add a directory entry for "parent/"
,
because file entries imply the need for their parent directories.
Unzip clients seem to respect this style of pathing,
and the zip file spec does not specify what is standard in this regard.
In order to create empty directories, use addEmptyDirectory()
.
Size of Local File and Central Directory Entry Metadata
The spec recommends that "The combined length of any directory record and [the file name,
extra field, and comment fields] should not generally exceed 65,535 bytes".
yazl makes no attempt to respect this recommendation.
Instead, each of the fields is limited to 65,535 bytes due to the length of each being encoded as an unsigned 16 bit integer.
Change History
- 3.3.1 (2024-Nov-23)
- Fix bug in
addReadStreamLazy()
timing that could result in the given getReadStreamFunction
being called too soon or being called more than once. issue #87
- 3.3.0 (2024-Nov-08)
- Add support for encoding timestamps in the more modern Info-ZIP "universal timestamp" extended field (
0x5455
aka "UT"
): supports years as old as 1901 instead of only 1980, notably including 1970; encodes timestamp in UTC rather than an unspecified system-dependent local timezone. pull #86 - Disable spending the extra 9 bytes of metadata per entry with
forceDosTimestamp:true
. - Out-of-bounds timestamps are now clamped rather than overflowing/underflowing and wrapping around.
- Marked
dateToDosDateTime()
as deprecated.
- 3.2.1 (2024-Nov-03)
- Fix typo in
addBuffer()
where compressionLevel
wasn't being passed to zlib.
- 3.2.0 (2024-Nov-02)
- Add
compressionLevel
option to addFile()
, addBuffer()
, addReadStream()
, addReadStreamLazy()
. issue #84 - Change wording around
calculatedTotalSizeCallback
again, and fix the documentation incorrectly claiming that addEmptyDirectory()
was relevant in that context.
- 3.1.0 (2024-Oct-19)
- Add
addReadStreamLazy()
as a replacement for addReadStream()
. The latter is maintained for compatibility and situational convenience. issue #74 pull #80 - The
add*()
methods will now throw an error if you call them after calling end()
. The documented semantics have always regarded this as undefined behavior. - Some subtle and undocumented changes to error handling. Error handling isn't very well tested.
- 3.0.1 (2024-Oct-19)
- Change the documented name of
finalSizeCallback
to calculatedTotalSizeCallback
to hopefully avoid confusion. issue #71
- 3.0.0 (2024-Oct-18)
- Upgrade dependencies. This indirectly drops support for older versions of node.
- 2.5.1 (2018-Nov-30)
- Fix support for old versions of Node and add official support for Node versions 0.10, 4, 6, 8, 10. pull #49
- 2.5.0 (2018-Nov-15)
- 2.4.3 (2017-Oct-30)
- 2.4.2 (2016-Oct-26)
- Remove octal literals to make yazl compatible with strict mode. pull #28
- 2.4.1 (2016-Jun-30)
- Fix Mac Archive Utility compatibility issue. issue #24
- 2.4.0 (2016-Jun-26)
- 2.3.1 (2016-May-09)
- Remove
.npmignore
from npm package. pull #22
- 2.3.0 (2015-Dec-23)
metadataPath
can have \
characters now; they will be replaced with /
. issue #18
- 2.2.2 (2015-May-07)
- 2.2.1 (2015-Mar-27)
- Fix Mac Archive Utility compatibility issue. issue #14
- 2.2.0 (2015-Mar-21)
- Avoid using general purpose bit 3 for
addBuffer()
calls. issue #13
- 2.1.3 (2015-Mar-20)
- Fix bug when only addBuffer() and end() are called. issue #12
- 2.1.2 (2015-Mar-03)
- 2.1.1 (2015-Mar-01)
- Fixed stack overflow when using addBuffer() in certain ways. issue #9
- 2.1.0 (2015-Jan-16)
- Added
addEmptyDirectory()
. options
is now optional for addReadStream()
and addBuffer()
.
- 2.0.0 (2014-Sep-24)