What is cfb?
The cfb npm package is a library designed for handling CFB (Compound File Binary) files, also known as Microsoft Compound Document File Format. This format is commonly used in older Microsoft Office documents like .doc, .xls, and .ppt files. The package allows for the creation, manipulation, and extraction of data from these files.
What are cfb's main functionalities?
Reading CFB files
This code demonstrates how to read a CFB file from the filesystem. It uses the `read` method of the cfb package to load a file named 'test.xls' and logs the resulting data structure to the console.
const CFB = require('cfb');
const cfb = CFB.read('test.xls', {type: 'file'});
console.log(cfb);
Creating CFB files
This example shows how to create a new CFB file with a file named 'newfile.txt' inside it. It demonstrates creating a new CFB structure, adding a file to it, and then writing the CFB structure to a file named 'output.cfb'.
const CFB = require('cfb');
const cfb = CFB.utils.cfb_new();
CFB.utils.cfb_add(cfb, 'newfile.txt', new Uint8Array([1, 2, 3, 4, 5]));
// writeFile takes the output file name; CFB.write(cfb, options) returns the data instead
CFB.writeFile(cfb, 'output.cfb');
Extracting files from CFB containers
This snippet illustrates how to extract a file from a CFB container. It reads a CFB file named 'container.cfb', searches for a file named '/WordDocument' within the container, and logs its content.
const CFB = require('cfb');
const cfb = CFB.read('container.cfb', {type: 'file'});
const entry = CFB.find(cfb, '/WordDocument');
// find returns an entry object, or null if the path is not in the container
console.log(entry ? entry.content : 'not found');
Other packages similar to cfb
js-xlsx
js-xlsx is a comprehensive library for parsing and writing spreadsheets in various formats including XLSX/XLSM/XLSB/XLS/ODS. It offers broader functionality for spreadsheet manipulation compared to cfb, which is focused on the CFB file format.
Compound File Binary Format
This is a Pure-JS implementation of MS-CFB: Compound File Binary File Format, a
format used in many Microsoft file types (such as XLS and DOC).
Utility Installation and Usage
With npm:
$ npm install -g cfb
$ cfb path/to/CFB/file
The command will extract the storages and streams in the container, generating
files that line up with the tree-based structure of the storage. Metadata such
as the red-black tree are discarded. The `-l` option displays a manifest.
Library Installation and Usage
In the browser:
<script src="cfb.js" type="text/javascript"></script>
In node:
var CFB = require('cfb');
For example, to get the Workbook content from an XLS file:
var cfb = CFB.read(filename, {type: 'file'});
var workbook = CFB.find(cfb, 'Workbook');
var data = workbook.content;
The `xlscfb.js` file is designed to be embedded in js-xlsx.
API
TypeScript definitions are maintained in `types/index.d.ts`.
The CFB object exposes the following methods and properties:
`CFB.parse(blob)` takes a Node.js Buffer or an array of bytes and returns a
parsed representation of the data.
`CFB.read(blob, options)` wraps `parse`. `options.type` controls the behavior:
- `file`: `blob` should be a file name
- `base64`: `blob` should be a base64 string
- `binary`: `blob` should be a binary string
`CFB.find(cfb, path)` performs a case-insensitive match for the path (or file
name, if there are no slashes) and returns an entry object or null if not found.
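The lookup rule described for `CFB.find` can be illustrated with a minimal sketch. This is not the library's implementation: `findEntry` is a hypothetical helper, and `sample` is a hand-built stand-in that carries only the `.FullPaths` and `.FileIndex` arrays described in the Container Object Description. The sketch assumes that a query containing a slash is compared against whole paths, and that any other query is compared against entry names only.

```javascript
// Hypothetical helper illustrating a case-insensitive lookup; not the cfb source.
function findEntry(container, path) {
  const wanted = path.toUpperCase();
  // Assumption: a slash in the query means "match the full path".
  const byFullPath = path.indexOf('/') !== -1;
  for (let i = 0; i < container.FullPaths.length; i++) {
    const entry = container.FileIndex[i];
    const candidate = byFullPath ? container.FullPaths[i] : entry.name;
    if (candidate.toUpperCase() === wanted) return entry;
  }
  return null; // nothing matched
}

// Hand-built stand-in for a parsed container (assumed shape, illustration only).
const sample = {
  FullPaths: ['Root Entry/', 'Root Entry/Workbook'],
  FileIndex: [
    { name: 'Root Entry', type: 5 },
    { name: 'Workbook', type: 2, content: [1, 2, 3] }
  ]
};

const hit = findEntry(sample, 'workbook');
console.log(hit ? hit.name : 'not found');
```

Because a miss yields null, callers should check the result before reading `.content`.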
Container Object Description
The object returned by `parse` and `read` can be found in the source (`rval`).
It has the following properties and methods:
- `.find(path)` is equivalent to `CFB.find(cfb, path)` and should not be used.
- `.FullPaths` is an array of the names of all of the streams (files) and
  storages (directories) in the container. The paths are properly prefixed from
  the root entry (so the entries are unique).
- `.FullPathDir` is an object whose keys are entries in `.FullPaths` and whose
  values are objects with metadata and content (described below).
- `.FileIndex` is an array of the objects from `.FullPathDir`, in the same order
  as `.FullPaths`.
- `.raw` contains the raw header and sectors.
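Since `.FileIndex` is in the same order as `.FullPaths`, the two arrays can be walked in parallel to build a simple manifest. The following is a minimal sketch under that assumption; `buildManifest` is a hypothetical helper, and `demoContainer` is a hand-built object standing in for the result of `CFB.read` or `CFB.parse`.

```javascript
// Hypothetical helper: pair each full path with its entry by index,
// relying on .FileIndex and .FullPaths sharing the same order.
function buildManifest(container) {
  return container.FullPaths.map(function (fullPath, i) {
    const entry = container.FileIndex[i];
    // Storages may carry no content, so fall back to a size of 0.
    const size = entry.content ? entry.content.length : 0;
    return { path: fullPath, name: entry.name, size: size };
  });
}

// Hand-built stand-in for a parsed container (assumed shape, illustration only).
const demoContainer = {
  FullPaths: ['Root Entry/', 'Root Entry/Workbook'],
  FileIndex: [
    { name: 'Root Entry', type: 5 },
    { name: 'Workbook', type: 2, content: [10, 20, 30, 40] }
  ]
};

buildManifest(demoContainer).forEach(function (row) {
  console.log(row.path + ' (' + row.size + ' bytes)');
});
```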
Entry Object Description
The entry objects are available from the `FullPathDir` and `FileIndex` elements
of the container object.
- `.name` is the (case sensitive) internal name
- `.type` is the type as defined in "Object Type" in [MS-CFB] 2.6.1:
  `2 (stream)` for files, `1 (storage)` for dirs, `5 (root)` for root
- `.content` is a Buffer/Array with the raw content
- `.ct`/`.mt` are the creation and modification time (if provided in file)
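As a small illustration of the entry fields, the numeric `.type` codes listed above can be turned into readable labels. This is a sketch only: `TYPE_LABELS` and `describeEntry` are hypothetical names, and the entry passed in is a hand-built object rather than one read from a real file.

```javascript
// Type codes as listed above, from "Object Type" in [MS-CFB] 2.6.1.
const TYPE_LABELS = { 1: 'storage', 2: 'stream', 5: 'root' };

// Hypothetical helper: one-line summary of an entry object.
function describeEntry(entry) {
  const label = TYPE_LABELS[entry.type] || 'unknown';
  // .content is a Buffer or Array; both expose .length.
  const size = entry.content ? entry.content.length : 0;
  return entry.name + ' [' + label + ', ' + size + ' bytes]';
}

// Hand-built entry (assumed shape, illustration only).
console.log(describeEntry({ name: 'Workbook', type: 2, content: [1, 2, 3] }));
```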
License
This implementation is covered under the Apache 2.0 license. It complies with the
Open Specifications Promise.