harb
"Host of Archaic Representations of Books": miscellaneous historical spreadsheet
data formats. Pure-JS cleanroom implementation.
File format support for known spreadsheet formats:
| Format | Read | Write |
|:---|:---:|:---:|
| **Excel Supported Text Formats** | | |
| Delimiter-Separated Values (CSV/TSV/DSV) | :o: | |
| Data Interchange Format (DIF) | :o: | :o: |
| Symbolic Link (SYLK/SLK) | :o: | :o: |
| Space-Delimited Text (PRN) | :o: | |
| UTF-16 Unicode Text (TXT) | :o: | |
| **Other Workbook/Worksheet Formats** | | |
| dBASE II/III/IV / Visual FoxPro (DBF) | :o: | |
| **Other Output Formats** | | |
| SocialCalc | :o: | :o: |
js-harb follows the Common Spreadsheet Format.
The objects can be used in conjunction with readers and writers from js-xlsx
and other libraries.
Installation
With npm:
$ npm install harb
Interface
HARB is the exposed variable in the browser and the exported variable in node.
HARB.version is the version of the library (added by the build script).
Parsing functions
HARB.read(data, read_opts) attempts to parse data.
HARB.readFile(filename, read_opts) attempts to read filename and parse.
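As a rough illustration of the parsing functions, the sketch below reads a file and lists its sheets. The helper name listSheets and the file name in the usage comment are hypothetical. The SheetNames, Sheets and "!ref" fields are assumed to be present because the workbook follows the Common Spreadsheet Format. The library object is passed in as an argument so the helper does not depend on how HARB was loaded.

```javascript
// Sketch only: `harb` is whatever require("harb") returns in node,
// or the HARB global in the browser.
function listSheets(harb, filename) {
  const workbook = harb.readFile(filename);
  // Assumption: Common Spreadsheet Format workbooks expose SheetNames (array)
  // and Sheets (object keyed by sheet name).
  return workbook.SheetNames.map(function (name) {
    const sheet = workbook.Sheets[name];
    // "!ref" holds the worksheet range in A1 notation, e.g. "A1:C10"
    return { name: name, range: sheet["!ref"] };
  });
}

// Possible node usage (assumes the package is installed and the file exists):
//   const HARB = require("harb");
//   console.log(HARB.version, listSheets(HARB, "table.dbf"));
```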
Utilities
Utilities are available in the HARB.utils object:
Exporting:
sheet_to_socialcalc converts a worksheet object to socialcalc format.
The utilities from js-xlsx work with the workbook/worksheet objects from js-harb:
sheet_to_json converts a worksheet object to an array of JSON objects.
sheet_to_row_object_array is an alias of sheet_to_json that will be removed in the future.
sheet_to_csv generates delimiter-separated-values output.
sheet_to_formulae generates a list of the formulae (with value fallbacks).
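One possible way to combine the utilities is sketched below. The helper name exportFirstSheet is hypothetical, and both libraries are passed in as arguments: harb stands for the HARB object and xlsx for the object exported by js-xlsx. It assumes the js-xlsx utilities live under xlsx.utils, in the same way the text places the exporter under HARB.utils.

```javascript
// Sketch only: convert the first worksheet of a parsed workbook with each utility.
function exportFirstSheet(harb, xlsx, workbook) {
  const sheet = workbook.Sheets[workbook.SheetNames[0]];
  return {
    rows: xlsx.utils.sheet_to_json(sheet),             // array of JSON objects
    csv: xlsx.utils.sheet_to_csv(sheet),               // delimiter-separated text
    formulae: xlsx.utils.sheet_to_formulae(sheet),     // formulae with value fallbacks
    socialcalc: harb.utils.sheet_to_socialcalc(sheet)  // SocialCalc text
  };
}

// Possible node usage (assumes both packages are installed):
//   const HARB = require("harb"), XLSX = require("xlsx");
//   const out = exportFirstSheet(HARB, XLSX, HARB.readFile("table.dbf"));
```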
Parsing Options
The exported read and readFile functions accept an options argument:
| Option Name | Default | Description |
|:---|:---:|:---|
| dateNF | "" | override the date format |
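A minimal sketch of passing the option follows. The helper name readWithDateFormat is hypothetical, and the "yyyy-mm-dd" value in the usage comment assumes that dateNF takes an Excel-style number format string, which the table above does not confirm.

```javascript
// Sketch only: forward a date format override to the reader.
// `harb` is the HARB object; it is passed in rather than loaded here.
function readWithDateFormat(harb, filename, dateNF) {
  return harb.readFile(filename, { dateNF: dateNF });
}

// Possible node usage (file name and format string are illustrative):
//   const wb = readWithDateFormat(require("harb"), "dates.csv", "yyyy-mm-dd");
```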
File Formats
Comma-Separated Values (CSV)
The current version leans on BabyParse for reading CSV and other formats. Note
that the reader is RFC4180 compliant, so it does not support all of Excel's CSV
import features.
UTF-16 Unicode Text (TXT)
The text is really UTF-16 encoded TSV. Decoding is provided by js-codepage.
Space-Delimited Text (PRN)
A PRN file is a "display" rendering of the sheet, so there is no proper delimiter.
The current implementation guesses the width of the first column by finding the
first blank space in every line and taking the largest position as the column
width. Subsequent columns are assumed to be 10 characters wide.
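The width-guessing heuristic described above can be sketched as follows. This is an illustration and not the library's actual code. The function names are hypothetical, the 10-character width is assumed to apply to every column after the first, and a line that contains no blank is assumed to count with its full length.

```javascript
// Guess the first column width: the largest index of the first blank in any line.
function guessFirstColumnWidth(lines) {
  let width = 0;
  for (const line of lines) {
    const firstBlank = line.indexOf(" ");
    // Assumption: a line without any blank contributes its full length.
    const candidate = firstBlank === -1 ? line.length : firstBlank;
    if (candidate > width) width = candidate;
  }
  return width;
}

// Cut one line into cells: the first column uses the guessed width,
// every later column is assumed to be `otherWidth` (default 10) characters wide.
function splitPrnLine(line, firstWidth, otherWidth) {
  const step = otherWidth || 10;
  const cells = [line.slice(0, firstWidth).trim()];
  for (let start = firstWidth; start < line.length; start += step) {
    cells.push(line.slice(start, start + step).trim());
  }
  return cells;
}

// Parse a whole PRN text into an array of rows of trimmed cell strings.
function parsePrn(text) {
  const lines = text.split(/\r?\n/).filter(function (line) {
    return line.length > 0;
  });
  const firstWidth = guessFirstColumnWidth(lines);
  return lines.map(function (line) {
    return splitPrnLine(line, firstWidth);
  });
}
```

The guess is only right when at least one entry fills the first column completely, and a first-column value that itself contains a blank will be cut short.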
Symbolic Link (SYLK/SLK)
Symbolic Link is one of the original Microsoft Excel formats, actually dating
back to MultiPlan. Unlike the modern formats, no official specification was
released. Due to the plaintext record format and write support in the latest
versions of Excel, it is somewhat straightforward to use specially-crafted test
files to understand the format.
dBASE II/III/IV / FoxBase / Visual FoxPro (DBF)
dBASE and FoxPro file formats are simple binary formats for storing data tables.
The reader adds the field headers as the first row of the output worksheet.
Technically the reader supports files generated by dBASE up to version 7, but
Excel does not support many of the newer features.
SocialCalc
SocialCalc format is a plaintext single-sheet format used in Ethercalc with a
record format that harkens back to SYLK.
Test Files
Test files are housed in another repo.
Running make init will refresh the test_files submodule and get the files.
Contributing
Due to the precarious nature of the Open Specifications Promise, it is very
important to ensure code is cleanroom. Consult CONTRIBUTING.md.
The harb.js file is constructed from the files in the bits subdirectory. The
build script (run make) will concatenate the individual bits to produce the
script. Before submitting a contribution, ensure that running make will produce
the harb.js file exactly. The simplest way to test is to move the script:
$ mv harb.js harb.new.js
$ make
$ diff harb.js harb.new.js
To produce the dist files, run make dist. The dist files are updated in each
version release and should not be committed between versions.
Additional Support
Additional support is available in js-xlsx.
License
Please consult the attached LICENSE file for details. All rights not explicitly
granted by the Apache 2.0 license are reserved by the Original Author.
It is the opinion of the Original Author that this code conforms to the terms of
the Microsoft Open Specifications Promise, falling under the same terms as
OpenOffice (which is governed by the Apache License v2). Given the vagaries of
the promise, the Original Author makes no legal claim that in fact end users are
protected from future actions. It is highly recommended that, for commercial
uses, you consult a lawyer before proceeding.
References
No official specification exists for many of these formats. For some formats, a
"reference implementation" is the specification. When implementations disagree,
Excel's interpretation is assumed to be correct (unless Excel does not support
the format, in which case the application that introduced the format is assumed
to be correct).