@ridi/epub-parser
Common EPUB2 data parser for Ridibooks services written in ES6
Features
- Detailed parsing for EPUB2
- Supports package validation, decompression and style extraction with various parsing options
- Extract files within EPUB with various reading options
TODO
Install
npm install @ridi/epub-parser
Usage
import EpubParser from '@ridi/epub-parser';
// Pass either an EPUB file or a directory that contains an unzipped EPUB.
const parser = new EpubParser('./foo/bar.epub'); // or './unzippedPath'
parser.parse().then((book) => {
  parser.readItems(book.spines).then((results) => {
    // ...
  });
  // ...
});
API
parse(parseOptions)
Returns Promise<Book>, or throws an exception.
- Book: Instance with metadata, spine list, table of contents, etc.
readItem(item, readOptions)
Returns ReadResult, or throws an exception.
readItems(items, readOptions)
Returns Promise<ReadResult[]> (see Usage), or throws an exception.
- items: Item[] (see: Item Types)
ReadResult
Model
Author
- name: ?string
- role: string (Default: Author.Roles.UNDEFINED)
DateTime
- value: ?string
- event: string (Default: DateTime.Events.UNDEFINED)
Identifier
- value: ?string
- scheme: string (Default: Identifier.Schemes.UNDEFINED)
Meta
- name: ?string
- content: ?string
Guide
- title: ?string
- type: string (Default: Guide.Types.UNDEFINED)
- href: ?string
- item: ?Item
Item Types
Item
- id: ?string
- href: ?string
- mediaType: ?string
- size: ?number
- isFileExists: boolean (size !== undefined)
SpineItem
- spineIndex: number (Default: -1)
- isLinear: boolean (Default: true)
- styles: ?CssItem[]
InlineCssItem
- text: string (Default: '')
ImageItem
- isCover: boolean (Default: false)
DeadItem
- reason: string (Default: DeadItem.Reason.UNDEFINED)
NavPoint
- id: ?string
- label: ?string
- src: ?string
- anchor: ?string
- depth: number (Default: 0)
- children: NavPoint[]
- spine: ?SpineItem
Version
- major: number
- minor: number
- patch: number
- isValid: boolean (Only 2.x.x is valid because the current epub-parser only supports EPUB2.)
- toString(): string
Parse Options
validatePackage: boolean
If true, validate the package against the IDPF specifications listed below.
- Zip header should not be corrupt.
- mimetype file must be the first file in the archive.
- mimetype file should not be compressed.
- mimetype file should only contain the string application/epub+zip.
- Should not use the extra field feature of the ZIP format for the mimetype file.
Default: false
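The mimetype rules above can be checked directly against the first local file header of the archive. The following is a minimal sketch of that idea, not the parser's own implementation: checkMimetypeEntry and buildEntry are hypothetical helpers, and the sketch assumes the archive is already in a Buffer and that the header carries the entry size (no data descriptor).

```javascript
// Hypothetical helper, not part of @ridi/epub-parser.
// Inspects the first local file header of a ZIP archive held in a Buffer.
function checkMimetypeEntry(buf) {
  // Every local file header starts with the signature "PK\x03\x04".
  if (buf.length < 30 || buf.readUInt32LE(0) !== 0x04034b50) {
    return ['zip header is corrupt'];
  }
  const problems = [];
  const method = buf.readUInt16LE(8); // 0 means stored (no compression)
  const compressedSize = buf.readUInt32LE(18);
  const nameLength = buf.readUInt16LE(26);
  const extraLength = buf.readUInt16LE(28);
  const name = buf.toString('ascii', 30, 30 + nameLength);
  if (name !== 'mimetype') problems.push('mimetype is not the first file');
  if (method !== 0) problems.push('mimetype is compressed');
  if (extraLength !== 0) problems.push('mimetype uses the extra field');
  const start = 30 + nameLength + extraLength;
  const content = buf.toString('ascii', start, start + compressedSize);
  if (content !== 'application/epub+zip') {
    problems.push('unexpected mimetype content');
  }
  return problems;
}

// Builds a minimal local file header plus data, for demonstration only.
function buildEntry(name, content, method = 0) {
  const header = Buffer.alloc(30);
  header.writeUInt32LE(0x04034b50, 0); // signature
  header.writeUInt16LE(method, 8); // compression method
  header.writeUInt32LE(content.length, 18); // compressed size
  header.writeUInt32LE(content.length, 22); // uncompressed size
  header.writeUInt16LE(name.length, 26); // file name length
  return Buffer.concat([header, Buffer.from(name), Buffer.from(content)]);
}

console.log(checkMimetypeEntry(buildEntry('mimetype', 'application/epub+zip'))); // → []
```

An empty array means the entry passes all of the mimetype checks listed above.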
validateXml: boolean
If true, stop parsing when XML parsing errors occur.
Default: false
allowNcxFileMissing: boolean
If false, stop parsing when the NCX file does not exist.
Default: true
unzipPath: ?string
If specified, uncompress to that path.
Only applies when the input is an EPUB file.
Default: undefined
overwrite: boolean
If true, overwrite the contents of unzipPath when uncompressing.
Default: true
ignoreLinear: boolean
If true, ignore the spineIndex difference caused by the isLinear property of SpineItem.
false:
[{ spineIndex: 0, isLinear: true, ... },
 { spineIndex: 1, isLinear: true, ... },
 { spineIndex: -1, isLinear: false, ... },
 { spineIndex: 2, isLinear: true, ... }]
true:
[{ spineIndex: 0, isLinear: true, ... },
 { spineIndex: 1, isLinear: true, ... },
 { spineIndex: 2, isLinear: false, ... },
 { spineIndex: 3, isLinear: true, ... }]
Default: true
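The effect of ignoreLinear on spineIndex can be sketched as a simple counter. This is an illustration that reproduces the two listings above, not the parser's actual code; assignSpineIndexes is a hypothetical name.

```javascript
// Hypothetical sketch: number spine items in reading order.
// With ignoreLinear = false, non-linear items get -1 and are not counted.
// With ignoreLinear = true, every item is counted regardless of isLinear.
function assignSpineIndexes(linearFlags, ignoreLinear) {
  let next = 0;
  return linearFlags.map((isLinear) => {
    const counted = ignoreLinear || isLinear;
    const spineIndex = counted ? next : -1;
    if (counted) next += 1;
    return { spineIndex, isLinear };
  });
}

const flags = [true, true, false, true];
console.log(assignSpineIndexes(flags, false).map((s) => s.spineIndex)); // → [ 0, 1, -1, 2 ]
console.log(assignSpineIndexes(flags, true).map((s) => s.spineIndex)); // → [ 0, 1, 2, 3 ]
```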
useStyleNamespace: boolean
If true, one namespace is given per CSS file or inline style, and the styles used by each spine are described.
Otherwise, CssItem.namespace and SpineItem.styles are undefined.
In any list, InlineCssItem is always positioned after CssItem. (Book.styles, Book.items, SpineItem.styles, ...)
Default: false
styleNamespacePrefix: string
Prepend given string to namespace for identification.
Default: 'ridi_style'
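One way to picture the namespace options is a numbered namespace per style item, with inline styles ordered after CSS files. The sketch below assumes the namespace is the prefix followed by a running index (suggested by names such as ridi_style0 elsewhere in this document); assignNamespaces and the inline flag are hypothetical and not part of the library.

```javascript
// Hypothetical sketch: place inline styles after CSS files, then give each
// style a namespace made of the prefix and its position in the list.
function assignNamespaces(styles, prefix = 'ridi_style') {
  const cssFiles = styles.filter((style) => !style.inline);
  const inlineStyles = styles.filter((style) => style.inline);
  return [...cssFiles, ...inlineStyles].map((style, index) => ({
    ...style,
    namespace: `${prefix}${index}`,
  }));
}

const styles = assignNamespaces([
  { href: 'a.css', inline: false },
  { text: 'p { margin: 0; }', inline: true },
  { href: 'b.css', inline: false },
]);
console.log(styles.map((style) => style.namespace)); // → [ 'ridi_style0', 'ridi_style1', 'ridi_style2' ]
```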
Read Options
basePath: ?string
If specified, change the base path of the paths used by spine and CSS items.
HTML: SpineItem
Before:
...
<div>
<img src="../Images/cover.jpg">
</div>
...
After:
...
<div>
<img src="{basePath}/OEBPS/Images/cover.jpg">
</div>
...
CSS: CssItem, InlineCssItem
Before:
@font-face {
font-family: NotoSansRegular;
src: url("../Fonts/NotoSans-Regular.ttf");
}
After:
@font-face {
font-family: NotoSansRegular;
src: url("{basePath}/OEBPS/Fonts/NotoSans-Regular.ttf");
}
Default: undefined
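The rewrite performed by basePath amounts to resolving each relative path against the file that references it and then prefixing the base path. A minimal sketch of that resolution, assuming a hypothetical rebase helper and a hypothetical item location of OEBPS/Text/Section0001.xhtml (neither comes from the library):

```javascript
const path = require('path');

// Hypothetical helper: resolve `url` relative to the item that references it
// (itemHref) and place the result under basePath.
function rebase(url, itemHref, basePath) {
  // Leave URLs with a scheme (http:, data:, ...) and absolute paths untouched.
  if (/^([a-z][a-z0-9+.-]*:|\/)/i.test(url)) return url;
  const resolved = path.posix.join(path.posix.dirname(itemHref), url);
  return path.posix.join(basePath, resolved);
}

console.log(rebase('../Images/cover.jpg', 'OEBPS/Text/Section0001.xhtml', '/books/foo'));
// → /books/foo/OEBPS/Images/cover.jpg
```

The same resolution would apply to url(...) references in CSS, with the stylesheet's own location in place of the spine item's.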
spine.extractBody: boolean
If true, extract the body content and the attributes of the body element. Otherwise, return the full document as a string.
true:
{
body: '\n <p>Extract style</p>\n <img src=\"../Images/api-map.jpg\"/>\n',
attrs: [
{
key: 'style',
value: 'background-color: #000000;',
},
{
key: 'class',
value: '.ridi_style2, .ridi_style3, .ridi_style4, .ridi_style0, .ridi_style1',
},
],
}
false:
'<!doctype><html>\n<head>\n</head>\n<body style="background-color: #000000;">\n <p>Extract style</p>\n <img src=\"../Images/api-map.jpg\"/>\n</body>\n</html>'
Default: false
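To make the shape of the extracted result concrete, here is a rough sketch that splits a document into body content and a list of body attributes. It uses regular expressions for brevity and only handles double-quoted attributes, so it is an illustration of the output shape rather than the parser's real extraction logic; extractBody here is a hypothetical standalone function.

```javascript
// Hypothetical sketch: split an HTML string into body content and attributes.
// Returns null when no <body> element is found.
function extractBody(html) {
  const match = /<body([^>]*)>([\s\S]*?)<\/body>/i.exec(html);
  if (match === null) return null;
  const attrs = [];
  const attrPattern = /([\w-]+)\s*=\s*"([^"]*)"/g;
  let found = attrPattern.exec(match[1]);
  while (found !== null) {
    attrs.push({ key: found[1], value: found[2] });
    found = attrPattern.exec(match[1]);
  }
  return { body: match[2], attrs };
}

const extracted = extractBody(
  '<html><head></head><body style="background-color: #000000;">\n<p>Extract style</p>\n</body></html>'
);
console.log(extracted.attrs); // → [ { key: 'style', value: 'background-color: #000000;' } ]
```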
spine.extractAdapter: function
If specified, transforms the output of extractBody.
Define adapter:
const extractAdapter = (body, attrs) => {
let string = '';
attrs.forEach((attr) => {
string += ` ${attr.key}=\"${attr.value}\"`;
});
return {
content: `<article${string}>${body}</article>`,
};
};
Result:
{
content: '<article style=\"background-color: #000000;\" class=\".ridi_style2, .ridi_style3, .ridi_style4, .ridi_style0, .ridi_style1\">\n <p>Extract style</p>\n <img src=\"../Images/api-map.jpg\"/>\n</article>',
}
Default: defaultExtractAdapter
spine.useCssOptions: boolean
If true, applies readOptions.css to inline styles and style attributes.
Default: false
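The dotted option names in this section, together with the reference to readOptions.css, suggest that read options are grouped into nested spine and css objects. A sketch of such an object under that assumption; the values are illustrative only, and the exact format expected by css.removeAtrules (with or without a leading @) is not stated here.

```javascript
// Assumed shape: 'spine.extractBody' maps to readOptions.spine.extractBody,
// 'css.removeTags' maps to readOptions.css.removeTags, and so on.
const readOptions = {
  basePath: './foo/bar',
  spine: {
    extractBody: true,
    useCssOptions: true, // apply the css group below to inline styles too
  },
  css: {
    removeAtrules: ['charset'], // illustrative value
    removeTags: ['html', 'body'],
    removeIds: [],
    removeClasses: [],
  },
};
```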
css.removeAtrules: string[]
Remove the specified at-rules.
Default: []
css.removeTags: string[]
Remove selectors that point to the specified tags.
Default: []
css.removeIds: string[]
Remove selectors that point to the specified ids.
Default: []
css.removeClasses: string[]
Remove selectors that point to the specified classes.
Default: []
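The three selector options can be thought of as a filter over a stylesheet's selectors. The following sketch shows one simplified way to decide whether a selector points to a listed tag, id, or class. It is not the library's implementation: shouldRemove and collect are hypothetical, and attribute selectors and other edge cases are not handled.

```javascript
// Collect the first capture group of every match of `pattern` in `selector`.
function collect(pattern, selector) {
  return Array.from(selector.matchAll(pattern), (match) => match[1]);
}

// Hypothetical sketch: true when the selector references any listed
// tag name, id, or class.
function shouldRemove(selector, options) {
  const { removeTags = [], removeIds = [], removeClasses = [] } = options;
  // A tag name appears at the start or right after a combinator.
  const tags = collect(/(?:^|[\s>+~])([a-zA-Z][\w-]*)/g, selector);
  const ids = collect(/#([\w-]+)/g, selector);
  const classes = collect(/\.([\w-]+)/g, selector);
  return (
    tags.some((tag) => removeTags.includes(tag)) ||
    ids.some((id) => removeIds.includes(id)) ||
    classes.some((name) => removeClasses.includes(name))
  );
}

const cssOptions = { removeTags: ['span'], removeIds: ['cover'], removeClasses: ['title'] };
const kept = ['p', 'div > span', '#cover', '.note', 'h1.title'].filter(
  (selector) => !shouldRemove(selector, cssOptions)
);
console.log(kept); // → [ 'p', '.note' ]
```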