@ridi/epub-parser

Common EPUB2 data parser for Ridibooks services written in ES6

Version: 0.0.2
Source: npm
Maintainers: 12

Features

  • Detailed parsing for EPUB2
  • Supports package validation, decompression and style extraction with various parsing options
  • Extract files within EPUB with various reading options

TODO

  • Add encryption and decryption functions
  • Add readOptions.spine.serializedAnchor option
  • Add readOptions.spine.truncate and readOptions.spine.truncateMaxLength options
  • Add readOptions.spine.minify and readOptions.css.minify options
  • Add readOptions.removeExternalRefs option
  • Support for EPUB3
  • Support for CLI
  • Support for other OCF spec files (manifest.xml, metadata.xml, signatures.xml, encryption.xml, etc.)

Install

npm install @ridi/epub-parser

Usage

import EpubParser from '@ridi/epub-parser';

// Pass either the path to an EPUB file or the path to an already unzipped EPUB.
const parser = new EpubParser('./foo/bar.epub'); // or './unzippedPath'
parser.parse().then((book) => {
  parser.readItems(book.spines).then((results) => {
    // ...
  });
  // ...
});

API

parse(parseOptions)

Returns Promise<Book> with:

  • Book: Instance with metadata, spine list, table of contents, etc.

Or throws an exception.

parseOptions: object

readItem(item, readOptions)

Returns a Promise that resolves to a string, object, or Buffer (see Returns detail), or throws an exception.

item: Item (see: Item Types)
readOptions: object

readItems(items, readOptions)

Returns a Promise that resolves to string[], object[], or Buffer[] (see Returns detail), or throws an exception.

items: Item[] (see: Item Types)
readOptions: object
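
The two calls above can also be chained with async/await. The following is a minimal sketch, not part of the library: the helper name readAllSpines is hypothetical, and it relies only on parse(), readItems() and book.spines as shown in the Usage section. The parser is passed in as an argument so the sketch assumes nothing else about the package.

```javascript
// Hypothetical helper: parses the book, then reads every spine item.
// `parser` is expected to be an EpubParser instance as in the Usage section.
async function readAllSpines(parser, parseOptions = {}, readOptions = {}) {
  // parse() resolves to a Book; errors propagate to the caller as rejections.
  const book = await parser.parse(parseOptions);
  // readItems() resolves to one result per item passed in.
  const contents = await parser.readItems(book.spines, readOptions);
  return { book, contents };
}
```

Callers would wrap the call in try/catch (or attach .catch) to handle the exceptions mentioned above.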

Returns detail

Model

Book

Author

  • name: string?
  • role: string (Default: Author.Roles.UNDEFINED)

DateTime

  • value: string?
  • event: string (Default: DateTime.Events.UNDEFINED)

Identifier

  • value: string?
  • scheme: string? (Default: Identifier.Schemes.UNDEFINED)

Meta

  • name: string?
  • content: string?

Guide

  • title: string?
  • type: string (Default: Guide.Types.UNDEFINED)
  • href: string?
  • item: Item?

Item Types

Item
  • id: string?
  • href: string?
  • mediaType: string?
  • size: number?
  • isFileExists: boolean (size !== undefined)

NcxItem (extends Item)

SpineItem (extends Item)
  • spineIndex: number (Default: -1)
  • isLinear: boolean (Default: true)
  • styles: CssItem[]?

CssItem (extends Item)
  • namespace: string?

InlineCssItem (extends CssItem)
  • text: string?

ImageItem (extends Item)
  • isCover: boolean (Default: false)

SvgItem (extends ImageItem)

FontItem (extends Item)

DeadItem (extends Item)
  • reason: string (Default: DeadItem.Reason.UNDEFINED)
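
To make the inheritance and defaults above concrete, here is an illustrative sketch of two of the types. These classes are written for this example only and are not the library's source; the constructor shape (a single raw object) is an assumption.

```javascript
// Illustrative classes only; the constructor signature is an assumption.
class Item {
  constructor({ id, href, mediaType, size } = {}) {
    this.id = id;
    this.href = href;
    this.mediaType = mediaType;
    this.size = size;
  }

  // Mirrors the documented rule: the file exists when size is defined.
  get isFileExists() {
    return this.size !== undefined;
  }
}

class SpineItem extends Item {
  constructor(raw = {}) {
    super(raw);
    // Documented defaults: spineIndex -1, isLinear true.
    this.spineIndex = raw.spineIndex !== undefined ? raw.spineIndex : -1;
    this.isLinear = raw.isLinear !== undefined ? raw.isLinear : true;
    this.styles = raw.styles;
  }
}
```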

NavPoint

  • id: string?
  • label: string?
  • src: string?
  • anchor: string?
  • depth: number (Default: 0)
  • children: NavPoint[]
  • spine: SpineItem?
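
Because NavPoint is recursive through children, a table of contents is naturally walked depth-first. A minimal sketch, assuming only the fields listed above; flattenNavPoints is a hypothetical helper, and where the top-level NavPoint list lives on Book is not shown here, so the list is passed in directly.

```javascript
// Hypothetical helper: flattens a NavPoint tree into reading order.
function flattenNavPoints(navPoints, out = []) {
  for (const point of navPoints) {
    out.push({ label: point.label, src: point.src, depth: point.depth });
    // children is documented as NavPoint[]; fall back to [] defensively.
    flattenNavPoints(point.children || [], out);
  }
  return out;
}
```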

Parse Options

validatePackage: boolean

If true, validates the package against the IDPF specifications listed below.

  • The ZIP header should not be corrupted.
  • The mimetype file must be the first file in the archive.
  • The mimetype file should not be compressed.
  • The mimetype file should contain only the string application/epub+zip.
  • The mimetype file should not use the extra field feature of the ZIP format.

Default: false
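
The mimetype rules above can be checked directly against the first ZIP local file header. The sketch below is one way to do that with a Node.js Buffer and is not the library's implementation; checkMimetypeEntry and its messages are hypothetical. It assumes the sizes are stored in the local header (no data descriptor).

```javascript
const EPUB_MIMETYPE = 'application/epub+zip';

// Hypothetical check of the first local file header in a ZIP archive.
// Returns a list of problems; an empty list means the mimetype rules hold.
function checkMimetypeEntry(zip) {
  // Local file header: 30 fixed bytes starting with signature 0x04034b50.
  if (zip.length < 30 || zip.readUInt32LE(0) !== 0x04034b50) {
    return ['first entry is not a valid ZIP local file header'];
  }
  const problems = [];
  const method = zip.readUInt16LE(8); // 0 means stored (uncompressed)
  const compressedSize = zip.readUInt32LE(18);
  const nameLength = zip.readUInt16LE(26);
  const extraLength = zip.readUInt16LE(28);
  const name = zip.toString('latin1', 30, 30 + nameLength);
  const dataStart = 30 + nameLength + extraLength;

  if (name !== 'mimetype') problems.push('mimetype is not the first file');
  if (method !== 0) problems.push('mimetype is compressed');
  if (extraLength !== 0) problems.push('mimetype uses an extra field');
  // The content can only be compared as text when it is stored uncompressed.
  const content = zip.toString('latin1', dataStart, dataStart + compressedSize);
  if (method === 0 && content !== EPUB_MIMETYPE) {
    problems.push('unexpected mimetype content');
  }
  return problems;
}
```

With a file on disk, the Buffer could come from fs.readFileSync(path); reading only the first few dozen bytes would be enough for this check.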

validateXml: boolean

If true, stop parsing when XML parsing errors occur.

Default: false

allowNcxFileMissing: boolean

If false, stops parsing when the NCX file does not exist.

Default: true

unzipPath: string?

If specified, the EPUB is uncompressed to that path.

Applies only when the input is an EPUB file.

Default: undefined

createIntermediateDirectories: boolean

If true, creates intermediate directories for unzipPath.

Default: true

removePreviousFile: boolean

If true, removes a previous file from unzipPath.

Default: true

ignoreLinear: boolean

If true, ignores the spineIndex difference caused by the isLinear property of SpineItem.

// e.g. Left: ignoreLinear is false. Right: ignoreLinear is true.
[{ spineIndex: 0, isLinear: true, ... },       [{ spineIndex: 0, isLinear: true, ... },
{ spineIndex: 1, isLinear: true, ... },        { spineIndex: 1, isLinear: true, ... },
{ spineIndex: -1, isLinear: false, ... },      { spineIndex: 2, isLinear: false, ... },
{ spineIndex: 2, isLinear: true, ... }]        { spineIndex: 3, isLinear: true, ... }]

Default: true
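
The numbering in the example above can be reproduced with a single counter. This is a sketch of the rule as the example shows it, not the parser's actual code; assignSpineIndexes is a hypothetical name.

```javascript
// Hypothetical helper reproducing the spineIndex numbering shown above.
function assignSpineIndexes(spines, ignoreLinear) {
  let next = 0;
  return spines.map((spine) => {
    // With ignoreLinear, every item is numbered; otherwise only linear ones,
    // and non-linear items keep the default index of -1.
    const counted = ignoreLinear || spine.isLinear;
    return { ...spine, spineIndex: counted ? next++ : -1 };
  });
}
```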

useStyleNamespace: boolean

If true, one namespace is assigned per CSS file or inline style, and the styles used by each spine are listed in SpineItem.styles.

Otherwise, CssItem.namespace and SpineItem.styles are undefined.

In any list (Book.styles, Book.items, SpineItem.styles, ...), InlineCssItem is always positioned after CssItem.

Default: false

styleNamespacePrefix: string

Prepends the given string to each namespace for identification.

Default: 'ridi_style'
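
For reference, the documented parse option defaults can be collected into one object. The values below are copied from this section; the withParseDefaults helper is hypothetical and only illustrates overriding a few options before calling parse(parseOptions).

```javascript
// Defaults as documented in the Parse Options section.
const defaultParseOptions = {
  validatePackage: false,
  validateXml: false,
  allowNcxFileMissing: true,
  unzipPath: undefined,
  createIntermediateDirectories: true,
  removePreviousFile: true,
  ignoreLinear: true,
  useStyleNamespace: false,
  styleNamespacePrefix: 'ridi_style',
};

// Hypothetical helper: overlays caller-supplied options on the defaults.
function withParseDefaults(overrides = {}) {
  return { ...defaultParseOptions, ...overrides };
}
```

For example, withParseDefaults({ validatePackage: true, useStyleNamespace: true }) produces an object that could be passed to parser.parse().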

Read Options

basePath: string?

If specified, changes the base path of the paths used in spines and CSS.

HTML: SpineItem

...
  <!-- Before -->
  <div>
    <img src="../Images/cover.jpg">
  </div>
  <!-- After -->
  <div>
    <img src="{basePath}/OEBPS/Images/cover.jpg">
  </div>
...

CSS: CssItem, InlineCssItem

/* Before */
@font-face {
  font-family: NotoSansRegular;
  src: url("../Fonts/NotoSans-Regular.ttf");
}
/* After */
@font-face {
  font-family: NotoSansRegular;
  src: url("{basePath}/OEBPS/Fonts/NotoSans-Regular.ttf");
}

Default: undefined
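
The before/after pairs above amount to resolving each relative reference against the location of the file that contains it, then prefixing basePath. A minimal sketch of that idea using the global URL class; rebasePath is a hypothetical helper, and the item hrefs used with it (for example OEBPS/Text/cover.xhtml) are assumed locations, not values taken from the library.

```javascript
// Hypothetical helper: rewrites a relative reference found in a spine or CSS
// item so that it points below basePath.
function rebasePath(basePath, itemHref, relativeRef) {
  // Resolve against the item's location inside the EPUB, using a dummy
  // file: origin so that "../" segments are normalized.
  const resolved = new URL(relativeRef, `file:///${itemHref}`);
  // Leave absolute references such as https: URLs untouched.
  if (resolved.protocol !== 'file:') return relativeRef;
  // Note: pathname is percent-encoded (a space becomes %20).
  return `${basePath.replace(/\/+$/, '')}${resolved.pathname}`;
}
```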

spine.extractBody: boolean

If true, extracts the contents and attributes of the body element. Otherwise, returns the full document as a string.

true:

{
  body: '\n  <p>Extract style</p>\n  <img src=\"../Images/api-map.jpg\"/>\n',
  attrs: [
    {
      key: 'style',
      value: 'background-color: #000000;',
    },
    { // Only added if useStyleNamespace is true.
      key: 'class',
      value: '.ridi_style2, .ridi_style3, .ridi_style4, .ridi_style0, .ridi_style1',
    },
  ],
}

false:

'<!doctype><html>\n<head>\n</head>\n<body style="background-color: #000000;">\n  <p>Extract style</p>\n  <img src=\"../Images/api-map.jpg\"/>\n</body>\n</html>'

Default: false

spine.extractAdapter: function

If specified, transforms output of extractBody.

Define adapter:

const extractAdapter = (body, attrs) => {
  let string = '';
  attrs.forEach((attr) => {
    string += ` ${attr.key}=\"${attr.value}\"`;
  });
  return {
    content: `<article${string}>${body}</article>`,
  };
};

Result:

{
  content: '<article style=\"background-color: #000000;\" class=\".ridi_style2, .ridi_style3, .ridi_style4, .ridi_style0, .ridi_style1\">\n  <p>Extract style</p>\n  <img src=\"../Images/api-map.jpg\"/>\n</article>',
}

Default: defaultExtractAdapter
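
An adapter does not have to build markup. As another sketch under the same (body, attrs) signature shown above, the hypothetical adapter below keeps the body unchanged and converts the attrs list into a plain object, which can be easier to consume in a view layer.

```javascript
// Hypothetical adapter: same (body, attrs) signature as the example above.
const attrsToObjectAdapter = (body, attrs) => {
  const attrMap = {};
  attrs.forEach(({ key, value }) => {
    attrMap[key] = value;
  });
  return { body, attrs: attrMap };
};

// Assumed usage, following the dotted option names in this section:
// parser.readItems(book.spines, {
//   spine: { extractBody: true, extractAdapter: attrsToObjectAdapter },
// });
```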

css.removeAtrules: string[]

Removes the specified at-rules.

Default: ['charset', 'import', 'keyframes', 'media', 'namespace', 'supports']

css.removeTags: string[]

Removes selectors that point to the specified tags.

Default: []

css.removeIds: string[]

Removes selectors that point to the specified ids.

Default: []

css.removeClasses: string[]

Removes selectors that point to the specified classes.

Default: []
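
Putting the read options together: the dotted names in this section (spine.extractBody, css.removeTags, and so on) suggest nested objects, so a readOptions value might look like the sketch below. The basePath, tag, id and class values are made up for illustration.

```javascript
// Example readOptions; nesting is inferred from the dotted option names.
const readOptions = {
  basePath: '/static/books/1', // hypothetical path
  spine: {
    extractBody: true,
  },
  css: {
    // Documented default list of at-rules to remove.
    removeAtrules: ['charset', 'import', 'keyframes', 'media', 'namespace', 'supports'],
    removeTags: ['html', 'body'], // illustrative values
    removeIds: ['cover'], // illustrative value
    removeClasses: [],
  },
};
```

Such an object would be passed as the second argument to readItem or readItems.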

Keywords

EPUB

Package last updated on 11 Sep 2018