
epub-wordcount
Given an epub file, do our best to count the number of words in it.
This package is available on npm as epub-wordcount. It can be installed using common JS tools:
npm i -g epub-wordcount
On the CLI:
% word-count path/to/book.epub
The Strange Case of Dr. Jekyll and Mr. Hyde
-------------------------------------------
* 26,341 words
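The summary above is the book's title, a dashed underline of the same length, and a comma-grouped word count. The following is a minimal sketch of producing that layout, assuming the title and count are already known. formatSummary and groupThousands are hypothetical helpers, not exports of epub-wordcount, and the raw parameter only imitates what --raw is described as doing further down:

```typescript
// Hypothetical helper: insert a comma before every group of three digits,
// counting from the right (26341 -> "26,341").
function groupThousands(n: number): string {
  return Math.trunc(n).toString().replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

// Hypothetical helper that imitates the CLI summary: title, an underline of
// dashes as long as the title, then the grouped count. `raw` mimics --raw
// by returning only the bare number.
function formatSummary(title: string, words: number, raw = false): string {
  if (raw) {
    return String(words);
  }
  const underline = "-".repeat(title.length);
  return `${title}\n${underline}\n* ${groupThousands(words)} words`;
}

console.log(formatSummary("The Strange Case of Dr. Jekyll and Mr. Hyde", 26341));
```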
In code:
// TS:
import { countWords } from 'epub-wordcount'
// JS:
// const { countWords } = require('epub-wordcount')
countWords('./books/some-book.epub').then((numWords) => {
console.log(`There are ${numWords} words`)
})
// There are 106190 words
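Because countWords returns a promise, several books can be counted concurrently. The following is a minimal sketch, assuming you want a per-book breakdown plus a total. countMany and the Counter type are hypothetical names, and the counting function is passed in so the sketch runs without the package installed:

```typescript
// Any function shaped like countWords(path) => Promise<number> fits here.
type Counter = (path: string) => Promise<number>;

async function countMany(
  paths: string[],
  count: Counter,
): Promise<{ perBook: Map<string, number>; total: number }> {
  // Start every count at once; Promise.all keeps results in input order
  // and rejects as soon as any single count fails.
  const entries = await Promise.all(
    paths.map(async (p) => [p, await count(p)] as const),
  );
  const total = entries.reduce((sum, [, n]) => sum + n, 0);
  return { perBook: new Map(entries), total };
}

// Stub counter for demonstration only: pretends a book has as many words
// as its path has characters.
const stubCount: Counter = async (p) => p.length;

countMany(["a.epub", "bb.epub"], stubCount).then(({ perBook, total }) => {
  console.log(perBook, total);
});
```

With the real library you would pass the package's function instead of the stub, e.g. countMany(paths, (p) => countWords(p)). Promise.all starts every count immediately, which is fine for a handful of books; for a large library you may want to limit concurrency.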
There's also a CLI tool to quickly get the word count of any epub file. Invoke it via:
word-count path/to/file.epub
or
word-count directory/of/books
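The README doesn't say how a directory argument is expanded into books. The following is a minimal sketch of one plausible approach, assuming a non-recursive scan that matches the .epub extension case-insensitively. selectEpubNames and findEpubs are hypothetical helpers, not part of the package:

```typescript
import { readdirSync } from "node:fs";
import { extname, join } from "node:path";

// Pure part: keep names whose extension is ".epub" (any case), sorted so
// the output order is stable from run to run.
function selectEpubNames(names: string[]): string[] {
  return names
    .filter((name) => extname(name).toLowerCase() === ".epub")
    .sort();
}

// I/O part: list one directory level and return full paths. Entries that are
// folders named *.epub are kept too (see the macOS note further down).
function findEpubs(dir: string): string[] {
  return selectEpubNames(readdirSync(dir)).map((name) => join(dir, name));
}
```

Splitting the filtering from the directory read keeps the matching rule easy to test on plain arrays of names.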
See word-count -h for more info.
-c, --chars
- Print the alphanumeric character count instead of the word count.
-r, --raw
- Instead of printing the nice title, just print out a numeral.
-t, --text
- Print out the whole text of the book. Great for passing into other Unix tools, like wc.
--ignore-drm
- If the CLI reports that your file has DRM when you know it doesn't, pass this flag to make it ignore the DRM warning. This might cause weird results if the file actually does have DRM.
There are a number of functions exported from this package. Each one takes either a path to a file or an already-parsed file. Mostly you'll use the path, but if the epub you're parsing is in a non-standard format, you might parse it yourself first with parseEpubAtPath to make sure the file parses correctly. See here for the options available.
countWords(pathOrEpub, ignoreDrm?) => Promise<number>
countCharacters(pathOrEpub, ignoreDrm?) => Promise<number>
getText(pathOrEpub, ignoreDrm?) => Promise<string[]>
Each of the above can be passed the result of the following:
parseEpubAtPath(path, ignoreDrm?) => Promise<EPub>
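Since getText resolves to an array of strings, counts can also be derived from its output. The following is a minimal sketch, assuming a word is any run of non-whitespace characters (roughly what wc -w does) and that "alphanumeric" means Unicode letters and digits. Both rules are assumptions; the package's own countWords and countCharacters may split text differently, and the helper names are hypothetical:

```typescript
// Assumed word rule: a maximal run of non-whitespace characters.
const WORD = /\S+/g;
// Assumed "alphanumeric" rule: any Unicode letter or digit. Built with the
// RegExp constructor so older TypeScript compile targets don't reject the
// `u` flag and property escapes in a literal.
const ALNUM = new RegExp("[\\p{L}\\p{N}]", "gu");

// Count matches of `pattern` in every section and add them up.
// (String.prototype.match resets lastIndex for global regexes.)
function countMatches(sections: string[], pattern: RegExp): number {
  return sections.reduce(
    (total, text) => total + (text.match(pattern)?.length ?? 0),
    0,
  );
}

function countWordsInSections(sections: string[]): number {
  return countMatches(sections, WORD);
}

function countAlphanumeric(sections: string[]): number {
  return countMatches(sections, ALNUM);
}
```

For example, getText('./books/some-book.epub').then(countWordsInSections) yields a count you can compare against countWords for the same book.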
Epub files have no reliable programmatic way to mark which table-of-contents entries are story text, so it's hard to skip over the reviews, copyright page, etc. An effort is made to parse only the actual story text, but there's a margin of error, probably no more than ~500 words.
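One way to approximate "skip the reviews, copyright, etc." is a title-based filter. The following is a minimal sketch, assuming each section has a title available (for example from the book's table of contents). The Section type, storySections, and the keyword list are hypothetical and illustrative, not what epub-wordcount actually does:

```typescript
// Hypothetical shape of a section: its title plus its plain text.
interface Section {
  title: string;
  text: string;
}

// Illustrative, incomplete list of front/back-matter title prefixes.
const NON_STORY_TITLES = [
  "copyright",
  "contents",
  "table of contents",
  "dedication",
  "acknowledgments",
  "acknowledgements",
  "praise for",
  "reviews",
  "about the author",
  "also by",
];

// Keep a section unless its normalized title starts with a known phrase.
function storySections(sections: Section[]): Section[] {
  return sections.filter((section) => {
    const title = section.title.trim().toLowerCase();
    return !NON_STORY_TITLES.some((phrase) => title.startsWith(phrase));
  });
}
```

A real chapter whose title happens to start with one of these phrases would be dropped, and untitled front matter would be kept, which is the kind of error the margin above accounts for.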
Pull requests welcome.
Unit tests are run on the following e-books:
In modern versions of macOS, dragging a book out of the Books app won't give you an actual epub; it'll give you a folder with the .epub extension. Unsurprisingly, this doesn't play well with epub tooling.
To fix this, run the following command (pulled from here), which works for me:
# from inside the rogue directory
zip -X -r ../fixed.epub mimetype *
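A tool could detect this case up front, before trying to parse the path. The following is a minimal sketch, assuming the folder keeps the EPUB container's mimetype file at its root (the zip command above relies on that same file). In the EPUB container format that file contains the media type application/epub+zip. isUnzippedEpubFolder is a hypothetical helper, not an export of this package:

```typescript
import { existsSync, readFileSync, statSync } from "node:fs";
import { extname, join } from "node:path";

// Returns true when `path` ends in .epub but is actually an unpacked folder
// whose root holds the EPUB `mimetype` file with the standard media type.
// Missing paths and regular files return false instead of throwing.
function isUnzippedEpubFolder(path: string): boolean {
  if (extname(path).toLowerCase() !== ".epub") return false;
  if (!existsSync(path) || !statSync(path).isDirectory()) return false;
  const mimetypePath = join(path, "mimetype");
  return (
    existsSync(mimetypePath) &&
    readFileSync(mimetypePath, "utf8").trim() === "application/epub+zip"
  );
}
```

If it returns true, rezip the folder with the command above before counting.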