tiptap-word-count-multilingual
Background
The official Tiptap CharacterCount extension counts words in a simple way: it splits the text with text.split(' '), discards empty strings, and returns the length of the resulting array. This only works for languages that separate words with spaces.
It miscounts text written with CJK characters, where words are not separated by spaces, so a whole run of text without spaces is counted as a single word.
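The failure is easy to reproduce without Tiptap. The snippet below is a minimal sketch: countWordsBySpace is a hypothetical helper that mirrors the split-on-space logic described above, and the sample strings are illustrative only.

```javascript
// Hypothetical helper mirroring the split-on-space approach described above.
function countWordsBySpace(text) {
  return text.split(' ').filter(word => word !== '').length
}

const english = 'The quick brown fox'
const chinese = '我喜欢喝茶' // "I like drinking tea": several words, no spaces

console.log(countWordsBySpace(english)) // 4
console.log(countWordsBySpace(chinese)) // 1, the whole sentence counts as one word
```

The English sentence is counted correctly, while the Chinese sentence collapses to a single "word" because it contains no spaces to split on.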
What this extension does
This extension fixes the problem by delegating word counting to Alfaaz. As described on its page:
Alfaaz is the fastest multilingual word counter that can count millions of words per second (up to 0.9 GB/s 100x faster than RegExp based solutions). It has built-in support for CJK texts & words in many different languages such as Urdu & Arabic.
Compared to the original extension, this extension makes a single change: it replaces the body of the words() storage function.
// Original (official CharacterCount extension)
this.storage.words = options => {
  const node = options?.node || this.editor.state.doc
  const text = node.textBetween(0, node.content.size, ' ', ' ')
  const words = text.split(' ').filter(word => word !== '')
  return words.length
}
// This extension
import { countWords } from 'alfaaz'
this.storage.words = options => {
  const node = options?.node || this.editor.state.doc
  const text = node.textBetween(0, node.content.size, ' ', ' ')
  return countWords(text)
}
Everything else is identical, so it can replace the original extension with little or no other modification.
Installation & Usage
Install
npm install tiptap-word-count-multilingual
Usage
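import { Editor } from "@tiptap/core";
import Document from "@tiptap/extension-document";
import Paragraph from "@tiptap/extension-paragraph";
import Text from "@tiptap/extension-text";
import WordCount from "tiptap-word-count-multilingual";

const editor = new Editor({
  extensions: [
    Document,
    Paragraph,
    Text,
    WordCount
  ]
})

// Word count for the whole document
editor.storage.characterCount.words()
// Word count for a specific node
editor.storage.characterCount.words({ node: someCustomNode })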
Since this extension only modifies the word counting mechanism, all settings and storage functionalities from the official extension should still work the same way.
If you are migrating from the official extension, simply update the import statement.
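As a rough sketch of such a migration, assume the official extension was imported from @tiptap/extension-character-count (its Tiptap v2 package name) and configured with its limit option. The limit value and the surrounding editor setup are illustrative assumptions, not taken from this project:

```javascript
import { Editor } from '@tiptap/core'
import Document from '@tiptap/extension-document'
import Paragraph from '@tiptap/extension-paragraph'
import Text from '@tiptap/extension-text'
// Before (official extension, Tiptap v2 package name):
// import CharacterCount from '@tiptap/extension-character-count'
// After: only the import source changes.
import CharacterCount from 'tiptap-word-count-multilingual'

const editor = new Editor({
  extensions: [
    Document,
    Paragraph,
    Text,
    // `limit` is an option of the official extension; assumed to carry over unchanged.
    CharacterCount.configure({ limit: 280 }),
  ],
})

// Storage is still exposed under the official `characterCount` name.
console.log(editor.storage.characterCount.characters())
console.log(editor.storage.characterCount.words())
```

Keeping the local name CharacterCount means the rest of the editor setup does not need to change.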