
See what word count gray areas might be affecting your word count.
Word Count Analyzer is a Ruby gem that analyzes a string for potential areas of the text that might cause word count discrepancies depending on the tool used. It also provides comprehensive configuration options so you can easily customize how different gray areas should be counted and find the right word count for your purposes.
If you prioritize speed over accuracy, then I recommend not using this gem. There are most definitely faster gems for getting a word count. However, if accuracy is important, and you want control over the gray areas that affect word count, then this gem is for you.
##Install
Ruby
Supports Ruby 2.1.0 and above
gem install word_count_analyzer
Ruby on Rails
Add this line to your application’s Gemfile:
gem 'word_count_analyzer'
##Live Demo
Try out a live demo of Word Count Analyzer in the browser.
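Common word count gray areas include (more details below): ellipses, hyperlinks, contractions, hyphenated words, dates, numbers, numbered lists, XHTML tags, forward slashes, backslashes, dotted lines, dashed lines, underscores, and stray punctuation.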
Other gray areas not covered by this gem:
text = "This string has a date: Monday, November 3rd, 2011. I was thinking... it also shouldn't have too many contractions, maybe 4. <html> Some HTML and a hyphenated-word</html>. Don't count stray punctuation ? ? ? Please visit the ____________ ------------ ........ go-to site: https://www.example-site.com today. Let's add a list 1. item a 2. item b 3. item c. Now let's add he/she/it or a c:\\Users\\john. 2/15/2012 is the date! { HYPERLINK 'http://www.hello.com' }"
WordCountAnalyzer::Analyzer.new.analyze(text)
# => {
# "ellipsis": 1,
# "hyperlink": 2,
# "contraction": 4,
# "hyphenated_word": 2,
# "date": 2,
# "number": 1,
# "numbered_list": 3,
# "xhtml": 1,
# "forward_slash": 1,
# "backslash": 1,
# "dotted_line": 1,
# "dashed_line": 1,
# "underscore": 1,
# "stray_punctuation": 5
# }
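
As a rough sketch of how such a result could be put to use, the snippet below totals and ranks the gray areas. It works on a plain Hash literal copied from the sample output above rather than on a live call to the gem; the variable names and the symbol keys are assumptions made for illustration.

```ruby
# Hash literal copied from the sample output above. Symbol keys are an
# assumption of this sketch, not a statement about the gem's return type.
analysis = {
  ellipsis: 1, hyperlink: 2, contraction: 4, hyphenated_word: 2,
  date: 2, number: 1, numbered_list: 3, xhtml: 1,
  forward_slash: 1, backslash: 1, dotted_line: 1, dashed_line: 1,
  underscore: 1, stray_punctuation: 5
}

# Total number of gray-area occurrences found in the text.
total = analysis.values.reduce(0, :+)

# Gray areas ordered by how often they occur, most frequent first.
ranked = analysis.sort_by { |_area, count| -count }

puts "#{total} gray-area occurrences"
ranked.first(3).each { |area, count| puts "#{area}: #{count}" }
```

The most frequent entries are the ones most likely to explain a discrepancy between two tools, so they are a reasonable place to start when choosing options.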
text = "This string has a date: Monday, November 3rd, 2011. I was thinking... it also shouldn't have too many contractions, maybe 2. <html> Some HTML and a hyphenated-word</html>. Don't count punctuation ? ? ? Please visit the ____________ ------------ ........ go-to site: https://www.example-site.com today. Let's add a list \n\n1. item a \n\n2. item b \n\n3. item c. Now let's add he/she/it or a c:\\Users\\john. 2/15/2012 is the date! { HYPERLINK 'http://www.hello.com' }"
WordCountAnalyzer::Counter.new.count(text)
# => 64
# Overrides all settings to match the way Pages handles word count.
# N.B. The developers of Pages may change the algorithm at any time, so this should only be used as an approximation.
WordCountAnalyzer::Counter.new.pages_count(text)
# => 76 (or 79 if the list items are not formatted as a list)
# Overrides all settings to match the way Microsoft Word and wc (Unix) handle word count.
# N.B. The developers of these tools may change the algorithm at any time, so this should only be used as an approximation.
WordCountAnalyzer::Counter.new.mword_count(text)
# => 71
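
For comparison, whitespace-delimited counting (the rule `wc -w` applies) can be sketched in plain Ruby. The helper name `wc_style_count` is made up for this illustration and is not part of the gem; the sample strings are taken from the comparison tables further down.

```ruby
# Whitespace-delimited counting: every run of non-whitespace is one word.
# `wc_style_count` is a hypothetical helper, not part of the gem's API.
def wc_style_count(text)
  text.split.length
end

puts wc_style_count('Monday, April 4th, 2011')  # 4
puts wc_style_count('. . .')                    # 3
puts wc_style_count('devil-may-care')           # 1
```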
# Highly configurable (see all options below)
WordCountAnalyzer::Counter.new(
  ellipsis: 'no_special_treatment',
  hyperlink: 'no_special_treatment',
  contraction: 'count_as_multiple',
  hyphenated_word: 'count_as_multiple',
  date: 'count_as_one',
  number: 'ignore',
  numbered_list: 'ignore',
  xhtml: 'keep',
  forward_slash: 'count_as_multiple',
  backslash: 'count_as_multiple',
  dotted_line: 'count',
  dashed_line: 'count',
  underscore: 'count',
  stray_punctuation: 'count'
).count(text)
# => 77
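
The idea of overriding a few settings while the rest keep their defaults can be sketched as follows. This is a minimal illustration in plain Ruby, not the gem's internals: `DEFAULTS`, `ACCEPTED` and `resolve_options` are hypothetical names, and only three options are included, with defaults and accepted values copied from the options list in this README.

```ruby
# Defaults and accepted values for three options, copied from this README.
# DEFAULTS, ACCEPTED and resolve_options are hypothetical names for this sketch.
DEFAULTS = {
  ellipsis: 'ignore',
  hyperlink: 'count_as_one',
  xhtml: 'remove'
}.freeze

ACCEPTED = {
  ellipsis: %w[ignore no_special_treatment],
  hyperlink: %w[count_as_one no_special_treatment split_at_period],
  xhtml: %w[remove keep]
}.freeze

# Merge user overrides onto the defaults, rejecting unknown keys or values.
def resolve_options(overrides = {})
  overrides.each do |key, value|
    raise ArgumentError, "unknown option: #{key}" unless ACCEPTED.key?(key)
    unless ACCEPTED[key].include?(value)
      raise ArgumentError, "#{key} must be one of: #{ACCEPTED[key].join(', ')}"
    end
  end
  DEFAULTS.merge(overrides)
end

p resolve_options(xhtml: 'keep')
```

Validating up front means a misspelled value fails loudly instead of silently falling back to a default and producing a count that looks plausible but is wrong.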
options
Option | Default | Accepted values |
---|---|---|
ellipsis | 'ignore' | 'ignore', 'no_special_treatment' |
hyperlink | 'count_as_one' | 'count_as_one', 'no_special_treatment', 'split_at_period' |
contraction | 'count_as_one' | 'count_as_one', 'count_as_multiple' |
hyphenated_word | 'count_as_one' | 'count_as_one', 'count_as_multiple' |
date | 'no_special_treatment' | 'no_special_treatment', 'count_as_one' |
number | 'count' | 'count', 'ignore' |
numbered_list | 'count' | 'count', 'ignore' |
xhtml | 'remove' | 'remove', 'keep' |
forward_slash | 'count_as_multiple_except_dates' | 'count_as_multiple_except_dates', 'count_as_multiple', 'count_as_one' |
backslash | 'count_as_one' | 'count_as_one', 'count_as_multiple' |
dotted_line | 'ignore' | 'ignore', 'count' |
dashed_line | 'ignore' | 'ignore', 'count' |
underscore | 'ignore' | 'ignore', 'count' |
stray_punctuation | 'ignore' | 'ignore', 'count' |

contraction, 'count_as_multiple': don't => do not (2 words); o'clock => of the clock (3 words)
hyphenated_word, 'count_as_multiple': devil-may-care (3 words)
number, 'ignore': … dates and numbered_lists) and does not count them towards the word count.
Checks for any occurrences of ellipses in your text. Writers format ellipses in different ways, and although style guides exist, their rules are rarely followed.
...
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 0 |
wc (Unix) | 1 |
....
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 0 |
wc (Unix) | 1 |
. . .
Tool | Word Count |
---|---|
Microsoft Word | 3 |
Pages | 0 |
wc (Unix) | 3 |
. . . .
Tool | Word Count |
---|---|
Microsoft Word | 4 |
Pages | 0 |
wc (Unix) | 4 |
…
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 0 |
wc (Unix) | 1 |
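
The five formats above can be recognized with a single pattern. The following is a minimal sketch in plain Ruby, not the gem's own detection logic: `ELLIPSIS` and `count_ignoring_ellipses` are hypothetical names, and the rule that five or more consecutive dots are not an ellipsis is an assumption of this sketch.

```ruby
# Three or four dots, optionally separated by single spaces, or the
# one-character ellipsis. The lookarounds keep longer dot runs (dotted
# lines) from matching. Illustrative only; not the gem's pattern.
ELLIPSIS = /(?<!\.)\.(?:\s?\.){2,3}(?!\s?\.)|…/

# Count whitespace-delimited words after blanking out every ellipsis.
def count_ignoring_ellipses(text)
  text.gsub(ELLIPSIS, ' ').split.length
end

puts count_ignoring_ellipses('Wait . . . what')  # 2
puts count_ignoring_ellipses('. . . .')          # 0
```

Blanking the ellipsis before splitting is what brings the spaced forms down from three or four words to zero, matching the Pages column in the tables above.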
http://www.example.com
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 4 |
wc (Unix) | 1 |
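
The gap between the Pages figure and the other two is consistent with splitting the URL at punctuation rather than only at whitespace. That reading is an inference from the numbers in the table, not documented behavior of any tool. A short sketch of the two tokenizations:

```ruby
url = 'http://www.example.com'

# Whitespace-delimited: the URL is a single token.
whitespace_tokens = url.split

# Splitting on every run of non-alphanumeric characters instead.
alnum_tokens = url.scan(/[[:alnum:]]+/)

puts whitespace_tokens.length  # 1
puts alnum_tokens.inspect      # ["http", "www", "example", "com"]
```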
Most tools count contractions as one word. Some might argue a contraction is technically more than one word.
can't
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 1 |
wc (Unix) | 1 |
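
Counting a contraction as multiple words requires knowing what it expands to. The sketch below uses a tiny lookup table built from the two expansions mentioned in the options list (don't, o'clock); `EXPANSIONS` and `count_expanding_contractions` are hypothetical names, and a usable table would need far more entries.

```ruby
# A tiny, illustrative expansion table; not the gem's data.
EXPANSIONS = {
  "don't"   => 'do not',
  "o'clock" => 'of the clock'
}.freeze

# Count words, expanding known contractions before counting them.
def count_expanding_contractions(text)
  text.split.map { |token|
    # Strip trailing punctuation so "o'clock." still hits the table.
    key = token.downcase.sub(/[.,!?;:]+\z/, '')
    EXPANSIONS.fetch(key, token).split.length
  }.reduce(0, :+)
end

puts count_expanding_contractions("Don't be late, five o'clock sharp")  # 9
```

The same sentence is six words under whitespace-delimited counting, so two contractions alone account for a difference of three.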
devil-may-care
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 3 |
wc (Unix) | 1 |
Most word processing tools do not recognize dates, but CAT (computer-assisted translation) tools tend to treat a date as one word or placeable. This gem checks for many date formats, including those that use day or month abbreviations. A few examples are listed below (not an exhaustive list).
Monday, April 4th, 2011
Tool | Word Count |
---|---|
Microsoft Word | 4 |
Pages | 4 |
wc (Unix) | 4 |
04/04/2011
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 3 |
wc (Unix) | 1 |
04.04.2011
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 1 |
wc (Unix) | 1 |
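
Treating a date as one word amounts to finding it and collapsing it to a single token before counting. The sketch below covers only the three formats shown above and makes no claim to match the gem's much broader detection; `LONG_DATE`, `NUMERIC_DATE` and `count_with_dates_as_one` are hypothetical names.

```ruby
MONTHS = %w[January February March April May June July August
            September October November December].join('|')
DAYS = %w[Monday Tuesday Wednesday Thursday Friday Saturday Sunday].join('|')

# "Monday, April 4th, 2011" (the weekday is optional).
LONG_DATE = /(?:(?:#{DAYS}),\s+)?(?:#{MONTHS})\s+\d{1,2}(?:st|nd|rd|th)?,\s+\d{4}/
# "04/04/2011" or "04.04.2011".
NUMERIC_DATE = /\b\d{1,2}[\/.]\d{1,2}[\/.]\d{2,4}\b/
DATE = Regexp.union(LONG_DATE, NUMERIC_DATE)

# Replace each date with one placeholder token, then count by whitespace.
def count_with_dates_as_one(text)
  text.gsub(DATE, 'DATE').split.length
end

puts count_with_dates_as_one('Monday, April 4th, 2011')       # 1
puts count_with_dates_as_one('Due 04/04/2011 or 04.04.2011')  # 4
```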
200
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 1 |
wc (Unix) | 1 |
$200
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 1 |
wc (Unix) | 1 |
50%
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 1 |
wc (Unix) | 1 |
1. List item a
2. List item b
3. List item c
Tool | Word Count |
---|---|
Microsoft Word | 12 |
Pages | 9 |
wc (Unix) | 12 |
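
The difference between 12 and 9 in the table above is exactly the three list markers. A short sketch of both counts in plain Ruby (the variable names are illustrative):

```ruby
list = "1. List item a\n2. List item b\n3. List item c"

# Counting the markers: every whitespace-delimited token is a word.
with_markers = list.split.length

# Ignoring the markers: drop a leading "N." from each line before counting.
without_markers = list.gsub(/^\s*\d+\.\s+/, '').split.length

puts with_markers     # 12
puts without_markers  # 9
```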
<span class="large-text">Hello world</span> <new-tag>Hello</new-tag>
Tool | Word Count |
---|---|
Microsoft Word | 4 |
Pages | 12 |
wc (Unix) | 4 |
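
Whether tags are stripped before counting changes the result for the same markup. The sketch below contrasts the two approaches in plain Ruby; the tag pattern is a simplification that assumes no `>` appears inside attribute values.

```ruby
html = '<span class="large-text">Hello world</span> <new-tag>Hello</new-tag>'

# Tags kept: they are tokenized together with the text around them.
kept = html.split.length

# Tags removed: replace each tag with a space, then count what is left.
removed = html.gsub(/<[^>]+>/, ' ').split.length

puts kept     # 4
puts removed  # 3
```

Replacing each tag with a space rather than an empty string keeps words on either side of a tag from being fused into one.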
she/he/it
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 3 |
wc (Unix) | 1 |
c:\Users\johndoe
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 3 |
wc (Unix) | 1 |
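
Splitting on slashes while leaving numeric dates intact can be sketched as below. `SLASH_DATE` and `slash_token_count` are hypothetical names, the date pattern is deliberately narrow, and the sketch treats forward slashes and backslashes the same way even though this gem configures them separately.

```ruby
# Numeric dates such as 04/04/2011 or 2/15/2012 (illustrative pattern only).
SLASH_DATE = /\A\d{1,2}\/\d{1,2}\/\d{2,4}\z/

# Count one whitespace-delimited token, splitting on slashes unless it is a date.
def slash_token_count(token)
  return 1 if token =~ SLASH_DATE
  token.split(/[\/\\]/).reject(&:empty?).length
end

puts slash_token_count('she/he/it')         # 3
puts slash_token_count('c:\Users\johndoe')  # 3
puts slash_token_count('04/04/2011')        # 1
```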
.........
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 0 |
wc (Unix) | 1 |
………………………
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 0 |
wc (Unix) | 1 |
-----------
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 0 |
wc (Unix) | 1 |
____________
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 0 |
wc (Unix) | 1 |
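
Dotted lines, dashed lines and underscore runs can be dropped by rejecting tokens made up entirely of the filler character. This is a sketch under stated assumptions: `LINE_FILLER` and `count_ignoring_line_fillers` are hypothetical names, and the minimum lengths (five dots, two of anything else) are choices made here to keep ordinary ellipses out, not values taken from the gem.

```ruby
# A token consisting only of repeated dots, ellipsis characters,
# hyphens or underscores. Thresholds are assumptions of this sketch.
LINE_FILLER = /\A(?:\.{5,}|…{2,}|-{2,}|_{2,})\z/

# Count whitespace-delimited words, skipping line-filler tokens.
def count_ignoring_line_fillers(text)
  text.split.reject { |token| token =~ LINE_FILLER }.length
end

puts count_ignoring_line_fillers('Name: ____________ Date: ------------ .........')  # 2
```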
:
Tool | Word Count |
---|---|
Microsoft Word | 1 |
Pages | 0 |
wc (Unix) | 1 |
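
A stray punctuation mark is a token that contains a single punctuation character and nothing else. A minimal sketch of ignoring such tokens, using part of the sample string from the usage section (`count_ignoring_stray_punctuation` is a hypothetical helper, and the single-character rule is an assumption of this sketch):

```ruby
# Count whitespace-delimited words, skipping tokens that are
# one lone punctuation character.
def count_ignoring_stray_punctuation(text)
  text.split.reject { |token| token =~ /\A[[:punct:]]\z/ }.length
end

sample = "Don't count stray punctuation ? ? ? { HYPERLINK 'http://www.hello.com' }"

puts sample.split.length                       # 11
puts count_ignoring_stray_punctuation(sample)  # 6
```

The five rejected tokens here are the three question marks and the two braces.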
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
The MIT License (MIT)
Copyright (c) 2015 Kevin S. Dias
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.