@ov/grapheme-breaker
An implementation of the Unicode grapheme cluster breaking algorithm (UAX #29)
A JavaScript implementation of the Unicode grapheme cluster breaking algorithm (UAX #29)
It is important to recognize that what the user thinks of as a “character”—a basic unit of a writing system for a language—may not be just a single Unicode code point. Instead, that basic unit may be made up of multiple Unicode code points. To avoid ambiguity with the computer use of the term character, this is called a user-perceived character. For example, “G” + acute-accent is a user-perceived character: users think of it as a single character, yet is actually represented by two Unicode code points. These user-perceived characters are approximated by what is called a grapheme cluster, which can be determined programmatically.
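As a small illustration of the gap described above, the sketch below uses only built-in JavaScript string operations (not this library) to show that neither the UTF-16 length nor the code point count matches the number of user-perceived characters. The variable names are illustrative only.

```javascript
// "G" followed by U+0301 COMBINING ACUTE ACCENT: one user-perceived character.
const gAcute = 'G\u0301';
// String.prototype.length counts UTF-16 code units.
const gAcuteUnits = gAcute.length;
// Spreading a string iterates by Unicode code point.
const gAcuteCodePoints = [...gAcute].length;

// A flag emoji is two regional indicator code points (U+1F1FA, U+1F1F8),
// and each of them is stored as a surrogate pair.
const flag = '\u{1F1FA}\u{1F1F8}';
const flagUnits = flag.length;
const flagCodePoints = [...flag].length;

console.log({ gAcuteUnits, gAcuteCodePoints, flagUnits, flagCodePoints });
// => { gAcuteUnits: 2, gAcuteCodePoints: 2, flagUnits: 4, flagCodePoints: 2 }
```

In both cases a reader sees one character, which is the unit a grapheme cluster breaker is meant to recover.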
You can install via npm
npm install grapheme-breaker
var GraphemeBreaker = require('grapheme-breaker');
// break a string into an array of grapheme clusters
GraphemeBreaker.break('Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞') // => ['Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍', 'A̴̵̜̰͔ͫ͗͢', 'L̠ͨͧͩ͘', 'G̴̻͈͍͔̹̑͗̎̅͛́', 'Ǫ̵̹̻̝̳͂̌̌͘', '!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞']
// or just count the number of grapheme clusters in a string
GraphemeBreaker.countBreaks('Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞') // => 6
// use nextBreak and previousBreak to get break points starting
// from anywhere in the string
GraphemeBreaker.nextBreak('😜🇺🇸👍', 3) // => 6
GraphemeBreaker.previousBreak('😜🇺🇸👍', 3) // => 2
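The results above are consistent with the indexes being UTF-16 code unit offsets, which is an assumption inferred from the example rather than something the README states. The sketch below does not call the library; the hypothetical helper `codePointOffsets` just lists where each code point of the sample string starts.

```javascript
// Return the UTF-16 code unit offset at which each code point starts.
// Hypothetical helper for illustration; it is not part of grapheme-breaker.
function codePointOffsets(str) {
  const offsets = [];
  let offset = 0;
  for (const ch of str) { // for...of iterates by code point
    offsets.push(offset);
    offset += ch.length;  // 1 for BMP characters, 2 for surrogate pairs
  }
  return offsets;
}

const sample = '😜🇺🇸👍';
const sampleOffsets = codePointOffsets(sample);
console.log(sampleOffsets); // => [0, 2, 4, 6]
console.log(sample.length); // => 8
```

Under that assumption, the flag occupies offsets 2 through 5, so index 3 falls inside it: the next break is at 6 and the previous break is at 2, matching the output shown above.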
You shouldn't need to know any of this to use the library, but it may be of interest if you want to contribute or fix bugs.
The src/classes.coffee file is automatically generated from GraphemeBreakProperty.txt in the Unicode database by src/generate_data.coffee. It should be rare that you need to run this, but you may if, for instance, you want to change the Unicode version.
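For orientation, data lines in GraphemeBreakProperty.txt take the form of a code point or code point range, a semicolon, the property value, and a trailing `#` comment. The sketch below parses one such line; it is not the project's generator script, and the function name `parsePropertyLine` and the returned object shape are assumptions made for illustration.

```javascript
// Parse one data line such as "0600..0605    ; Prepend # Cf ..." into
// { start, end, type }. Returns null for comment-only and blank lines.
// Hypothetical helper; src/generate_data.coffee may work differently.
function parsePropertyLine(line) {
  const data = line.split('#')[0].trim(); // drop the trailing comment
  if (data === '') return null;
  const [range, type] = data.split(';').map((s) => s.trim());
  // A single code point has no "..", so the range end defaults to its start.
  const [first, last = first] = range.split('..');
  return { start: parseInt(first, 16), end: parseInt(last, 16), type };
}

console.log(parsePropertyLine('0600..0605    ; Prepend # Cf'));
// => { start: 1536, end: 1541, type: 'Prepend' }
console.log(parsePropertyLine('000D          ; CR # Cc'));
// => { start: 13, end: 13, type: 'CR' }
```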
You can run the tests using npm test. They are written using mocha, and generated from GraphemeBreakTest.txt and emoji-test.txt from the Unicode database, which are included in the repository for performance reasons while running them.
MIT