unicategories

Unicode category database

  • 0.1.2
  • PyPI
Unicode category database, generated and cached on setup.

This module exposes a categories dictionary that maps each Unicode category name to a RangeGroup instance holding all the character ranges of that category detected on your system.

Example

from unicategories import categories

upperchars = categories['Lu'].characters()  # iterator
print('Unicode uppercase characters are "%s"' % ''.join(upperchars))
# Unicode uppercase characters are "ABCDEF..."

RangeGroup

Immutable iterable (based on tuple, with some useful methods) of (start, end) tuples. Like Python's range, each tuple is half-open: start is included and end is excluded.

This representation was chosen for memory efficiency: storing every character individually would take far more memory than storing the boundaries of each contiguous run.
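To make the half-open range representation concrete, here is a minimal sketch that derives such ranges from the standard library's unicodedata module. The helper name build_ranges and its limit parameter are hypothetical and used only for illustration; this is not necessarily how unicategories generates its database.

```python
import sys
import unicodedata


def build_ranges(category, limit=sys.maxunicode + 1):
    """Collect half-open (start, end) code point ranges for `category`.

    Hypothetical helper for illustration; not part of unicategories.
    """
    ranges = []
    start = None
    for code in range(limit):
        if unicodedata.category(chr(code)) == category:
            if start is None:
                start = code  # a new contiguous run begins here
        elif start is not None:
            ranges.append((start, code))  # end is exclusive
            start = None
    if start is not None:
        ranges.append((start, limit))  # close a run that reaches the limit
    return tuple(ranges)


# Restricting the scan to ASCII keeps the example fast and easy to check.
ascii_upper = build_ranges('Lu', limit=128)
print(ascii_upper)  # ((65, 91),)
```

The 26 ASCII uppercase letters collapse into a single (65, 91) tuple, which is the kind of saving the range representation is after.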

The RangeGroup class provides the following methods:

range_group.characters()

type: () -> typing.Iterator[str]

Get an iterator over all characters in this range group.

:returns: iterator of characters (str of size 1)

range_group.codes()

type: () -> typing.Iterator[int]

Get iterator for all unicode code points contained in this range group.

:returns: iterator of character code points (int)
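As a rough illustration of what characters() and codes() yield, the sketch below expands a tuple of half-open ranges lazily. The functions iter_codes and iter_characters are hypothetical stand-ins written with the standard library only; the actual RangeGroup methods may be implemented differently.

```python
from itertools import chain


def iter_codes(ranges):
    """Yield every code point in a sequence of half-open (start, end) ranges."""
    return chain.from_iterable(range(start, end) for start, end in ranges)


def iter_characters(ranges):
    """Yield every character (str of length 1) covered by the same ranges."""
    return map(chr, iter_codes(ranges))


sample = ((48, 58), (65, 71))  # ASCII digits 0-9 and letters A-F
print(list(iter_codes(sample))[:3])      # [48, 49, 50]
print(''.join(iter_characters(sample)))  # 0123456789ABCDEF
```

Both helpers return iterators, so nothing is materialized until the caller consumes them.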

range_group.has(character)

type: (typing.Union[str, int]) -> bool

Check whether a character (or character code point) is part of this range group.

:param character: character or unicode code point to look for
:returns: True if character is contained by any range, False otherwise
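The membership test described above can be sketched as follows, assuming the same half-open (start, end) tuples. The free function has below is a hypothetical stand-in for the method and only mirrors its documented behaviour of accepting either a one-character string or an integer code point.

```python
def has(ranges, character):
    """Return True if `character` (str of length 1, or int) is in `ranges`.

    Hypothetical stand-in for RangeGroup.has, for illustration only.
    """
    code = ord(character) if isinstance(character, str) else character
    # start is inclusive, end is exclusive
    return any(start <= code < end for start, end in ranges)


ascii_upper = ((65, 91),)
print(has(ascii_upper, 'A'))  # True
print(has(ascii_upper, 90))   # True  (code point of 'Z')
print(has(ascii_upper, 91))   # False (end is excluded)
```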

Unicode categories

Taken from Wikipedia.

| Value | Category Major, minor | Basic type | Character assigned | Fixed | Remarks |
|-------|-----------------------|------------|--------------------|-------|---------|
| Lu | Letter, uppercase | Graphic | Character | | |
| Ll | Letter, lowercase | Graphic | Character | | |
| Lt | Letter, titlecase | Graphic | Character | | Ligatures containing uppercase followed by lowercase letters (e.g., Dž, Lj, Nj, and Dz) |
| Lm | Letter, modifier | Graphic | Character | | |
| Lo | Letter, other | Graphic | Character | | |
| Mn | Mark, nonspacing | Graphic | Character | | |
| Mc | Mark, spacing combining | Graphic | Character | | |
| Me | Mark, enclosing | Graphic | Character | | |
| Nd | Number, decimal digit | Graphic | Character | | All these, and only these, have Numeric Type = De |
| Nl | Number, letter | Graphic | Character | | Numerals composed of letters or letterlike symbols (e.g., Roman numerals) |
| No | Number, other | Graphic | Character | | E.g., vulgar fractions, superscript and subscript digits |
| Pc | Punctuation, connector | Graphic | Character | | Includes "_" underscore |
| Pd | Punctuation, dash | Graphic | Character | | Includes several hyphen characters |
| Ps | Punctuation, open | Graphic | Character | | Opening bracket characters |
| Pe | Punctuation, close | Graphic | Character | | Closing bracket characters |
| Pi | Punctuation, initial quote | Graphic | Character | | Opening quotation mark. Does not include the ASCII "neutral" quotation mark. May behave like Ps or Pe depending on usage |
| Pf | Punctuation, final quote | Graphic | Character | | Closing quotation mark. May behave like Ps or Pe depending on usage |
| Po | Punctuation, other | Graphic | Character | | |
| Sm | Symbol, math | Graphic | Character | | |
| Sc | Symbol, currency | Graphic | Character | | |
| Sk | Symbol, modifier | Graphic | Character | | |
| So | Symbol, other | Graphic | Character | | |
| Zs | Separator, space | Graphic | Character | | Includes the space, but not TAB, CR, or LF, which are Cc |
| Zl | Separator, line | Format | Character | | Only U+2028 LINE SEPARATOR (LSEP) |
| Zp | Separator, paragraph | Format | Character | | Only U+2029 PARAGRAPH SEPARATOR (PSEP) |
| Cc | Other, control | Control | Character | Fixed 65 | No name, <control> |
| Cf | Other, format | Format | Character | | Includes the soft hyphen, control characters to support bi-directional text, and language tag characters |
| Cs | Other, surrogate | Surrogate | Not (but abstract) | Fixed 2,048 | No name, <surrogate> |
| Co | Other, private use | Private-use | Not (but abstract) | Fixed 137,468 total: 6,400 in BMP, 131,068 in Planes 15–16 | No name, <private-use> |
| Cn | Other, not assigned | Noncharacter | Not | Fixed 66 | No name, <noncharacter> |
| Cn | Other, not assigned | Reserved | Not | Not fixed | No name, <reserved> |

In addition to these, unicategories provides the general (major) categories L, M, N, P, S, Z and C, each covering all the two-letter categories that start with that letter.
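Conceptually, a general category is the union of the two-letter categories sharing its first letter. The sketch below shows one way such a union could be computed over half-open ranges; merge_ranges is a hypothetical helper, and the package may build its general categories in a different way.

```python
def merge_ranges(*groups):
    """Merge several range groups into one sorted, non-overlapping tuple.

    Hypothetical helper for illustration; not part of unicategories.
    """
    merged = []
    for start, end in sorted(r for group in groups for r in group):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return tuple(merged)


upper = ((65, 91),)   # ASCII part of Lu
lower = ((97, 123),)  # ASCII part of Ll
print(merge_ranges(upper, lower))            # ((65, 91), (97, 123))
print(merge_ranges(((0, 5),), ((5, 10),)))   # ((0, 10),)
```

Because the end of each range is exclusive, two ranges that merely touch, such as (0, 5) and (5, 10), can be joined without a gap or an overlap.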
