kre

Subcharacter-level regular expression functionality for Korean

  • 0.9.9
  • PyPI

Subcharacter-level regular expressions with Korean text.

kre is a wrapper for the re module from the Python Standard Library. It lets users apply the full functionality of re at the subcharacter level to Korean text.

Installation

kre releases are available on PyPI.

pip install kre

Documentation

Most functionality is documented in the re documentation.

Documentation on the unique features of kre is available in the wiki, where you will also find discussion of inherent differences between re (character-level regular expressions) and kre (subcharacter-level regular expressions) and how kre addresses them. It is strongly recommended that users familiarize themselves with these differences.

Example Features

In the simple case of search functions, matches are mapped back to their positions in the original string.

> re.search(r"ㅡ", "한글") # no match
> kre.search(r"ㅡ", "한글")
<kre.KRE_Match object; span=(1, 2), match='글'>
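
To illustrate the idea of mapping subcharacter matches back to the original string, here is a minimal sketch that uses only the standard library. It is not kre's implementation: the helper names `linearize` and `sketch_search` are hypothetical, compound finals such as ㄳ are kept as single letters, and empty matches are simply ignored.

```python
import re

# Standard Unicode Hangul tables (compatibility jamo, in Unicode order).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def linearize(text):
    """Return (jamo_string, index_map); index_map[i] is the source index of letter i."""
    letters, index_map = [], []
    for pos, char in enumerate(text):
        offset = ord(char) - 0xAC00
        if 0 <= offset < 19 * 21 * 28:  # precomposed Hangul syllable
            initial, rest = divmod(offset, 21 * 28)
            medial, final = divmod(rest, 28)
            parts = INITIALS[initial] + MEDIALS[medial] + FINALS[final]
        else:
            parts = char  # anything else passes through unchanged
        letters.append(parts)
        index_map.extend([pos] * len(parts))
    return "".join(letters), index_map


def sketch_search(pattern, text):
    """Hypothetical helper: search the linearized text, report a span in `text`."""
    linear, index_map = linearize(text)
    match = re.search(pattern, linear)
    if match is None or match.start() == match.end():
        return None  # no match, or an empty match (ignored in this sketch)
    start = index_map[match.start()]
    end = index_map[match.end() - 1] + 1
    return (start, end), text[start:end]


print(re.search("ㅡ", "한글"))      # None: re only sees whole syllables
print(linearize("한글")[0])         # ㅎㅏㄴㄱㅡㄹ
print(sketch_search("ㅡ", "한글"))  # ((1, 2), '글')
```

The index map is what lets a match found in the letter sequence be reported as a span over whole syllables.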

In the case of subcharacter-level substitutions, kre can recombine any newly created sequences into standard Korean characters, provided the input used standard (syllable) characters.

> kre.sub(r"ㅏ", r"ㅗ", "핳ㅏ하ㅎㅏ하핳")
'홓ㅗ호ㅎㅗ호홓'
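
The recombination step can be pictured with a small standard-library sketch. It is an illustration under simplifying assumptions, not how kre works internally: the helpers `split_syllable`, `join_syllable`, `recompose`, and `sketch_sub` are hypothetical, substitution is applied one character at a time (so patterns cannot span syllables), and compound finals stay single letters.

```python
import re

# Standard Unicode Hangul tables (compatibility jamo, in Unicode order).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def split_syllable(char):
    """Return the letters of a precomposed syllable, or None for other characters."""
    offset = ord(char) - 0xAC00
    if not 0 <= offset < 19 * 21 * 28:
        return None
    initial, rest = divmod(offset, 21 * 28)
    medial, final = divmod(rest, 28)
    return INITIALS[initial] + MEDIALS[medial] + FINALS[final]


def join_syllable(initial, medial, final=""):
    """Compose one syllable from its letters using the Unicode arithmetic."""
    index = (INITIALS.index(initial) * 21 + MEDIALS.index(medial)) * 28
    return chr(0xAC00 + index + FINALS.index(final))


def recompose(letters):
    """Rebuild a syllable if the letters form initial + medial (+ final)."""
    if (len(letters) in (2, 3) and letters[0] in INITIALS
            and letters[1] in MEDIALS and letters[2:] in FINALS):
        return join_syllable(letters[0], letters[1], letters[2:])
    return letters  # leave anything else as loose letters


def sketch_sub(pattern, repl, text):
    """Hypothetical helper: substitute inside each character, then recompose."""
    out = []
    for char in text:
        letters = split_syllable(char)
        if letters is None:
            out.append(re.sub(pattern, repl, char))  # loose jamo, punctuation, ...
        else:
            out.append(recompose(re.sub(pattern, repl, letters)))
    return "".join(out)


print(sketch_sub("ㅏ", "ㅗ", "핳ㅏ하ㅎㅏ하핳"))  # 홓ㅗ호ㅎㅗ호홓
```

Letters that were already loose in the input stay loose, which mirrors the behaviour shown in the example above.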

If you prefer, kre can also attempt to merge non-standard input with substitutions.

> kre.sub(r"ㅏ", r"ㅗ", "핳ㅏ하ㅎㅏ하핳", syllabify="extended")
'호호호호호홓'
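
One plausible way to merge a run of loose letters into syllables is a greedy pass that keeps a consonant as a final only when it is not needed to start the next syllable. The sketch below is an assumption about the general technique, not a description of what syllabify="extended" does internally; `merge_jamo` and `join_syllable` are hypothetical names, and compound finals (for example ㄱ + ㅅ into ㄳ) are not handled.

```python
# Standard Unicode Hangul tables (compatibility jamo, in Unicode order).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def join_syllable(initial, medial, final=""):
    """Compose one syllable from its letters using the Unicode arithmetic."""
    index = (INITIALS.index(initial) * 21 + MEDIALS.index(medial)) * 28
    return chr(0xAC00 + index + FINALS.index(final))


def starts_syllable(letters, i):
    """True if letters[i] is an initial consonant directly followed by a vowel."""
    return (i + 1 < len(letters) and letters[i] in INITIALS
            and letters[i + 1] in MEDIALS)


def merge_jamo(letters):
    """Greedily merge a run of compatibility jamo into syllables (simplified)."""
    out, i = [], 0
    while i < len(letters):
        if starts_syllable(letters, i):
            initial, medial, final = letters[i], letters[i + 1], ""
            i += 2
            # Take a final only if it is not needed as the next initial.
            if (i < len(letters) and letters[i] in FINALS
                    and not starts_syllable(letters, i)):
                final = letters[i]
                i += 1
            out.append(join_syllable(initial, medial, final))
        else:
            out.append(letters[i])  # cannot start a syllable here; keep as is
            i += 1
    return "".join(out)


print(merge_jamo("ㅎㅗㅎㅗㅎㅗㅎㅗㅎㅗㅎㅗㅎ"))  # 호호호호호홓
```

The input here is the fully linearized form of the substituted string from the example above, so the merged result lines up with the output shown there.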

Although linearizing a Korean string normally loses information about syllable boundaries, kre lets regular expression patterns refer to those boundaries through (customizable) syllable delimiters (';' by default).

> kre.search(r"ㅇ", "생일 축하해~")
<kre.KRE_Match object; span=(0, 1), match='생'>
> kre.search(r";ㅇ", "생일 축하해~", boundaries=True)
<kre.KRE_Match object; span=(1, 2), match='일'>
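
The effect of delimiters can be sketched with the standard library alone. Two assumptions here are inferred from the examples rather than taken from kre's documentation: a delimiter is emitted after every syllable, and delimiters are skipped when a match is mapped back to the original string. The helper names are hypothetical.

```python
import re

# Standard Unicode Hangul tables (compatibility jamo, in Unicode order).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def linearize(text, delimiter=";"):
    """Linearize text, appending `delimiter` after every syllable (assumed placement)."""
    letters, index_map = [], []
    for pos, char in enumerate(text):
        offset = ord(char) - 0xAC00
        if 0 <= offset < 19 * 21 * 28:  # precomposed Hangul syllable
            initial, rest = divmod(offset, 21 * 28)
            medial, final = divmod(rest, 28)
            parts = INITIALS[initial] + MEDIALS[medial] + FINALS[final]
            letters.append(parts + delimiter)
            # The delimiter has no source position of its own.
            index_map.extend([pos] * len(parts) + [None])
        else:
            letters.append(char)
            index_map.append(pos)
    return "".join(letters), index_map


def sketch_search(pattern, text):
    """Hypothetical helper: search with delimiters, report a span in `text`."""
    linear, index_map = linearize(text)
    match = re.search(pattern, linear)
    if match is None:
        return None
    hits = [p for p in index_map[match.start():match.end()] if p is not None]
    if not hits:
        return None  # the match covered delimiters only
    return (hits[0], hits[-1] + 1), text[hits[0]:hits[-1] + 1]


print(linearize("생일 축하해~")[0])          # ㅅㅐㅇ;ㅇㅣㄹ; ㅊㅜㄱ;ㅎㅏ;ㅎㅐ;~
print(sketch_search("ㅇ", "생일 축하해~"))   # ((0, 1), '생')
print(sketch_search(";ㅇ", "생일 축하해~"))  # ((1, 2), '일')
```

With the delimiter in the pattern, the first ㅇ (the final of 생) no longer qualifies, so the match moves to the syllable-initial ㅇ of 일.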

As a more interesting, complicated, and perhaps useless example of what kre can do, the following swaps every sequential pair of final consonant(s) (받침) in the input string.

> sun_and_moon = "옛날 옛적 깊은 산 속에 가난하지만 사이좋은 오누이와 그 홀어머니 가족이 살고 있었다."
> kre.sub(r"([ㅏ-ㅣ])([ㄱ-ㅎ]{1,2};)(.*?)([ㅏ-ㅣ])([ㄱ-ㅎ]{1,2};)", r"\1\5\3\4\2", sun_and_moon, boundaries=True)
'옐낫 옉젓 긴읖 삭 손에 가난하지만 사이존읗 오누이와 그 혹어머니 가졸이 샀고 일었다.'
