
Subcharacter-level regular expressions with Korean text.
kre is a wrapper for re from the Python Standard Library that lets users apply the full functionality of re at the subcharacter level to Korean text.
kre releases are available on PyPI.
pip install kre
Most functionality is documented in the re documentation.
Documentation on the unique features of kre is available in the wiki, which also discusses the inherent differences between re (character-level regular expressions) and kre (subcharacter-level regular expressions) and how kre addresses them. Users are strongly encouraged to familiarize themselves with these differences.
In the simple case of search functions, matches are mapped back to their original position.
> re.search(r"ㅡ", "한글") # no match
> kre.search(r"ㅡ", "한글")
<kre.KRE_Match object; span=(1, 2), match='글'>
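To see why the plain re call finds nothing while the kre call succeeds, the idea can be sketched by hand. The following is not kre's implementation; it is a minimal illustration, using only the standard library, of decomposing each syllable into compatibility jamo, searching the linearized string, and mapping the match back to character positions. The names linearize and search_span are hypothetical, and compound finals such as ㄳ are kept as single letters for brevity.

```python
import re

# Jamo tables in the order used by the Unicode Hangul syllable arithmetic.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def linearize(text):
    """Return (jamo_string, index_map); index_map[i] is the source index of jamo i."""
    pieces, index_map = [], []
    for pos, ch in enumerate(text):
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:  # precomposed Hangul syllable
            initial, rest = divmod(code, 588)
            vowel, final = divmod(rest, 28)
            jamo = INITIALS[initial] + VOWELS[vowel] + FINALS[final]
        else:  # anything else passes through unchanged
            jamo = ch
        pieces.append(jamo)
        index_map.extend([pos] * len(jamo))
    return "".join(pieces), index_map


def search_span(pattern, text):
    """Search at the jamo level and report the span in original characters."""
    linear, index_map = linearize(text)
    m = re.search(pattern, linear)
    if m is None or m.start() == m.end():
        return None
    start = index_map[m.start()]
    end = index_map[m.end() - 1] + 1
    return (start, end), text[start:end]


print(re.search("ㅡ", "한글"))    # None: the vowel never appears as its own character
print(search_span("ㅡ", "한글"))  # the span covers the whole syllable containing the vowel
```

Any match that touches a syllable is widened to cover the whole syllable, which is one simple way to report spans in terms of the original string.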
In the case of subcharacter-level substitutions, kre can recombine any newly created sequences into standard Korean characters, provided the input used standard (syllable) characters.
> kre.sub(r"ㅏ", r"ㅗ", "핳ㅏ하ㅎㅏ하핳")
'홓ㅗ호ㅎㅗ호홓'
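The recombination step can be illustrated with a small standard-library sketch. This is an assumption-laden simplification, not how kre works internally: it uses a literal str.replace instead of a regular expression, processes one character at a time (so a pattern cannot span syllables), and recomposes a syllable only when the result is still a valid initial + vowel (+ final) sequence. The helper names decompose, compose, and sub_per_syllable are hypothetical.

```python
# Jamo tables in the order used by the Unicode Hangul syllable arithmetic.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(ch):
    """Split a precomposed syllable into compatibility jamo; pass others through."""
    code = ord(ch) - 0xAC00
    if 0 <= code < 11172:
        initial, rest = divmod(code, 588)
        vowel, final = divmod(rest, 28)
        return INITIALS[initial] + VOWELS[vowel] + FINALS[final]
    return ch


def compose(jamo):
    """Rebuild a syllable from initial + vowel (+ final); otherwise return input."""
    if len(jamo) in (2, 3) and jamo[0] in INITIALS and jamo[1] in VOWELS:
        final = jamo[2:]
        if final in FINALS:
            index = (INITIALS.index(jamo[0]) * 21 + VOWELS.index(jamo[1])) * 28
            return chr(0xAC00 + index + FINALS.index(final))
    return jamo


def sub_per_syllable(old, new, text):
    """Replace one jamo with another inside each character, then recompose."""
    return "".join(compose(decompose(ch).replace(old, new)) for ch in text)


print(sub_per_syllable("ㅏ", "ㅗ", "하하"))
```

Because each input character is handled separately, stand-alone jamo in the input stay stand-alone in the output, which mirrors the default behaviour described above.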
If you prefer, kre can also attempt to merge non-standard input with substitutions.
> kre.sub(r"ㅏ", r"ㅗ", "핳ㅏ하ㅎㅏ하핳", syllabify="extended")
'호호호호호홓'
Although linearizing a Korean string normally discards information about syllable boundaries, kre lets regular expression patterns refer to those boundaries through customizable syllable delimiters (';' by default).
> kre.search(r"ㅇ", "생일 축하해~")
<kre.KRE_Match object; span=(0, 1), match='생'>
> kre.search(r";ㅇ", "생일 축하해~", boundaries=True)
<kre.KRE_Match object; span=(1, 2), match='일'>
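The effect of a delimiter can be sketched as follows. This is a hypothetical illustration rather than kre's code: it assumes the delimiter is appended after every syllable in the linearized string and that delimiter positions are ignored when the match is mapped back to the original text. The function names are made up for the example, and a literal ';' in the input would be ambiguous in this simplified version.

```python
import re

# Jamo tables in the order used by the Unicode Hangul syllable arithmetic.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def linearize_with_boundaries(text, delimiter=";"):
    """Linearize text, appending the delimiter after every Hangul syllable."""
    pieces, index_map = [], []
    for pos, ch in enumerate(text):
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:
            initial, rest = divmod(code, 588)
            vowel, final = divmod(rest, 28)
            jamo = INITIALS[initial] + VOWELS[vowel] + FINALS[final]
            pieces.append(jamo + delimiter)
            # The delimiter has no source position of its own.
            index_map.extend([pos] * len(jamo) + [None])
        else:
            pieces.append(ch)
            index_map.append(pos)
    return "".join(pieces), index_map


def search_span(pattern, text):
    """Return the (start, end) span of the first match in original characters."""
    linear, index_map = linearize_with_boundaries(text)
    m = re.search(pattern, linear)
    if m is None:
        return None
    hits = [i for i in index_map[m.start():m.end()] if i is not None]
    if not hits:
        return None
    return (hits[0], hits[-1] + 1)


text = "생일 축하해~"
print(search_span("ㅇ", text))   # first ㅇ anywhere, here the final of the first syllable
print(search_span(";ㅇ", text))  # ㅇ directly after a boundary, i.e. syllable-initial
```

Placing the delimiter after each syllable means a pattern such as ";ㅇ" can only match a syllable-initial consonant, while "ㅇ;" would match only a syllable-final one.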
As a more interesting, complicated, and perhaps useless example of what kre can do, the following swaps every sequential pair of final consonant(s) (받침) in the input string.
> sun_and_moon = "옛날 옛적 깊은 산 속에 가난하지만 사이좋은 오누이와 그 홀어머니 가족이 살고 있었다."
> kre.sub(r"([ㅏ-ㅣ])([ㄱ-ㅎ]{1,2};)(.*?)([ㅏ-ㅣ])([ㄱ-ㅎ]{1,2};)", r"\1\5\3\4\2", sun_and_moon, boundaries=True)
'옐낫 옉젓 긴읖 삭 손에 가난하지만 사이존읗 오누이와 그 혹어머니 가졸이 샀고 일었다.'
FAQs
Subcharacter-level regular expression functionality for Korean
We found that kre demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.