
The Japanese README is here. (日本語のREADMEはこちらです。)
https://github.com/sea-turt1e/kanjiconv/blob/main/README_ja.md
A converter from kanji to hiragana, katakana, and the Roman alphabet.
You can get the reading and pronunciation of Japanese sentences based on SudachiDict.
SudachiDict is a regularly updated dictionary, so it handles new proper nouns and other recent terms relatively well.
python>=3.11.7
pip install kanjiconv
If you want to use the UniDic dictionary with the use_unidic option, please download the unidic dictionary.
python -m unidic download
from kanjiconv import KanjiConv
# Basic usage
kanji_conv = KanjiConv(separator="/")
# Using UniDic for improved kanji reading accuracy
kanji_conv = KanjiConv(separator="/", use_unidic=True)
# Using custom dictionary for kanji readings not covered by SudachiDict or UniDic
kanji_conv = KanjiConv(separator="/", use_custom_readings=True)
# convert to hiragana
text = "幽☆遊☆白書は、最高の漫画デス。"
print(kanji_conv.to_hiragana(text))
ゆうゆうはくしょ/は/、/さいこう/の/まんが/です/。
# convert to katakana
text = "幽☆遊☆白書は、最高の漫画デス。"
print(kanji_conv.to_katakana(text))
ユウユウハクショ/ハ/、/サイコウ/ノ/マンガ/デス/。
# convert to Roman alphabet
text = "幽☆遊☆白書は、最高の漫画デス。"
print(kanji_conv.to_roman(text))
yuuyuuhakusho/ha/, /saikou/no/manga/desu/.
# You can change separator to another character or None
kanji_conv = KanjiConv(separator="_")
print(kanji_conv.to_hiragana(text))
ゆうゆうはくしょ_は_、_さいこう_の_まんが_です_。
kanji_conv = KanjiConv(separator="")
print(kanji_conv.to_hiragana(text))
ゆうゆうはくしょは、さいこうのまんがです。
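Because each token in the output is joined by the separator, the converted string can be split back into per-token readings with plain `str.split`. The following is a minimal sketch that uses the hiragana output shown above as a literal, so it runs without kanjiconv installed. The helper name `split_readings` is hypothetical and is not part of kanjiconv.

```python
def split_readings(converted: str, separator: str = "/") -> list:
    """Split a separator-joined reading string into a list of token readings.

    Hypothetical helper for illustration; not part of kanjiconv.
    """
    if not separator:
        # With an empty separator the token boundaries are not recoverable.
        return [converted]
    return converted.split(separator)


# Output of to_hiragana() from the example above, pasted as a literal.
hiragana = "ゆうゆうはくしょ/は/、/さいこう/の/まんが/です/。"
tokens = split_readings(hiragana)
print(tokens)
```

Note that splitting is only reliable when the separator does not also occur in the input text itself.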
KanjiConv supports a custom dictionary for handling special kanji readings that are not properly recognized by SudachiDict or UniDic.
The custom dictionary is automatically loaded from the package if available, but you can also define your own:
from kanjiconv import KanjiConv
# Create instance with custom readings enabled (enabled by default)
kanji_conv = KanjiConv(separator="/", use_custom_readings=True)
# You can also define your own custom readings
kanji_conv.custom_readings = {
"single": {
"激": ["げき"],
"飛": ["と", "ひ"]
},
"compound": {
"激を飛ばす": "げきをとばす",
"飛ばす": "とばす"
}
}
# Now the special expression will be properly converted
print(kanji_conv.to_hiragana("激を飛ばす"))
# Output: げき/を/とばす
The custom dictionary uses the following format:
single: A dictionary mapping individual kanji to their reading(s)
compound: A dictionary mapping multi-character expressions to their reading
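Since the custom dictionary is a plain nested dict, it can help to check its shape before assigning it to `custom_readings`. The sketch below assumes only the format described above (lists of strings under `single`, single strings under `compound`). The function `validate_custom_readings` is a hypothetical helper and is not provided by kanjiconv.

```python
def validate_custom_readings(readings: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right.

    Hypothetical helper for illustration; not part of kanjiconv.
    """
    problems = []

    single = readings.get("single", {})
    if not isinstance(single, dict):
        problems.append("'single' should be a dict")
    else:
        for kanji, values in single.items():
            # Each individual kanji maps to a list of one or more readings.
            if not (isinstance(values, list)
                    and all(isinstance(v, str) for v in values)):
                problems.append(f"single[{kanji!r}] should be a list of strings")

    compound = readings.get("compound", {})
    if not isinstance(compound, dict):
        problems.append("'compound' should be a dict")
    else:
        for expression, reading in compound.items():
            # Each multi-character expression maps to exactly one reading.
            if not isinstance(reading, str):
                problems.append(f"compound[{expression!r}] should be a string")

    return problems


custom = {
    "single": {"激": ["げき"], "飛": ["と", "ひ"]},
    "compound": {"激を飛ばす": "げきをとばす", "飛ばす": "とばす"},
}
print(validate_custom_readings(custom))
```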
The default dictionary is sudachidict_full. If you want to use a lighter dictionary, you can install either sudachidict_small or sudachidict_core.
pip install sudachidict_small
pip install sudachidict_core
kanji_conv = KanjiConv(sudachi_dict_type="small", separator="/")
kanji_conv = KanjiConv(sudachi_dict_type="core", separator="/")
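If the same code has to run in environments with different dictionaries installed, the dictionary type can be chosen at runtime. This is a rough sketch using only the standard library: it checks which of the `sudachidict_full`, `sudachidict_core`, and `sudachidict_small` modules can be imported. The helper `pick_sudachi_dict_type` is hypothetical, and passing `"full"` as `sudachi_dict_type` is an assumption made by analogy with the `"small"` and `"core"` values shown above.

```python
from importlib.util import find_spec
from typing import Optional, Sequence


def pick_sudachi_dict_type(
    preferred: Sequence[str] = ("full", "core", "small"),
) -> Optional[str]:
    """Return the first dictionary type whose package is installed, else None.

    Hypothetical helper for illustration; not part of kanjiconv.
    """
    for dict_type in preferred:
        # find_spec returns None when the top-level module is not installed.
        if find_spec(f"sudachidict_{dict_type}") is not None:
            return dict_type
    return None


dict_type = pick_sudachi_dict_type()
print(dict_type)
# With kanjiconv installed, the result could then be passed on, e.g.:
# kanji_conv = KanjiConv(sudachi_dict_type=dict_type, separator="/")
```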
kanjiconv's readings are based on SudachiDict, so you need to update SudachiDict regularly via pip.
pip install -U sudachidict_full
pip install -U sudachidict_small
pip install -U sudachidict_core
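To see which dictionary versions are currently installed before upgrading, the standard library's `importlib.metadata` can be queried. This is a small sketch, not part of kanjiconv; it assumes the distribution names match the pip package names used above, and the function name `installed_sudachidict_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_sudachidict_versions() -> dict:
    """Map each SudachiDict package name to its installed version, or None.

    Hypothetical helper for illustration; not part of kanjiconv.
    """
    versions = {}
    for name in ("sudachidict_full", "sudachidict_small", "sudachidict_core"):
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


found = installed_sudachidict_versions()
for name, installed in found.items():
    print(f"{name}: {installed or 'not installed'}")
```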
This project is licensed under the Apache License 2.0.
This library uses SudachiPy and its dictionary SudachiDict for morphological analysis. These are also distributed under the Apache License 2.0.
For detailed license information, please refer to the LICENSE files of each project.