cutlet
Cutlet is a tool to convert Japanese to romaji. Check out the interactive demo! Also see the docs and the original blog post.
Issues do not need to be written in English (issueを英語で書く必要はありません).
Features:
- support for Modified Hepburn, Kunreisiki, Nihonsiki systems
- custom overrides for individual mappings
- custom overrides for specific words
- built-in exceptions list (Tokyo, Osaka, etc.)
- uses foreign spelling when available in UniDic
- proper nouns are capitalized
- slug mode for URL generation
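To make the difference between the three supported systems concrete, here is a toy comparison for a handful of kana. The tables below are written by hand for illustration and are not cutlet's internal data; `toy_romaji` and its `overrides` argument are hypothetical names that only mimic the idea of overriding an individual mapping.

```python
# Hand-written toy tables for a few kana; NOT cutlet's internal mapping data.
HEPBURN = {"し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu",
           "じ": "ji", "ぢ": "ji", "づ": "zu"}
KUNREI = {"し": "si", "ち": "ti", "つ": "tu", "ふ": "hu",
          "じ": "zi", "ぢ": "zi", "づ": "zu"}
# Nihonsiki keeps ぢ and づ distinct from じ and ず.
NIHON = {**KUNREI, "ぢ": "di", "づ": "du"}


def toy_romaji(text, table, overrides=None):
    """Romanize kana one character at a time (hypothetical helper).

    Characters missing from the table are passed through unchanged.
    """
    merged = {**table, **(overrides or {})}
    return "".join(merged.get(ch, ch) for ch in text)


print(toy_romaji("ふじ", HEPBURN))
print(toy_romaji("ふじ", KUNREI))
print(toy_romaji("つづ", NIHON))
# A per-kana override, similar in spirit to a custom mapping:
print(toy_romaji("ぢ", HEPBURN, {"ぢ": "di"}))
```

Real romanization also needs tokenization and dictionary readings for kanji, which is why cutlet builds on a MeCab dictionary rather than a character table alone.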
Things not supported:
- traditional Hepburn n-to-m: Shimbashi
- macrons or circumflexes: Tōkyō, Tôkyô
- passport Hepburn: Satoh (but you can use an exception)
- hyphenating words
- Traditional Hepburn in general is not supported
Internally, cutlet uses fugashi, so you can use the same dictionary you use for normal tokenization.
Installation
Cutlet can be installed through pip as usual.
pip install cutlet
Note that if you don't have a MeCab dictionary installed, you'll also have to install one. If you're just getting started, unidic-lite is a good choice.
pip install unidic-lite
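If you are unsure whether both packages ended up in the active environment, a quick check along these lines may help. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, and it assumes the import name for unidic-lite is `unidic_lite`.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumption: unidic-lite is imported as "unidic_lite".
missing = missing_packages(["cutlet", "unidic_lite"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("cutlet and a dictionary package are available.")
```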
Usage
A command-line script is included for quick testing. Just use cutlet and each line of stdin will be treated as a sentence. You can specify the system to use (hepburn, kunrei, nippon, or nihon) as the first argument.
$ cutlet
ローマ字変換プログラム作ってみた。
Roma ji henkan program tsukutte mita.
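The same line-per-sentence behavior can be approximated in your own script. The following is a rough sketch, not the bundled script's source: `romanize_lines` and `main` are hypothetical names, skipping blank lines is a choice made here, and the fallback to "hepburn" when no argument is given is an assumption.

```python
import sys


def romanize_lines(lines, converter):
    """Yield one romaji string per non-empty input line.

    `converter` is any object with a romaji(str) method, such as cutlet.Cutlet.
    """
    for line in lines:
        sentence = line.strip()
        if sentence:  # skipping blank lines is a choice made for this sketch
            yield converter.romaji(sentence)


def main(argv=None):
    # Imported here so romanize_lines can be reused without cutlet installed.
    import cutlet

    args = sys.argv[1:] if argv is None else argv
    system = args[0] if args else "hepburn"  # assumed default
    for romaji in romanize_lines(sys.stdin, cutlet.Cutlet(system)):
        print(romaji)
```

Calling `main()` from a script reads sentences from stdin; passing any object with a `romaji` method to `romanize_lines` makes the loop easy to try without a dictionary installed.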
In code:
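import cutlet
# With no arguments, the Hepburn system is used
katsu = cutlet.Cutlet()
katsu.romaji("カツカレーは美味しい")
# Slug mode returns a string suitable for URLs
katsu.slug("カツカレーは美味しい")
# Turn off foreign spellings to romanize loanwords from their readings
katsu.use_foreign_spelling = False
katsu.romaji("カツカレーは美味しい")
# Kunreisiki
katu = cutlet.Cutlet('kunrei')
katu.romaji("富士山")
# Nihonsiki, to compare all three systems on the same sentence
nkatu = cutlet.Cutlet('nihon')
sent = "彼女は王への手紙を読み上げた。"
katsu.romaji(sent)
katu.romaji(sent)
nkatu.romaji(sent)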
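The feature list mentions overrides for specific words and for individual mappings. A possible way to apply them is sketched below, assuming the `add_exception(key, val)` and `update_mapping(key, val)` methods described in cutlet's API documentation; `configure` and `build_example` are hypothetical helpers, and the example assumes 佐藤 is a single token in your dictionary, since an exception only replaces a whole token.

```python
def configure(converter, word_overrides=None, kana_overrides=None):
    """Apply per-word and per-kana overrides to a Cutlet-like converter."""
    for word, romaji in (word_overrides or {}).items():
        converter.add_exception(word, romaji)   # replaces a whole token
    for kana, romaji in (kana_overrides or {}).items():
        converter.update_mapping(kana, romaji)  # changes a single kana mapping
    return converter


def build_example():
    # Imported here so configure() can be exercised without cutlet installed.
    import cutlet

    # Passport-style spelling for one name, and a distinct romanization for ぢ.
    return configure(cutlet.Cutlet(), {"佐藤": "Satoh"}, {"ぢ": "di"})
```

Keeping the overrides in plain dictionaries makes it easy to share one set of project-specific spellings across several converter instances.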
Alternatives
- kakasi: Historically important, but not updated since 2014.
- pykakasi: Self-contained; it does segmentation on its own and uses its own dictionary.
- kuroshiro: JavaScript based.
- kana: Go based.