Security News
Research
Data Theft Repackaged: A Case Study in Malicious Wrapper Packages on npm
The Socket Research Team breaks down a malicious wrapper package that uses obfuscation to harvest credentials and exfiltrate sensitive data.
pyopenjtalk-plus は、各フォークでの改善を一つのコードベースにまとめ、さらなる改善を加えることを目的とした、pyopenjtalk の派生ライブラリです。
pyopenjtalk-plus
に変更
pyopenjtalk
から変更されておらず、pyopenjtalk 本家同様に import pyopenjtalk
でインポートできるpyopenjtalk.run_frontend()
関数に CLI インターフェイスを追加
python -m pyopenjtalk "あらゆる現実を、すべて自分の方へねじ曲げたのだ。"
_lazy_init()
関数内での辞書ダウンロード処理は pyopenjtalk-plus での辞書同梱に伴い削除している
task build-dictionary
を実行するとビルドできる
pyopenjtalk.run_frontend()
や pyopenjtalk.g2p()
でも run_marine=True
を指定し marine による AI アクセント推定を行えるようにした
pyopenjtalk.extract_fullcontext()
では marine による AI アクセント推定が可能だったが、pyopenjtalk.run_frontend()
や pyopenjtalk.g2p()
にも実装したpip install marine-plus
で marine 本家の代わりに marine-plus をインストールできるpyopenjtalk.unset_user_dict()
関数を追加
set_user_dict()
関数が update_global_jtalk_with_user_dict()
関数になるなど、同等の機能ながら関数名は変更されているpyopenjtalk.unset_user_dict()
関数が移植されており、pyopenjtalk-plus でもこの実装を継承したMecab_load_with_userdic()
関数を追加mecab-dict-index
モジュールにログ出力を抑制する --quiet
オプションを追加mecab-dict-index
モジュールの main()
関数 (元は CLI コマンド用) をコメントアウト
mecab-dict-index
の argv と argc に相当する値を、ライブラリ側から OpenJTalk の mecab_dict_index()
関数 (mecab-dict-index
コマンドのエントリーポイント) の引数として注入する」という非常にトリッキーかつ強引な手法で実装されているmecab-dict-index
コマンドをビルドする必要がないlist[NJDFeature]
内の値を補正している点がユニークtaskipy
によるタスクランナーでの管理に変更下記コマンドを実行して、ライブラリをインストールできます。
pip install pyopenjtalk-plus
開発環境は macOS / Linux 、Python バージョンは 3.11 が前提です。
# submodule ごとリポジトリを clone
git clone --recursive https://github.com/tsukumijima/pyopenjtalk-plus.git
cd pyopenjtalk-plus
# ライブラリ自身とその依存関係を .venv/ 以下の仮想環境にインストールし、開発環境を構築
pip install taskipy
task install
# コード整形
task lint
task format
# テストの実行
task test
# pyopenjtalk/dictionary/ 以下にある MeCab / OpenJTalk 辞書をビルド
## ビルド成果物は同ディレクトリに *.bin / *.dic として出力される
## ビルド後の辞書データは数百 MB あるバイナリファイルだが、取り回しやすいよう敢えて Git 管理下に含めている
task build-dictionary
# ライブラリの wheel と sdist をビルドし、dist/ に出力
task build
# ビルド成果物をクリーンアップ
task clean
下記ならびに docs/ 以下のドキュメントは、pyopenjtalk 本家のドキュメントを改変なしでそのまま引き継いでいます。
これらのドキュメントの内容が pyopenjtalk-plus にも通用するかは保証されません。
A python wrapper for OpenJTalk.
The package consists of two core components:
Before using the pyopenjtalk package, please have a look at the LICENSE for the two software.
The python package relies on cython to make python bindings for open_jtalk and hts_engine_API. You must need the following tools to build and install pyopenjtalk:
pip install pyopenjtalk
To build the package locally, you will need to make sure to clone open_jtalk and hts_engine_API.
git submodule update --recursive --init
and then run
pip install -e .
Please check the notebook version here (nbviewer).
In [1]: import pyopenjtalk
In [2]: from scipy.io import wavfile
In [3]: x, sr = pyopenjtalk.tts("おめでとうございます")
In [4]: wavfile.write("test.wav", sr, x.astype(np.int16))
In [1]: import pyopenjtalk
In [2]: pyopenjtalk.extract_fullcontext("こんにちは")
Out[2]:
['xx^xx-sil+k=o/A:xx+xx+xx/B:xx-xx_xx/C:xx_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:xx_xx#xx_xx@xx_xx|xx_xx/G:5_5%0_xx_xx/H:xx_xx/I:xx-xx@xx+xx&xx-xx|xx+xx/J:1_5/K:1+1-5',
'xx^sil-k+o=N/A:-4+1+5/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',
'sil^k-o+N=n/A:-4+1+5/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',
'k^o-N+n=i/A:-3+2+4/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',
'o^N-n+i=ch/A:-2+3+3/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',
'N^n-i+ch=i/A:-2+3+3/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',
'n^i-ch+i=w/A:-1+4+2/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',
'i^ch-i+w=a/A:-1+4+2/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',
'ch^i-w+a=sil/A:0+5+1/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',
'i^w-a+sil=xx/A:0+5+1/B:xx-xx_xx/C:09_xx+xx/D:xx+xx_xx/E:xx_xx!xx_xx-xx/F:5_5#0_xx@1_1|1_5/G:xx_xx%xx_xx_xx/H:xx_xx/I:1-5@1+1&1-1|1+5/J:xx_xx/K:1+1-5',
'w^a-sil+xx=xx/A:xx+xx+xx/B:xx-xx_xx/C:xx_xx+xx/D:xx+xx_xx/E:5_5!0_xx-xx/F:xx_xx#xx_xx@xx_xx|xx_xx/G:xx_xx%xx_xx_xx/H:1_5/I:xx-xx@xx+xx&xx-xx|xx+xx/J:xx_xx/K:1+1-5']
Please check lab_format.pdf
in HTS-demo_NIT-ATR503-M001.tar.bz2 for more details about full-context labels.
In [1]: import pyopenjtalk
In [2]: pyopenjtalk.g2p("こんにちは")
Out[2]: 'k o N n i ch i w a'
In [3]: pyopenjtalk.g2p("こんにちは", kana=True)
Out[3]: 'コンニチワ'
user.csv
) and write custom words like below:GNU,,,1,名詞,一般,*,*,*,*,GNU,グヌー,グヌー,2/3,*
mecab_dict_index
to compile the CSV file.In [1]: import pyopenjtalk
In [2]: pyopenjtalk.mecab_dict_index("user.csv", "user.dic")
reading user.csv ... 1
emitting double-array: 100% |###########################################|
done!
update_global_jtalk_with_user_dict
to apply the user dictionary.In [3]: pyopenjtalk.g2p("GNU")
Out[3]: 'j i i e n u y u u'
In [4]: pyopenjtalk.update_global_jtalk_with_user_dict("user.dic")
In [5]: pyopenjtalk.g2p("GNU")
Out[5]: 'g u n u u'
run_marine
optionAfter v0.3.0, the run_marine
option has been available for estimating the Japanese accent with the DNN-based method (see marine). If you want to use the feature, please install pyopenjtalk as below;
pip install pyopenjtalk[marine]
And then, you can use the option as the following examples;
In [1]: import pyopenjtalk
In [2]: x, sr = pyopenjtalk.tts("おめでとうございます", run_marine=True) # for TTS
In [3]: label = pyopenjtalk.extract_fullcontext("こんにちは", run_marine=True) # for text processing frontend only
HTS Working Group for their dedicated efforts to develop and maintain Open JTalk.
FAQs
A Python wrapper for OpenJTalk with additional improvements
We found that pyopenjtalk-plus demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Research
The Socket Research Team breaks down a malicious wrapper package that uses obfuscation to harvest credentials and exfiltrate sensitive data.
Research
Security News
Attackers used a malicious npm package typosquatting a popular ESLint plugin to steal sensitive data, execute commands, and exploit developer systems.
Security News
The Ultralytics' PyPI Package was compromised four times in one weekend through GitHub Actions cache poisoning and failure to rotate previously compromised API tokens.