This is a Python wrapper for the MeCab-ko morphological analyzer for Korean text. It works with Python 3.6 and greater.
Several Python bindings and wrappers for MeCab-ko already exist, but they are generally not well maintained.
I built this one to stand on the shoulders of giants (well-maintained open-source projects such as MeCab, mecab-ko, and mecab-python3), with as few modifications as possible.
I initially named it mecab-ko-python3 because mecab-python3 was the package I referenced during development.
It may seem a little arrogant, but to reduce confusion on PyPI, the name was changed to 'mecab-ko'.
(The repository is named 'pymecab-ko' to distinguish it from the original mecab-ko.)
Note: If you are using macOS Big Sur, you'll need to upgrade pip to version 20.3 or higher to use wheels, due to a pip issue.
You do not need to write issues in English.
Note that the Windows wheels require the Microsoft Visual C++ Redistributable, so be sure to install it.
>>> import mecab_ko as MeCab
>>> tagger = MeCab.Tagger("-Owakati")
>>> tagger.parse("아버지가방에들어가신다").split()
['아버지', '가', '방', '에', '들어가', '신다']
>>> tagger = MeCab.Tagger()
>>> print(tagger.parse("아버지가방에들어가신다"))
아버지 NNG,*,F,아버지,*,*,*,*
가 JKS,*,F,가,*,*,*,*
방 NNG,*,T,방,*,*,*,*
에 JKB,*,F,에,*,*,*,*
들어가 VV,*,F,들어가,*,*,*,*
신다 EP+EC,*,F,신다,Inflect,EP,EC,시/EP/*+ㄴ다/EC/*
EOS
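If you need the individual fields rather than printed text, the default output can be split line by line. The sketch below assumes MeCab's default format, in which each token line is the surface form, a tab, and a comma-separated feature string, and the sentence ends with EOS (the listing above may render the tab as spaces). With mecab-ko-dic the first feature is the POS tag, and an inflected form such as 신다 carries a combined tag like EP+EC plus its decomposition in the last field. `Morpheme` and `parse_mecab_output` are hypothetical helper names, not part of mecab-ko.

```python
from typing import List, NamedTuple


class Morpheme(NamedTuple):
    surface: str
    pos: str              # first feature field, e.g. "NNG" or "EP+EC"
    features: List[str]   # all comma-separated feature fields


def parse_mecab_output(output: str) -> List[Morpheme]:
    """Parse MeCab's default output: one 'surface<TAB>features' line per token, then EOS."""
    morphemes = []
    for line in output.splitlines():
        # Skip blank lines and the end-of-sentence marker.
        if not line or line == "EOS":
            continue
        surface, _, feature_csv = line.partition("\t")
        # Assumes no feature field itself contains a comma.
        features = feature_csv.split(",")
        morphemes.append(Morpheme(surface, features[0], features))
    return morphemes
```

Usage would look like `parse_mecab_output(tagger.parse("아버지가방에들어가신다"))` with a tagger created as above.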
The API for pymecab-ko closely follows the API for MeCab itself, even when this makes it not very “Pythonic.” Please consult the official MeCab documentation for more information.
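Because the API mirrors MeCab's, tokens can also be read through MeCab's node-based interface instead of parsing printed text. The sketch below assumes that `Tagger.parseToNode` exists in mecab_ko and returns nodes with `surface`, `feature`, and `next` attributes, as it does in mecab-python3; `iter_morphemes` is a hypothetical helper name.

```python
from typing import Iterator, Tuple


def iter_morphemes(tagger, text: str) -> Iterator[Tuple[str, str]]:
    """Yield (surface, feature) pairs by walking MeCab's linked list of nodes.

    Assumes tagger.parseToNode(text) returns the first node of the list,
    with .surface, .feature and .next attributes (as in mecab-python3).
    """
    node = tagger.parseToNode(text)
    while node:
        # The BOS and EOS sentinel nodes have an empty surface; skip them.
        if node.surface:
            yield node.surface, node.feature
        node = node.next
```

Under that assumption, `for surface, feature in iter_morphemes(MeCab.Tagger(), "아버지가방에들어가신다"): print(surface, feature.split(",")[0])` would print each morpheme with its POS tag.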
Binary wheels are available for macOS, Linux, and Windows (64-bit), and are installed by default when you use pip:
pip install mecab-ko
These wheels include a copy of the MeCab-ko library and a dictionary.
MeCab-ko has its own dedicated dictionary, mecab-ko-dic, which is installed automatically when you install pymecab-ko.
To build from source using pip:
pip install --no-binary :all: mecab-ko
To use MeCab-ko, you must install a dictionary. There are two dictionaries available for MeCab-ko.
The following packages, which include slight modifications for ease of use, are recommended:
If you get a RuntimeError when you try to run MeCab, here are some things to check:
On Windows, you have to install the Microsoft Visual C++ Redistributable to use this package.
If you get this error:
error message: [ifs] no such file or directory: /usr/local/etc/mecabrc
You need to specify a mecabrc file. It's OK to specify an empty file; it just has to exist. You can specify a mecabrc with -r. This may be necessary on Debian or Ubuntu, where the mecabrc is in /etc/mecabrc.
You can specify an empty mecabrc like this:
tagger = MeCab.Tagger('-r/dev/null -d/home/hoge/mydic')
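On systems without /dev/null (such as Windows), or when you would rather use an existing file, the same fix can be applied portably: use a mecabrc found at one of the paths mentioned above, otherwise create an empty one (an empty file is acceptable), then build the argument string. This is a sketch; `find_or_create_mecabrc` and `tagger_args` are hypothetical helpers, and because they do not quote paths, they assume the paths contain no spaces.

```python
import os
import tempfile

# Paths from this section: the default in the error message, and the Debian/Ubuntu location.
CANDIDATE_MECABRC = ("/usr/local/etc/mecabrc", "/etc/mecabrc")


def find_or_create_mecabrc(candidates=CANDIDATE_MECABRC):
    """Return the first existing mecabrc, or the path of a newly created empty one."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    # An empty mecabrc is fine; it only has to exist.
    fd, path = tempfile.mkstemp(prefix="mecabrc-")
    os.close(fd)
    return path


def tagger_args(mecabrc, dicdir=None, output_format=None):
    """Build an argument string such as '-r/etc/mecabrc -d/home/hoge/mydic'."""
    args = [f"-r{mecabrc}"]
    if dicdir is not None:
        args.append(f"-d{dicdir}")
    if output_format is not None:
        args.append(f"-O{output_format}")
    return " ".join(args)
```

The result is passed to the constructor as before, e.g. `MeCab.Tagger(tagger_args(find_or_create_mecabrc()))`.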
-Ochasen
ChaSen output is not a built-in feature of MeCab; you must define it in your dicrc or mecabrc. Notably, mecab-ko-dic does not include the ChaSen output format. Please see the MeCab documentation.
Like MeCab and mecab-python3, pymecab-ko is copyrighted free software by Taku Kudo taku@chasen.org and Nippon Telegraph and Telephone Corporation, and is distributed under a 3-clause BSD license (see the file BSD).
Alternatively, it may be redistributed under the terms of the GNU General Public License, version 2 (see the file GPL) or the GNU Lesser General Public License, version 2.1 (see the file LGPL).