
Research
Security News
The Growing Risk of Malicious Browser Extensions
Socket researchers uncover how browser extensions in trusted stores are used to hijack sessions, redirect traffic, and manipulate user behavior.
This is a package to install HanDic, a dictionary for morphological analysis of Korean languages, via pip and use it in Python.
To use this package for morphological analysis, the MeCab wrapper such as mecab-python3 is required.
[notice] After v.0.1.0, calendar versioning is used according to the dictionary version.
from PyPI:
pip install handic
Since HanDic requires Hangul Jamo(Unicode Hangul Jamo) as input, please convert Hangul (Unicode Hangul Syllables) using modules such as jamotools, or tools/k2jamo.py
script included in HanDic.
example:
import MeCab
import handic
import jamotools
mecaboption = f'-r /dev/null -d {handic.DICDIR}'
tokenizer = MeCab.Tagger(mecaboption)
tokenizer.parse('')
# 《표준국어대사전》 "형태소" 뜻풀이
sentence = u'뜻을 가진 가장 작은 말의 단위. ‘이야기책’의 ‘이야기’, ‘책’ 따위이다.'
jamo = jamotools.split_syllables(sentence, jamo_type="JAMO")
node = tokenizer.parseToNode(jamo)
while node:
print(node.surface, node.feature)
node = node.next
result:
BOS/EOS,*,*,*,*,*,*,*,*,*,*
뜻 Noun,普通,*,*,*,뜻,뜻,*,*,B,NNG
을 Ending,助詞,対格,*,*,을02,을,*,*,*,JKO
가지 Verb,自立,*,語基2,*,가지다,가지,*,*,A,VV
ᆫ Ending,語尾,連体形,*,2接続,ㄴ05,ㄴ,*,*,*,ETM
가장 Adverb,一般,*,*,*,가장01,가장,*,*,A,MAG
작으 Adjective,自立,*,語基2,*,작다01,작으,*,*,A,VA
ᆫ Ending,語尾,連体形,*,2接続,ㄴ05,ㄴ,*,*,*,ETM
말 Noun,普通,動作,*,*,말01,말,*,*,A,NNG
의 Ending,助詞,属格,*,*,의10,의,*,*,*,JKG
단위 Noun,普通,*,*,*,단위02,단위,單位,*,C,NNG
. Symbol,ピリオド,*,*,*,.,.,*,*,*,SF
‘ Symbol,カッコ,引用符-始,*,*,‘,‘,*,*,*,SS
이야기책 Noun,普通,*,*,*,이야기책,이야기책,이야기冊,*,*,NNG
’ Symbol,カッコ,引用符-終,*,*,’,’,*,*,*,SS
의 Ending,助詞,属格,*,*,의10,의,*,*,*,JKG
‘ Symbol,カッコ,引用符-始,*,*,‘,‘,*,*,*,SS
이야기 Noun,普通,動作,*,*,이야기,이야기,*,*,A,NNG
’ Symbol,カッコ,引用符-終,*,*,’,’,*,*,*,SS
, Symbol,コンマ,*,*,*,",",",",*,*,*,SP
‘ Symbol,カッコ,引用符-始,*,*,‘,‘,*,*,*,SS
책 Noun,普通,*,*,*,책01,책,冊,*,A,NNG
’ Symbol,カッコ,引用符-終,*,*,’,’,*,*,*,SS
따위 Noun,依存名詞,*,*,*,따위,따위,*,*,*,NNB
이 Siteisi,非自立,*,語基1,*,이다,이,*,*,*,VCP
다 Ending,語尾,終止形,*,1接続,다06,다,*,*,*,EF
. Symbol,ピリオド,*,*,*,.,.,*,*,*,SF
BOS/EOS,*,*,*,*,*,*,*,*,*,*
example:
mecaboption = f'-r /dev/null -d {handic.DICDIR} -Otokenize'
tokenizer = MeCab.Tagger(mecaboption)
print(tokenizer.parse(jamo))
result:
뜻 을 가지 ㄴ 가장 작으 ㄴ 말 의 단위 . ‘ 이야기책 ’ 의 ‘ 이야기 ’ , ‘ 책 ’ 따위 이 다 .
example:
mecaboption = f'-r /dev/null -d {handic.DICDIR}'
tokenizer = MeCab.Tagger(mecaboption)
tokenizer.parse('')
node = tokenizer.parseToNode(jamo)
while node:
# 일반명사(pos-tag: NNG)만 추출
if node.feature.split(',')[10] in ['NNG']:
print(node.feature.split(',')[5])
node = node.next
result:
뜻
말01
단위02
이야기책
이야기
책01
Here is the list of features included in HanDic. For more information, see the HanDic 품사 정보.
語基1
, 語基2
, 語基3
)(index: 3)1接続
, 2接続
, etc.)(index: 4)This code is licensed under the MIT license. HanDic is copyright Yoshinori Sugai and distributed under the BSD license.
This repository is forked from unidic-lite with some modifications and file additions and deletions.
FAQs
HanDic package for installing via pip.
We found that handic demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Research
Security News
Socket researchers uncover how browser extensions in trusted stores are used to hijack sessions, redirect traffic, and manipulate user behavior.
Research
Security News
An in-depth analysis of credential stealers, crypto drainers, cryptojackers, and clipboard hijackers abusing open source package registries to compromise Web3 development environments.
Security News
pnpm 10.12.1 introduces a global virtual store for faster installs and new options for managing dependencies with version catalogs.