bareunpy

The Bareun Python library, using gRPC

  • 1.6.4
  • PyPI

What is this?

bareunpy is the Python 3 library for Bareun.

Bareun is a Korean NLP engine that provides tokenization and part-of-speech (POS) tagging for Korean.

How to install

pip3 install bareunpy

How to get bareun

  • Go to https://bareun.ai/.
    • When you register for the first time, you get an API key to use the service freely.
    • With the API key, you can install the Bareun server.
    • Or you can use this bareunpy library to call any running Bareun server.
  • Or use the Docker image. See https://hub.docker.com/r/bareunai/bareun
docker pull bareunai/bareun:latest

How to use the tagger

import sys
import google.protobuf.text_format as tf
from bareunpy import Tagger

# You can get an API key from https://bareun.ai/.
# Note that you need to sign up and verify your email first.
# Enter the API key ("koba-...") issued at https://bareun.ai/ after
# email verification (see "Login > My Info").
API_KEY = "koba-ABCDEFG-1234567-LMNOPQR-7654321"  # <- Replace this with your own API key.

# If you are running Bareun on your localhost:
my_tagger = Tagger(API_KEY, 'localhost')
# or, if your Bareun server is running on 10.8.3.211:15656:
my_tagger = Tagger(API_KEY, '10.8.3.211', 15656)


# Run POS tagging on a list of sentences.
res = my_tagger.tags(["안녕하세요.", "반가워요!"])

# get protobuf message.
m = res.msg()
tf.PrintMessage(m, out=sys.stdout, as_utf8=True)
print(tf.MessageToString(m, as_utf8=True))
print(f'length of sentences is {len(m.sentences)}')
## output : 2
print(f'length of tokens in sentences[0] is {len(m.sentences[0].tokens)}')
print(f'length of morphemes of first token in sentences[0] is {len(m.sentences[0].tokens[0].morphemes)}')
print(f'lemma of first token in sentences[0] is {m.sentences[0].tokens[0].lemma}')
print(f'first morph of first token in sentences[0] is {m.sentences[0].tokens[0].morphemes[0]}')
print(f'tag of first morph of first token in sentences[0] is {m.sentences[0].tokens[0].morphemes[0].tag}')

## Advanced usage: iterate over every morpheme.
for sent in m.sentences:
    for token in sent.tokens:
        for mor in token.morphemes:
            print(f'{mor.text.content}/{mor.tag}:{mor.probability}:{mor.out_of_vocab}')

# get json object
jo = res.as_json()
print(jo)

# get tuple of pos tagging.
pa = res.pos()
print(pa)
# another methods
ma = res.morphs()
print(ma)
na = res.nouns()
print(na)
va = res.verbs()
print(va)

# custom dictionary
cust_dic = my_tagger.custom_dict("my")
cust_dic.copy_np_set({'내고유명사', '우리집고유명사'})
cust_dic.copy_cp_set({'코로나19'})
cust_dic.copy_cp_caret_set({'코로나^백신', '독감^백신'})
cust_dic.update()

# load the previously saved custom dictionary
cust_dict2 = my_tagger.custom_dict("my")
cust_dict2.load()

my_tagger.set_domain('my')
my_tagger.pos('코로나19는 언제 끝날까요?')
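The convenience methods above (pos, morphs, nouns, verbs) return plain Python structures, so standard-library tools apply directly. A minimal sketch of counting POS tags, assuming pos() yields (morpheme, tag) tuples; the sample data below stands in for real server output and is illustrative only:

```python
from collections import Counter

# Sample (morpheme, tag) pairs standing in for the output of my_tagger.pos();
# the real tags come from the Bareun server, so this data is illustrative.
sample_pos = [('안녕', 'NNG'), ('하', 'XSA'), ('세요', 'EF'), ('.', 'SF'),
              ('반갑', 'VA'), ('어요', 'EF'), ('!', 'SF')]

# Count how often each POS tag appears.
tag_counts = Counter(tag for _, tag in sample_pos)
print(tag_counts.most_common(3))  # [('EF', 2), ('SF', 2), ('NNG', 1)]
```

The same pattern works on real output: pass the result of pos() in place of the sample list.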

How to use the tokenizer

import sys
import google.protobuf.text_format as tf
from bareunpy import Tokenizer

# You can get an API key from https://bareun.ai/.
# Note that you need to sign up and verify your email first.
# Enter the API key ("koba-...") issued at https://bareun.ai/ after
# email verification (see "Login > My Info").
API_KEY = "koba-ABCDEFG-1234567-LMNOPQR-7654321"  # <- Replace this with your own API key.

# If you are running Bareun on your localhost:
my_tokenizer = Tokenizer(API_KEY, 'localhost')
# or, if your Bareun server is running on 10.8.3.211:15656:
my_tokenizer = Tokenizer(API_KEY, '10.8.3.211', 15656)


# Tokenize a list of sentences.
tokenized = my_tokenizer.tokenize_list(["안녕하세요.", "반가워요!"])

# get protobuf message.
m = tokenized.msg()
tf.PrintMessage(m, out=sys.stdout, as_utf8=True)
print(tf.MessageToString(m, as_utf8=True))
print(f'length of sentences is {len(m.sentences)}')
## output : 2
print(f'length of tokens in sentences[0] is {len(m.sentences[0].tokens)}')
print(f'length of segments of first token in sentences[0] is {len(m.sentences[0].tokens[0].segments)}')
print(f'tagged of first token in sentences[0] is {m.sentences[0].tokens[0].tagged}')
print(f'first segment of first token in sentences[0] is {m.sentences[0].tokens[0].segments[0]}')
print(f'hint of first morph of first token in sentences[0] is {m.sentences[0].tokens[0].segments[0].hint}')

## Advanced usage: iterate over every segment.
for sent in m.sentences:
    for token in sent.tokens:
        for seg in token.segments:
            print(f'{seg.text.content}/{seg.hint}')

# get json object
jo = tokenized.as_json()
print(jo)

# get tuple of segments
ss = tokenized.segments()
print(ss)
ns = tokenized.nouns()
print(ns)
vs = tokenized.verbs()
print(vs)
# postpositions (조사)
ps = tokenized.postpositions()
print(ps)
# adverbs (부사)
advs = tokenized.adverbs()
print(advs)
# symbols
syms = tokenized.symbols()
print(syms)
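Segment results can be post-processed with ordinary Python as well. A minimal sketch grouping segments by their hint, assuming (segment, hint) tuples; both the pairs and the hint values below are illustrative stand-ins, not actual Bareun output:

```python
from collections import defaultdict

# Sample (segment, hint) pairs standing in for tokenizer output;
# the hint values here are illustrative, not actual Bareun output.
sample_segments = [('안녕', 'N'), ('하', 'V'), ('세요', 'E'),
                   ('반갑', 'V'), ('어요', 'E')]

# Group segment texts by their hint.
by_hint = defaultdict(list)
for text, hint in sample_segments:
    by_hint[hint].append(text)

print(dict(by_hint))  # {'N': ['안녕'], 'V': ['하', '반갑'], 'E': ['세요', '어요']}
```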
