Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

bgnlp

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

bgnlp

Package for Bulgarian Natural Language Processing (NLP)

  • 0.5.3
  • PyPI
  • Socket score

Maintainers
1

bgnlp: Model-first approach to Bulgarian NLP

Open In Colab

Downloads

pip install bgnlp

Package functionalities

Please note - only the first time you run one of these operations a model will be downloaded! Therefore, the first run might take more time.

Part-of-speech (PoS) tagging

from bgnlp import pos


print(pos("Това е библиотека за обработка на естествен език."))
[{
    "word": "Това",
    "tag": "PDOsn",
    "bg_desc": "местоимение",
    "en_desc": "pronoun"
}, {
    "word": "е",
    "tag": "VLINr3s",
    "bg_desc": "глагол",
    "en_desc": "verb"
}, {
    "word": "библиотека",
    "tag": "NCFsof",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": "за",
    "tag": "R",
    "bg_desc": "предлог",
    "en_desc": "preposition"
}, {
    "word": "обработка",
    "tag": "NCFsof",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": "на",
    "tag": "R",
    "bg_desc": "предлог",
    "en_desc": "preposition"
}, {
    "word": "естествен",
    "tag": "Asmo",
    "bg_desc": "прилагателно име",
    "en_desc": "adjective"
}, {
    "word": "език",
    "tag": "NCMsom",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": ".",
    "tag": "U",
    "bg_desc": "препинателен знак",
    "en_desc": "punctuation"
}]

Lemmatization

from bgnlp import lemmatize


text = "Добре дошли!"
print(lemmatize(text))
[{'word': 'Добре', 'lemma': 'Добре'}, {'word': 'дошли', 'lemma': 'дойда'}, {'word': '!', 'lemma': '!'}]
# Generating a string of lemmas.
print(lemmatize(text, as_string=True))
Добре дойда!

Named Entity Recognition (NER) tagging

Currently, the available NER tags are:

  • PER - Person
  • ORG - Organization
  • LOC - Location
from bgnlp import ner


text = "Барух Спиноза е роден в Амстердам"

print(f"Input: {text}")
print("Result:", ner(text))
Input: Барух Спиноза е роден в Амстердам
Result: [{'word': 'Барух Спиноза', 'entity_group': 'PER'}, {'word': 'Амстердам', 'entity_group': 'LOC'}]

Keyword Extraction

from bgnlp import extract_keywords


# Reading the text from a file, since it may be large, hence it wouldn't be 
# pleasant to write it directly here.
# The current input is this Bulgarian news article (only the text, no HTML!):
# https://novini.bg/sviat/eu/781622
with open("input_text.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Extracting keywords with probability of at least 0.5.
keywords = extract_keywords(text, threshold=0.5)
print("Keywords:")
pprint(keywords)
Keywords:
[{'keyword': 'Еманюел Макрон', 'score': 0.8759163320064545},
 {'keyword': 'Г-7', 'score': 0.5938143730163574},
 {'keyword': 'Япония', 'score': 0.607077419757843}]

Commatization

from pprint import pprint

from bgnlp import commatize


text = "Човекът искащ безгрижно писане ме помоли да създам този модел."

print("Without metadata:")
print(commatize(text))

print("\nWith metadata:")
pprint(commatize(text, return_metadata=True))
Without metadata:
Човекът, искащ безгрижно писане, ме помоли да създам този модел.

With metadata:
('Човекът, искащ безгрижно писане, ме помоли да създам този модел.',
 [{'end': 12,
   'score': 0.9301406145095825,
   'start': 0,
   'substring': 'Човекът, иск'},
  {'end': 34,
   'score': 0.93571537733078,
   'start': 24,
   'substring': ' писане, м'}])

Keywords

FAQs


Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc