Python wrapper for the Yandex MyStem 3.1 morphological analyzer of the Russian language.
.. image:: https://travis-ci.org/nlpub/pymystem3.png?branch=master
   :target: http://travis-ci.org/nlpub/pymystem3
   :alt: Build Status

This module contains a wrapper for `Yandex Mystem 3.1 <https://tech.yandex.ru/mystem/>`_, an excellent morphological analyzer for the Russian language released in June 2014.
A morphological analyzer can lemmatize text and derive a set of morphological attributes for each token.
For more details about the algorithm, see I. Segalovich, `«A fast morphological algorithm with unknown word guessing induced by a dictionary for a web search engine» <http://download.yandex.ru/company/iseg-las-vegas.pdf>`_, MLMTA-2003, Las Vegas, Nevada, USA.
Python is the language of choice for many computational linguists, including those working with the Russian language. The main motivation for this development was the absence of any Python wrapper for Mystem, one of the most popular morphological analyzers for Russian along with `PyMorphy2 <https://github.com/kmike/pymorphy2>`_, `TreeTagger <http://corpus.leeds.ac.uk/mocky/>`_ and `AOT <http://www.aot.ru/download.php>`_.
The third version of Mystem introduces several important improvements, most importantly part-of-speech (POS) disambiguation. Our wrapper runs Mystem in the mode that performs POS disambiguation.
This wrapper is open source under the MIT license. However, please note that Yandex Mystem itself is not open source and is licensed under the conditions of the `Yandex License <http://legal.yandex.ru/mystem/>`_.
The wrapper works with CPython 2.6+/3.3+ and PyPy 1.9+.
The wrapper was tested on Ubuntu Linux 12.04+, Mac OS X 10.9+ and Windows 7+.
For 32-bit architectures and FreeBSD support, use version 0.1.10.
Stable version: https://pypi.python.org/pypi/pymystem3. You can install it using pip::

    pip install pymystem3

.. * Documentation: http://pythonhosted.org/pymystem3
Latest version (recommended): https://github.com/nlpub/pymystem3::

    pip install git+https://github.com/nlpub/pymystem3

Lemmatization
::

    >>> from pymystem3 import Mystem
    >>> text = "Красивая мама красиво мыла раму"
    >>> m = Mystem()
    >>> lemmas = m.lemmatize(text)
    >>> print(''.join(lemmas))
    красивый мама красиво мыть рама

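
Because the example above joins the result with an empty string and still gets spaces between words, the list returned by ``lemmatize`` apparently includes the whitespace separators as items of their own. The sketch below shows one way to keep only the word lemmas. The ``lemmas`` literal is an assumed shape of that return value, written out by hand so the snippet runs without Mystem installed, and ``content_lemmas`` is a hypothetical helper, not part of pymystem3.

```python
# Assumed shape of m.lemmatize("Красивая мама красиво мыла раму"):
# word lemmas interleaved with the whitespace that separated the tokens.
lemmas = ["красивый", " ", "мама", " ", "красиво", " ", "мыть", " ", "рама", "\n"]


def content_lemmas(items):
    """Drop whitespace-only items (hypothetical helper, not in pymystem3)."""
    return [item for item in items if item.strip()]


words = content_lemmas(lemmas)
print(words)
```

Filtering after the call, rather than splitting the joined string, keeps each lemma as a separate item.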
Getting grammatical information and lemmas.
::

    >>> import json
    >>> from pymystem3 import Mystem
    >>> text = "Красивая мама красиво мыла раму"
    >>> m = Mystem()
    >>> lemmas = m.lemmatize(text)
    >>> print("lemmas:", ''.join(lemmas))
    lemmas: красивый мама красиво мыть рама
    >>> print("full info:", json.dumps(m.analyze(text), ensure_ascii=False))
    full info: [{"text": "Красивая", "analysis": [{"lex": "красивый", "gr": "A=им,ед,полн,жен"}]}, {"text": " "}, {"text": "мама", "analysis": [{"lex": "мама", "gr": "S,жен,од=им,ед"}]}, {"text": " "}, {"text": "красиво", "analysis": [{"lex": "красиво", "gr": "ADV="}]}, {"text": " "}, {"text": "мыла", "analysis": [{"lex": "мыть", "gr": "V,несов,пе=прош,ед,изъяв,жен"}]}, {"text": " "}, {"text": "раму", "analysis": [{"lex": "рама", "gr": "S,жен,неод=вин,ед"}]}, {"text": "\n"}]

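
The ``gr`` strings in the output above appear to follow a ``POS,constant features=variable features`` layout. The sketch below splits them under that assumption and builds one row per analyzed token. ``SAMPLE`` is a shortened copy of the output shown above, and ``parse_gr`` and ``summarize`` are hypothetical helpers, not part of pymystem3. Only the simple form seen in this example is handled. Alternative readings inside a single ``gr`` value are not covered.

```python
# Shortened copy of the m.analyze(text) output shown above.
SAMPLE = [
    {"text": "Красивая", "analysis": [{"lex": "красивый", "gr": "A=им,ед,полн,жен"}]},
    {"text": " "},
    {"text": "красиво", "analysis": [{"lex": "красиво", "gr": "ADV="}]},
    {"text": " "},
    {"text": "мыла", "analysis": [{"lex": "мыть", "gr": "V,несов,пе=прош,ед,изъяв,жен"}]},
    {"text": "\n"},
]


def parse_gr(gr):
    """Split a gr string into (pos, constant, variable).

    Assumes the layout "POS,const1,const2=var1,var2" seen in the sample.
    """
    head, _, tail = gr.partition("=")
    pos, *constant = head.split(",")
    variable = tail.split(",") if tail else []
    return pos, constant, variable


def summarize(analysis):
    """Return (token, lemma, pos, constant, variable) for analyzed tokens."""
    rows = []
    for item in analysis:
        variants = item.get("analysis")
        if not variants:
            continue  # whitespace or a token without any analysis
        first = variants[0]  # take the first reading only
        rows.append((item["text"], first["lex"]) + parse_gr(first["gr"]))
    return rows


rows = summarize(SAMPLE)
for row in rows:
    print(row)
```

Taking only the first entry of ``analysis`` is a simplification chosen for this sketch.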
Please report any bugs or requests using the GitHub issue tracker (https://github.com/nlpub/pymystem3/issues). We have only a very limited amount of resources to maintain this project, so please propose a pull request directly if you see an obvious way of fixing the issue. We are very open to accepting bug fixes, and your help is greatly appreciated.
The full list of contributors is available on GitHub. You can also contact the original contributors of the project via email:
@ gmail
If you are interested in further development or in becoming a maintainer of this project, please drop us an email: your help is greatly appreciated.