
Polyglot is a natural language pipeline that supports massive multilingual applications.

|Downloads| |Latest Version| |Build Status| |Documentation Status|

.. |Downloads| image:: https://img.shields.io/pypi/dm/polyglot.svg
   :target: https://pypi.python.org/pypi/polyglot
.. |Latest Version| image:: https://badge.fury.io/py/polyglot.svg
   :target: https://pypi.python.org/pypi/polyglot
.. |Build Status| image:: https://travis-ci.org/aboSamoor/polyglot.png?branch=master
   :target: https://travis-ci.org/aboSamoor/polyglot
.. |Documentation Status| image:: https://readthedocs.org/projects/polyglot/badge/?version=latest
   :target: https://readthedocs.org/builds/polyglot/


Features
--------

- Tokenization (165 Languages)
- Language detection (196 Languages)
- Named Entity Recognition (40 Languages)
- Part of Speech Tagging (16 Languages)
- Sentiment Analysis (136 Languages)
- Word Embeddings (137 Languages)
- Morphological analysis (135 Languages)
- Transliteration (69 Languages)


Developer
---------

rmyeid gmail com


Quick Tutorial
--------------

.. code:: python

    import polyglot
    from polyglot.text import Text, Word


Language Detection
~~~~~~~~~~~~~~~~~~

.. code:: python

    text = Text("Bonjour, Mesdames.")
    print("Language Detected: Code={}, Name={}\n".format(text.language.code, text.language.name))

.. parsed-literal::

    Language Detected: Code=fr, Name=French

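
A rough idea of how character-level language identification can work is sketched below with character trigram profiles. The two training snippets, the overlap score, and the ``detect`` function are hypothetical illustrations built only on the standard library; they are not polyglot's detector, which covers far more languages and is trained on far more text.

.. code:: python

    import re
    from collections import Counter

    # Hypothetical, tiny training snippets; a real detector learns from large corpora.
    SAMPLES = {
        "en": "good morning ladies and gentlemen, how are you today",
        "fr": "bonsoir mesdames et messieurs, je vous souhaite une bonne journée",
    }

    def trigrams(text):
        """Count character trigrams of each lowercased word, padded with spaces."""
        grams = Counter()
        for word in re.findall(r"[^\W\d_]+", text.lower()):
            padded = f" {word} "
            for i in range(len(padded) - 2):
                grams[padded[i:i + 3]] += 1
        return grams

    PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}

    def detect(text):
        """Return the language whose profile shares the most trigrams with text."""
        grams = trigrams(text)

        def overlap(profile):
            # Counter returns 0 for trigrams the profile has never seen.
            return sum(min(count, profile[g]) for g, count in grams.items())

        return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))

    print(detect("Bonjour, Mesdames."))  # fr

Fragments such as ``bon``, ``mes`` and ``jou`` tip the score toward French here. With inputs this short, a toy profile shares very few trigrams with any language, so results from a sketch like this are fragile.
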
Tokenization
~~~~~~~~~~~~

.. code:: python

    zen = Text("Beautiful is better than ugly. "
               "Explicit is better than implicit. "
               "Simple is better than complex.")
    print(zen.words)

.. parsed-literal::

    [u'Beautiful', u'is', u'better', u'than', u'ugly', u'.', u'Explicit', u'is', u'better', u'than', u'implicit', u'.', u'Simple', u'is', u'better', u'than', u'complex', u'.']

.. code:: python

    print(zen.sentences)

.. parsed-literal::

    [Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple is better than complex.")]

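
The word and sentence lists above can be approximated with two regular expressions. This is a simplified stand-in written for illustration, not polyglot's tokenizer: it treats every character that is neither a word character nor whitespace as its own token, and it ends a sentence only at ``.``, ``!`` or ``?`` followed by whitespace, so abbreviations such as "Dr." would be split incorrectly.

.. code:: python

    import re

    def tokenize(text):
        """Split text into runs of word characters and single punctuation marks."""
        return re.findall(r"\w+|[^\w\s]", text)

    def split_sentences(text):
        """Naively end a sentence at ., ! or ? when it is followed by whitespace."""
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    zen = ("Beautiful is better than ugly. "
           "Explicit is better than implicit. "
           "Simple is better than complex.")
    print(tokenize(zen))
    print(split_sentences(zen))
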

Part of Speech Tagging
~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    text = Text(u"O primeiro uso de desobediência civil em massa ocorreu em setembro de 1906.")
    print("{:<16}{}".format("Word", "POS Tag")+"\n"+"-"*30)
    for word, tag in text.pos_tags:
        print(u"{:<16}{:>2}".format(word, tag))

.. parsed-literal::

    Word            POS Tag
    ------------------------------
    O               DET
    primeiro        ADJ
    uso             NOUN
    de              ADP
    desobediência   NOUN
    civil           ADJ
    em              ADP
    massa           NOUN
    ocorreu         ADJ
    em              ADP
    setembro        NOUN
    de              ADP
    1906            NUM
    .               PUNCT

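
Because ``text.pos_tags`` yields ``(word, tag)`` pairs, as the loop above shows, plain Python is enough to filter or count them. The sketch below hard-codes the pairs from the example output so it runs without polyglot or its models; the tags appear to follow the Universal POS tag set (``NOUN``, ``ADP``, ...).

.. code:: python

    from collections import Counter

    # (word, tag) pairs copied from the output above, so this runs without polyglot.
    pos_tags = [
        ("O", "DET"), ("primeiro", "ADJ"), ("uso", "NOUN"), ("de", "ADP"),
        ("desobediência", "NOUN"), ("civil", "ADJ"), ("em", "ADP"),
        ("massa", "NOUN"), ("ocorreu", "ADJ"), ("em", "ADP"),
        ("setembro", "NOUN"), ("de", "ADP"), ("1906", "NUM"), (".", "PUNCT"),
    ]

    # Keep only the nouns, then count how often each tag occurs.
    nouns = [word for word, tag in pos_tags if tag == "NOUN"]
    tag_counts = Counter(tag for _, tag in pos_tags)

    print(nouns)  # ['uso', 'desobediência', 'massa', 'setembro']
    print(tag_counts)
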

Named Entity Recognition
~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    text = Text(u"In Großbritannien war Gandhi mit dem westlichen Lebensstil vertraut geworden")
    print(text.entities)

.. parsed-literal::

    [I-LOC([u'Gro\\xdfbritannien']), I-PER([u'Gandhi'])]

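
The ``I-LOC`` and ``I-PER`` labels in the output suggest an IOB-style scheme, in which each token is tagged as outside any entity (``O``) or inside an entity of a given type. The following minimal sketch groups per-token labels into entity chunks; the token labels are hand-written and hypothetical, and ``group_entities`` is not part of polyglot.

.. code:: python

    def group_entities(tokens, labels):
        """Group consecutive tokens with the same entity type into (type, tokens) chunks.

        Labels are IOB-style: "O" is outside any entity, "I-TYPE" is inside one,
        and "B-TYPE" (if used) starts a new entity even right after the same type.
        """
        chunks = []
        current_type = None
        for token, label in zip(tokens, labels):
            if label == "O":
                current_type = None
                continue
            prefix, ent_type = label.split("-", 1)
            if prefix == "B" or ent_type != current_type:
                chunks.append((ent_type, [token]))
            else:
                chunks[-1][1].append(token)
            current_type = ent_type
        return chunks

    tokens = "In Großbritannien war Gandhi mit dem westlichen Lebensstil vertraut geworden".split()
    # Hypothetical per-token labels for the sentence above.
    labels = ["O", "I-LOC", "O", "I-PER", "O", "O", "O", "O", "O", "O"]
    print(group_entities(tokens, labels))  # [('LOC', ['Großbritannien']), ('PER', ['Gandhi'])]

Consecutive ``I-PER`` tokens (for example "Mahatma Gandhi") would end up in one chunk, which is why a multi-word entity prints as a single list of words.
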
Polarity
~~~~~~~~

.. code:: python

    print("{:<16}{}".format("Word", "Polarity")+"\n"+"-"*30)
    for w in zen.words[:6]:
        print("{:<16}{:>2}".format(w, w.polarity))

.. parsed-literal::

    Word            Polarity
    ------------------------------
    Beautiful        0
    is               0
    better           1
    than             0
    ugly            -1
    .                0

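
Word-level polarities like these can be rolled up into a score for a whole sentence. The rule below (average the non-neutral words, 0.0 if there are none) is an assumption chosen for illustration and not necessarily how polyglot scores sentences; the two lexicon entries are taken from the table above.

.. code:: python

    # Polarities from the table above: -1 negative, 0 neutral, +1 positive.
    LEXICON = {"better": 1, "ugly": -1}

    def sentence_polarity(words, lexicon=LEXICON):
        """Average the polarity of non-neutral words; return 0.0 if all are neutral."""
        scores = [lexicon.get(word.lower(), 0) for word in words]
        polar = [s for s in scores if s != 0]
        return sum(polar) / len(polar) if polar else 0.0

    words = ["Beautiful", "is", "better", "than", "ugly", "."]
    print(sentence_polarity(words))  # 0.0: "better" (+1) and "ugly" (-1) cancel out

Averaging only the polar words keeps long, mostly neutral sentences from being pulled toward zero; dividing by the total word count would be an equally defensible choice.
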
Embeddings
~~~~~~~~~~

.. code:: python

    word = Word("Obama", language="en")
    print("Neighbors (Synonyms) of {}".format(word)+"\n"+"-"*30)
    for w in word.neighbors:
        print("{:<16}".format(w))
    print("\n\nThe first 10 dimensions out of the {} dimensions\n".format(word.vector.shape[0]))
    print(word.vector[:10])

.. parsed-literal::

    Neighbors (Synonyms) of Obama
    ------------------------------
    Bush
    Reagan
    Clinton
    Ahmadinejad
    Nixon
    Karzai
    McCain
    Biden
    Huckabee
    Lula


    The first 10 dimensions out of the 256 dimensions

    [-2.57382345  1.52175975  0.51070285  1.08678675 -0.74386948 -1.18616164
      2.92784619 -0.25694436 -1.40958667 -2.39675403]

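
Nearest neighbors in an embedding space are commonly ranked by cosine similarity. The sketch below shows that idea with made-up three-dimensional vectors (the real vectors above have 256 dimensions); both the vectors and the choice of cosine similarity are assumptions, since the distance that ``word.neighbors`` uses is not specified here.

.. code:: python

    import math

    # Hypothetical 3-dimensional vectors; real embeddings are much larger.
    VECTORS = {
        "Obama": [0.9, 0.1, 0.3],
        "Bush": [0.8, 0.2, 0.35],
        "Clinton": [0.85, 0.15, 0.2],
        "banana": [0.1, 0.9, 0.0],
    }

    def cosine(u, v):
        """Cosine of the angle between vectors u and v."""
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    def neighbors(word, k=2):
        """Return the k words whose vectors are most similar to word's vector."""
        query = VECTORS[word]
        others = [w for w in VECTORS if w != word]
        others.sort(key=lambda w: cosine(query, VECTORS[w]), reverse=True)
        return others[:k]

    print(neighbors("Obama"))  # ['Clinton', 'Bush']
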
Morphology
~~~~~~~~~~

.. code:: python

    word = Text("Preprocessing is an essential step.").words[0]
    print(word.morphemes)

.. parsed-literal::

    [u'Pre', u'process', u'ing']

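
Given a list of known morphemes, one simple way to split a word is to search for the segmentation with the fewest pieces. The inventory and the ``segment`` function below are hypothetical; polyglot's analyzer relies on downloaded, learned models rather than a hand-written list.

.. code:: python

    from functools import lru_cache

    # Hypothetical morpheme inventory; "proc" and "ess" are decoys that
    # allow a longer, worse split of "processing".
    MORPHEMES = frozenset({"pre", "process", "proc", "ess", "ing"})

    def segment(word, morphemes=MORPHEMES):
        """Split word into the fewest known morphemes; return [word] if none fits.

        Matching is case-insensitive and pieces keep the word's original casing
        (this assumes lowercasing does not change the length, true for ASCII).
        """
        lowered = word.lower()

        @lru_cache(maxsize=None)
        def best(i):
            # Shortest segmentation of lowered[i:] as a tuple, or None if impossible.
            if i == len(lowered):
                return ()
            options = []
            for j in range(i + 1, len(lowered) + 1):
                if lowered[i:j] in morphemes:
                    rest = best(j)
                    if rest is not None:
                        options.append((word[i:j],) + rest)
            return min(options, key=len) if options else None

        result = best(0)
        return list(result) if result is not None else [word]

    print(segment("Preprocessing"))  # ['Pre', 'process', 'ing']
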
Transliteration
~~~~~~~~~~~~~~~

.. code:: python

    from polyglot.transliteration import Transliterator
    transliterator = Transliterator(source_lang="en", target_lang="ru")
    print(transliterator.transliterate(u"preprocessing"))

.. parsed-literal::

    препрокессинг

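
Transliteration maps the characters of one script onto another. A minimal rule-based sketch using greedy longest-match replacement follows; the rule table is a small, hypothetical Latin-to-Cyrillic mapping chosen so that it reproduces the output above, whereas polyglot's ``Transliterator`` relies on downloaded, learned models and handles far more cases.

.. code:: python

    # Hypothetical Latin-to-Cyrillic rules; digraphs are tried before single letters.
    RULES = {
        "sh": "ш", "ch": "ч",
        "p": "п", "r": "р", "e": "е", "o": "о", "c": "к",
        "s": "с", "i": "и", "n": "н", "g": "г",
    }

    def transliterate(word, rules=RULES):
        """Greedily replace the longest matching Latin chunk at each position."""
        longest = max(len(key) for key in rules)
        out = []
        i = 0
        while i < len(word):
            for size in range(longest, 0, -1):
                chunk = word[i:i + size].lower()
                if chunk in rules:
                    out.append(rules[chunk])
                    i += len(chunk)
                    break
            else:
                # No rule matched: copy the character unchanged.
                out.append(word[i])
                i += 1
        return "".join(out)

    print(transliterate("preprocessing"))  # препрокессинг

Trying longer keys first is what lets "sh" become a single letter instead of two; a learned model makes the same kind of context-dependent choices from data rather than from a fixed table.
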

History
-------

"14.11" (2014-01-11)
~~~~~~~~~~~~~~~~~~~~

* First release on PyPI.

"15.5.2" (2015-05-02)
~~~~~~~~~~~~~~~~~~~~~

* Polyglot is feature complete.

"15.10.03" (2015-10-03)
~~~~~~~~~~~~~~~~~~~~~~~

* Change the polyglot models mirror to the Stony Brook University DSL lab instead
  of Google Cloud Storage.

"16.07.04" (2016-07-03)
~~~~~~~~~~~~~~~~~~~~~~~

* New Features:

  - Support Transfer POS Tagging.
  - Support supplying ``hint_language_code`` for ``Text``.

* Bug Fixes:

  - Improve sentence serialization (PR #34).
  - Fix a rare Unicode encode error (PR #35).
  - Fix transliteration from languages other than English (PR #46).
  - Add a link to GitHub in the README (PR #49).
  - Make handling of paths more coherent (PR #55).
  - Fix a bug where normalizing embeddings in place for NER corrupted the POS features (issue #60, PR #62).