==================
textblob-de README
==================

.. image:: https://img.shields.io/pypi/v/textblob-de.svg
    :target: https://pypi.python.org/pypi/textblob-de/
    :alt: textblob_de - latest PyPI version

.. image:: https://travis-ci.org/markuskiller/textblob-de.png?branch=dev
    :target: https://travis-ci.org/markuskiller/textblob-de
    :alt: Travis-CI

.. image:: https://readthedocs.org/projects/textblob-de/badge/?version=latest
    :target: http://textblob-de.readthedocs.org/en/latest/
    :alt: Documentation Status

.. image:: https://img.shields.io/pypi/dm/textblob-de.svg
    :target: https://pypi.python.org/pypi/textblob-de/
    :alt: Number of PyPI downloads

.. image:: https://img.shields.io/github/license/markuskiller/textblob-de.svg
    :target: http://choosealicense.com/licenses/mit/
    :alt: LICENSE info

German language support for `TextBlob <http://textblob.readthedocs.org/en/dev/>`_ by Steven Loria.

This Python package is being developed as a ``TextBlob`` Language Extension. See `Extension Guidelines <https://textblob.readthedocs.org/en/dev/contributing.html>`_ for details.

Features
--------

- NEW: Works with Python 3.7
- All directly accessible ``textblob_de`` classes (e.g. ``Sentence()`` or ``Word()``) are initialized with default models for German
- Properties or methods that do not yet work for German raise a ``NotImplementedError``
- German sentence boundary detection and tokenization (``NLTKPunktTokenizer``)
- Consistent use of the specified tokenizer for all tools (``NLTKPunktTokenizer`` or ``PatternTokenizer``)
- Part-of-speech tagging (``PatternTagger``) with keyword ``include_punc=True`` (defaults to ``False``)
- Tagset conversion in ``PatternTagger`` with keyword ``tagset='penn'|'universal'|'stts'`` (defaults to ``penn``)
- Parsing (``PatternParser``) with all ``pattern`` keywords, plus ``pprint=True`` (defaults to ``False``)
- Noun phrase extraction (``PatternParserNPExtractor``)
- Lemmatization (``PatternParserLemmatizer``)
- Polarity detection (``PatternAnalyzer``): still EXPERIMENTAL, does not yet provide subjectivity scores
- Full ``pattern.text.de`` API support on Python 3
- Supports Python 2 and 3
- See `working features overview <http://langui.ch/nlp/python/textblob-de-dev/>`_ for details
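
The ``tagset`` keyword converts the tagger's default Penn Treebank tags into another tagset. As a rough illustration of what such a conversion does, here is a minimal sketch in plain Python. ``PENN_TO_UNIVERSAL`` and ``convert_tags`` are hypothetical names, and the mapping is a partial table covering only the tags that appear in this README's examples; it is not the table used by textblob-de or pattern. With the library itself you would pass ``PatternTagger(tagset='universal')`` instead.

.. code-block:: python

    # Hypothetical, partial Penn Treebank -> universal tag mapping, covering only
    # the tags that appear in this README (not textblob-de's own table).
    PENN_TO_UNIVERSAL = {
        "CD": "NUM",
        "DT": "DET",
        "JJ": "ADJ",
        "LS": "X",
        "NN": "NOUN",
        "RB": "ADV",
        "VB": "VERB",
        ".": ".",
    }


    def convert_tags(tagged_words, mapping=PENN_TO_UNIVERSAL, unknown="X"):
        """Return (word, tag) pairs with each tag translated via ``mapping``.

        Tags missing from the mapping fall back to ``unknown``.
        """
        return [(word, mapping.get(tag, unknown)) for word, tag in tagged_words]


    penn = [("Das", "DT"), ("Auto", "NN"), ("ist", "VB"), ("sehr", "RB"),
            ("schön", "JJ"), (".", ".")]
    print(convert_tags(penn))

A real conversion table has to cover the full tagset and decide how to treat tags with no clean equivalent; the ``unknown`` fallback here is only a placeholder for that decision.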

Installing/Upgrading
--------------------

::

    $ pip install -U textblob-de
    $ python -m textblob.download_corpora

Or install the latest development release (apparently this does not always work on Windows; see issues `#1744/5 <https://github.com/pypa/pip/pull/1745>`_ for details)::

    $ pip install -U git+https://github.com/markuskiller/textblob-de.git@dev
    $ python -m textblob.download_corpora

.. note::

    TextBlob will be installed/upgraded automatically when running ``pip install``. The second line (``python -m textblob.download_corpora``) downloads/updates the NLTK corpora and language models used in TextBlob.

Usage
-----

.. code-block:: python

    >>> from textblob_de import TextBlobDE as TextBlob
    >>> text = '''Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag.
    ... Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. Aber leider
    ... habe ich nur noch EUR 3.50 in meiner Brieftasche.'''
    >>> blob = TextBlob(text)
    >>> blob.sentences
    [Sentence("Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag."),
     Sentence("Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen."),
     Sentence("Aber leider habe ich nur noch EUR 3.50 in meiner Brieftasche.")]
    >>> blob.tokens
    WordList(['Heute', 'ist', 'der', '3.', 'Mai', ...])
    >>> blob.tags
    [('Heute', 'RB'), ('ist', 'VB'), ('der', 'DT'), ('3.', 'LS'), ('Mai', 'NN'),
    ('2014', 'CD'), ...]
    >>> # Default: only noun phrases that consist of two or more meaningful parts are displayed.
    >>> # Not perfect, but a start (relies heavily on parser accuracy).
    >>> blob.noun_phrases
    WordList(['Mai 2014', 'Dr. Meier', 'seinen 43. Geburtstag', 'Kuchen einzukaufen',
    'meiner Brieftasche'])
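
The sentence split above depends on the tokenizer not breaking at abbreviations (``Dr.``, ``usw.``) or German ordinals (``3.``, ``43.``). The following sketch is a deliberately naive, rule-based splitter that shows why this matters. It is not how ``NLTKPunktTokenizer`` works (Punkt is a trained model that learns abbreviations from data), and ``ABBREVIATIONS`` and ``toy_split_sentences`` are hypothetical names used only for this illustration.

.. code-block:: python

    import re

    # Hypothetical, hand-written abbreviation list for this example only.
    ABBREVIATIONS = {"Dr.", "usw.", "bzw.", "z.B."}
    # German ordinals are written as digits followed by a period, e.g. "3. Mai".
    ORDINAL = re.compile(r"\d+\.$")


    def toy_split_sentences(text):
        """Split on whitespace; end a sentence after a token ending in . ! or ?
        unless the token is a listed abbreviation or an ordinal number."""
        sentences, current = [], []
        for token in text.split():
            current.append(token)
            if (token[-1] in ".!?" and token not in ABBREVIATIONS
                    and not ORDINAL.match(token)):
                sentences.append(" ".join(current))
                current = []
        if current:  # keep trailing text without final punctuation
            sentences.append(" ".join(current))
        return sentences


    sample_text = ("Heute ist der 3. Mai 2014 und Dr. Meier feiert seinen 43. Geburtstag. "
                   "Ich muss unbedingt daran denken, Mehl, usw. für einen Kuchen einzukaufen. "
                   "Aber leider habe ich nur noch EUR 3.50 in meiner Brieftasche.")
    for sentence in toy_split_sentences(sample_text):
        print(sentence)

A fixed list like this cannot tell when an abbreviation or ordinal genuinely ends a sentence, which is one reason a trained sentence tokenizer is preferable.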

.. code-block:: python

    >>> blob = TextBlob("Das Auto ist sehr schön.")
    >>> blob.parse()
    'Das/DT/B-NP/O Auto/NN/I-NP/O ist/VB/B-VP/O sehr/RB/B-ADJP/O schön/JJ/I-ADJP/O'
    >>> from textblob_de import PatternParser
    >>> blob = TextBlob("Das ist ein schönes Auto.", parser=PatternParser(pprint=True, lemmata=True))
    >>> blob.parse()
          WORD   TAG    CHUNK   ROLE   ID     PNP    LEMMA

           Das   DT     -       -      -      -      das
           ist   VB     VP      -      -      -      sein
           ein   DT     NP      -      -      -      ein
       schönes   JJ     NP ^    -      -      -      schön
          Auto   NN     NP ^    -      -      -      auto
             .   .      -       -      -      -      .
    >>> from textblob_de import PatternTagger
    >>> blob = TextBlob("Das Auto ist sehr schön.", pos_tagger=PatternTagger(include_punc=True))
    >>> blob.tags
    [('Das', 'DT'), ('Auto', 'NN'), ('ist', 'VB'), ('sehr', 'RB'), ('schön', 'JJ'), ('.', '.')]
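
The default ``parse()`` output is a single string with slash-separated fields per token. Judging from the example above, the fields are word, part-of-speech tag, chunk tag and prepositional noun phrase (PNP) tag, with chunk tags using ``B-``/``I-`` prefixes. Under that assumption, a small sketch can read the string back into tuples and group tokens into chunks. ``ParsedToken``, ``read_parse`` and ``group_chunks`` are hypothetical helpers; the sketch assumes no token contains a literal ``/``, and it does not handle the extra fields that other parser keywords (e.g. ``lemmata=True``) add.

.. code-block:: python

    from collections import namedtuple

    # Field order assumed from the default parse() output shown above.
    ParsedToken = namedtuple("ParsedToken", ["word", "tag", "chunk", "pnp"])


    def read_parse(parse_string):
        """Turn a 'word/TAG/CHUNK/PNP ...' string into a list of ParsedToken.

        Assumes exactly four fields per token and no literal '/' inside a word.
        """
        return [ParsedToken(*token.split("/")) for token in parse_string.split()]


    def group_chunks(tokens):
        """Group tokens into (chunk_type, text) pairs using B-/I- chunk prefixes.

        Tokens outside any chunk (chunk tag 'O') are skipped.
        """
        groups = []
        for token in tokens:
            if token.chunk == "O":
                continue
            kind = token.chunk[2:]
            if token.chunk.startswith("I-") and groups and groups[-1][0] == kind:
                groups[-1][1].append(token.word)  # continue the current chunk
            else:
                groups.append((kind, [token.word]))  # start a new chunk
        return [(kind, " ".join(words)) for kind, words in groups]


    parsed = read_parse(
        "Das/DT/B-NP/O Auto/NN/I-NP/O ist/VB/B-VP/O sehr/RB/B-ADJP/O schön/JJ/I-ADJP/O"
    )
    print(group_chunks(parsed))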

.. code-block:: python

    >>> blob = TextBlob("Das Auto ist sehr schön.")
    >>> blob.sentiment
    Sentiment(polarity=1.0, subjectivity=0.0)
    >>> blob = TextBlob("Das ist ein hässliches Auto.")
    >>> blob.sentiment
    Sentiment(polarity=-1.0, subjectivity=0.0)

.. warning::

    **WORK IN PROGRESS:** The German polarity lexicon contains only uninflected
    forms and there are no subjectivity scores yet. As of version 0.2.3, lemmatized
    word forms are submitted to the ``PatternAnalyzer``, increasing the accuracy
    of polarity values. New in version 0.2.7: return type of ``.sentiment`` is now
    adapted to the main `TextBlob <http://textblob.readthedocs.org/en/dev/>`_ library (``:rtype: namedtuple``).
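
To see why submitting lemmatized forms helps when the lexicon contains only uninflected forms, consider a toy lexicon look-up. This is a minimal sketch, not the ``PatternAnalyzer`` algorithm (which is more involved than a plain average): ``TOY_LEXICON`` and ``toy_sentiment`` are hypothetical, the scores are illustrative, and the local ``Sentiment`` namedtuple only mimics the field names of the return type shown above.

.. code-block:: python

    from collections import namedtuple

    # Mimics the field names of the Sentiment namedtuple shown above.
    Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

    # Hypothetical toy lexicon with uninflected (lemma) forms only; the scores
    # are illustrative, not taken from the real German polarity lexicon.
    TOY_LEXICON = {"schön": 1.0, "hässlich": -1.0}


    def toy_sentiment(words):
        """Average the scores of all words found in TOY_LEXICON.

        Subjectivity is always 0.0, as there are no subjectivity scores yet.
        """
        scores = [TOY_LEXICON[word] for word in words if word in TOY_LEXICON]
        polarity = sum(scores) / len(scores) if scores else 0.0
        return Sentiment(polarity=polarity, subjectivity=0.0)


    surface_forms = ["Das", "ist", "ein", "hässliches", "Auto"]
    lemmata = ["das", "sein", "ein", "hässlich", "Auto"]
    print(toy_sentiment(surface_forms))  # "hässliches" is not in the lexicon
    print(toy_sentiment(lemmata))        # its lemma "hässlich" is

    # A namedtuple supports both attribute access and tuple unpacking.
    polarity, subjectivity = toy_sentiment(lemmata)

Because the return value is a namedtuple, code written against the main TextBlob library can use either ``result.polarity`` or tuple unpacking.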

.. code-block:: python

    >>> blob.words.lemmatize()
    WordList(['das', 'sein', 'ein', 'hässlich', 'Auto'])
    >>> from textblob_de.lemmatizers import PatternParserLemmatizer
    >>> _lemmatizer = PatternParserLemmatizer()
    >>> _lemmatizer.lemmatize("Das ist ein hässliches Auto.")
    [('das', 'DT'), ('sein', 'VB'), ('ein', 'DT'), ('hässlich', 'JJ'), ('Auto', 'NN')]
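
Since the lemmatizer returns plain ``(lemma, tag)`` tuples, its output can be post-processed with ordinary Python. For instance, a hypothetical ``content_lemmas`` helper could keep only nouns, verbs, adjectives and adverbs, identified here by Penn Treebank tag prefixes; which tags count as content words is an assumption made for this sketch.

.. code-block:: python

    # Penn Treebank tag prefixes treated as content words in this sketch
    # (nouns, verbs, adjectives, adverbs); this selection is an assumption.
    CONTENT_PREFIXES = ("NN", "VB", "JJ", "RB")


    def content_lemmas(lemma_tag_pairs, prefixes=CONTENT_PREFIXES):
        """Return the lemmas whose tag starts with one of ``prefixes``."""
        return [lemma for lemma, tag in lemma_tag_pairs if tag.startswith(prefixes)]


    # (lemma, tag) pairs as returned in the example above.
    pairs = [("das", "DT"), ("sein", "VB"), ("ein", "DT"), ("hässlich", "JJ"), ("Auto", "NN")]
    print(content_lemmas(pairs))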

.. note::

    Make sure that you use unicode strings on Python 2 if your input contains
    non-ASCII characters (e.g. ``word = u"schön"``).
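
When input may arrive as byte strings (for example, read from a file), one way to follow this advice is to decode it before passing it to ``TextBlob``. A minimal sketch with a hypothetical ``to_text`` helper, assuming the bytes are UTF-8 encoded; it runs unchanged on Python 2 and 3 because ``bytes`` is an alias of ``str`` on Python 2.

.. code-block:: python

    def to_text(value, encoding="utf-8"):
        """Return ``value`` as a unicode text string.

        Byte strings are decoded with ``encoding`` (UTF-8 is an assumption about
        the input); text strings are returned unchanged. On Python 2, ``bytes``
        is an alias of ``str``, so plain ``str`` values are decoded as well.
        """
        if isinstance(value, bytes):
            return value.decode(encoding)
        return value


    word = to_text(b"sch\xc3\xb6n")  # UTF-8 encoded bytes of u"sch\u00f6n"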

Access to ``pattern`` API in Python 3
----------------------------------------

.. code-block:: python

    >>> from textblob_de.packages import pattern_de as pd
    >>> print(pd.attributive("neugierig", gender=pd.FEMALE, role=pd.INDIRECT, article="die"))
    neugierigen
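
``attributive()`` inflects an adjective for use in front of a noun. For the weak declension that follows a definite article, the rule can be sketched as a lookup table of endings by case and gender. This is a simplified illustration only, not pattern's implementation: ``WEAK_ENDINGS`` and ``weak_attributive`` are hypothetical, the sketch covers neither strong nor mixed declension, and it takes ``role=pd.INDIRECT`` (indirect object) to correspond to the dative case.

.. code-block:: python

    # Weak adjective endings (after definite articles such as der/die/das),
    # indexed by case and then gender; "pl" is plural. Simplified sketch.
    WEAK_ENDINGS = {
        "nom": {"m": "e", "f": "e", "n": "e", "pl": "en"},
        "acc": {"m": "en", "f": "e", "n": "e", "pl": "en"},
        "dat": {"m": "en", "f": "en", "n": "en", "pl": "en"},
        "gen": {"m": "en", "f": "en", "n": "en", "pl": "en"},
    }


    def weak_attributive(adjective, gender, case):
        """Append the weak ending for ``gender`` and ``case`` to ``adjective``."""
        ending = WEAK_ENDINGS[case][gender]
        if adjective.endswith("e"):  # avoid a double e: "leise" -> "leisen"
            ending = ending[1:]
        return adjective + ending


    print(weak_attributive("neugierig", "f", "dat"))  # feminine dative

Adjectives whose stem changes when inflected (``dunkel`` becomes ``dunkle``) are not handled by this sketch.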

.. note::

    Alternatively, the path to ``textblob_de/ext`` can be added to the ``PYTHONPATH``, which allows the use of ``pattern.de`` in almost the same way as described in its `documentation <http://www.clips.ua.ac.be/pages/pattern-de>`_. The only difference is that you will have to prepend an underscore: ``from _pattern.de import ...``. This is a precautionary measure in case the ``pattern`` library gets native Python 3 support in the future.
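
Setting ``PYTHONPATH`` has essentially the same effect as prepending the directory to ``sys.path`` at runtime. A minimal sketch of the in-process variant follows. ``add_pattern_ext_to_path`` is a hypothetical helper; it assumes ``package_dir`` is the directory of the installed ``textblob_de`` package and that ``_pattern`` lives directly inside its ``ext`` subdirectory, as the note suggests.

.. code-block:: python

    import os
    import sys


    def add_pattern_ext_to_path(package_dir):
        """Prepend ``<package_dir>/ext`` to ``sys.path`` once and return it.

        ``package_dir`` is assumed to be the directory of the installed
        ``textblob_de`` package (hypothetical helper, not part of textblob-de).
        """
        ext_dir = os.path.join(package_dir, "ext")
        if ext_dir not in sys.path:
            sys.path.insert(0, ext_dir)
        return ext_dir


    # Intended use, assuming textblob_de is installed (not executed here):
    #     import textblob_de
    #     add_pattern_ext_to_path(os.path.dirname(textblob_de.__file__))
    #     from _pattern.de import attributive  # assumes _pattern.de mirrors pattern.de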

Documentation and API Reference
-------------------------------

http://textblob-de.readthedocs.org/en/latest

Requirements
------------

- Python >= 2.6 or >= 3.3
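
Read literally, this requirement excludes Python 3.0 to 3.2 even though they are newer than 2.6. A small sketch of the check, with ``is_supported`` as a hypothetical helper that accepts a ``(major, minor)`` pair or ``sys.version_info``:

.. code-block:: python

    import sys


    def is_supported(version_info=sys.version_info):
        """Return True if ``version_info`` meets "Python >= 2.6 or >= 3.3".

        The 2.x line needs at least 2.6; otherwise at least 3.3 is required,
        so 3.0 to 3.2 are excluded.
        """
        major, minor = version_info[0], version_info[1]
        if major == 2:
            return minor >= 6
        return (major, minor) >= (3, 3)


    print(is_supported())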

TODO
----

- `Planned Extensions <http://textblob-de.readthedocs.org/en/latest/extensions.html>`_
- Additional PoS tagging options, e.g. NLTK tagging (``NLTKTagger``)
- Improve noun phrase extraction (e.g. based on ``RFTagger`` output)
- Improve sentiment analysis (find suitable subjectivity scores)
- Improve functionality of ``Sentence()`` and ``Word()`` objects
- Adapt more tests from the main `TextBlob <http://textblob.readthedocs.org/en/dev/>`_ library (esp. for ``TextBlobDE()`` in ``test_blob.py``)

License
-------

MIT licensed. See the bundled `LICENSE <https://github.com/markuskiller/textblob-de/blob/master/LICENSE>`_ file for more details.

Thanks
------

Coded with Wing IDE (free open source developer license)

.. image:: https://wingware.com/images/wingware-logo-180x58.png
    :target: https://wingware.com/store/free
    :alt: Python IDE for Python - wingware.com

Changelog
---------

0.4.3 (03/01/2019)
++++++++++++++++++

- Added support for Python 3.7 (``StopIteration`` --> ``return``), `Pull Request #18 <https://github.com/markuskiller/textblob-de/pull/18>`_ (thanks @andrewmfiorillo)
- Fixed tests for Google translation examples
- Updated tox/Travis-CI config files to include the latest Python & PyPy versions
- Updated ``sphinx_rtd_theme`` to version 0.4.2 to fix rendering problems on `RTD <http://textblob-de.readthedocs.org>`_
- Updated ``setup.py`` publish commands, ``Makefile`` & ``MANIFEST.in`` for the new PyPI (using ``twine``)

0.4.2 (02/05/2015)
++++++++++++++++++

- Removed dependency on `NLTK <https://github.com/nltk/nltk/>`_, as it already is a `TextBlob <http://textblob.readthedocs.org/en/dev/>`_ dependency
- Temporary workaround for `NLTK Issue #824 <https://github.com/nltk/nltk/issues/824>`_ for tox/Travis-CI
- (update 13/01/2015) `NLTK Issue #824 <https://github.com/nltk/nltk/issues/824>`_ fixed, workaround removed
- Enabled ``pattern`` tagset conversion (``'penn'|'universal'|'stts'``) for ``PatternTagger``
- Added tests for tagset conversion
- Fixed test for Arabic translation example (Google translation has changed)
- Added tests for lemmatizer
- Bugfix: ``PatternAnalyzer`` no longer breaks on subsequent occurrences of the same ``(word, tag)`` pairs on Python 3; see comments to `Pull Request #11 <https://github.com/markuskiller/textblob-de/pull/11>`_
- Bugfix/performance enhancement: the sentiment dictionary in ``PatternAnalyzer`` is no longer reloaded for every sentence, `Pull Request #11 <https://github.com/markuskiller/textblob-de/pull/11>`_ (thanks @Arttii)

0.4.1 (03/10/2014)
++++++++++++++++++

- Docs hosted on `RTD <http://textblob-de.readthedocs.org>`_
- Removed dependency on NLTK's deprecated ``PunktWordTokenizer`` and replaced it with ``TreebankWordTokenizer``; see `nltk/nltk#746 (comment) <https://github.com/nltk/nltk/pull/746#issuecomment-57625756>`_ for details

0.4.0 (17/09/2014)
++++++++++++++++++

- Fixed `Issue #7 <https://github.com/markuskiller/textblob-de/issues/7>`_ (restore ``textblob>=0.9.0`` compatibility)
- Depend on NLTK 3 (the vendorized NLTK was removed in ``textblob>=0.9.0``)
- Fixed ``ImportError`` on Python 2 (``unicodecsv``)

0.3.1 (29/08/2014)
++++++++++++++++++

- Improved ``PatternParserNPExtractor`` (fewer false positives in verb filter)
- Made sure that all keyword arguments with default ``None`` are checked with ``is not None``
- Fixed shortcut to ``_pattern.de`` in vendorized library
- Added ``Makefile`` to facilitate development process
- Added docs and API reference

0.3.0 (14/08/2014)
++++++++++++++++++

- Fixed `Issue #5 <https://github.com/markuskiller/textblob-de/issues/5>`_ (text + space + period)

0.2.9 (14/08/2014)
++++++++++++++++++

- Fixed tokenization in ``PatternParser`` (if initialized manually, punctuation was not always separated from words)
- Improved handling of empty strings (Issue #3) and of strings containing single punctuation marks (Issue #4) in ``PatternTagger`` and ``PatternParser``
- Added tests for empty strings and for strings containing single punctuation marks

0.2.8 (14/08/2014)
++++++++++++++++++

- Fixed `Issue #3 <https://github.com/markuskiller/textblob-de/issues/3>`_ (empty string)
- Fixed `Issue #4 <https://github.com/markuskiller/textblob-de/issues/4>`_ (space + punctuation)

0.2.7 (13/08/2014)
++++++++++++++++++

- Fixed `Issue #1 <https://github.com/markuskiller/textblob-de/issues/1>`_: lemmatization of strings containing a forward slash (``/``)
- Enhancement `Issue #2 <https://github.com/markuskiller/textblob-de/issues/2>`_: use the same rtype as ``textblob`` for sentiment detection
- Fixed tokenization in ``PatternParserLemmatizer``

0.2.6 (04/08/2014)
++++++++++++++++++

- Fixed ``MANIFEST.in`` for package data in sdist

0.2.5 (04/08/2014)
++++++++++++++++++

- The sdist is non-functional because important files are missing due to a misconfiguration in ``MANIFEST.in`` (does not affect wheels)
- Major internal refactoring (but no backwards-incompatible API changes) with the aim of restoring complete compatibility with the original ``pattern>=2.6`` library on Python 2
- Separation of ``textblob`` and ``pattern`` code
- On Python 2, the vendorized version of ``pattern.text.de`` is only used if the original is not installed (same as NLTK)
- Made the ``pattern.de.pprint`` function and all parser keywords accessible to customise parser output
- Access to the complete ``pattern.text.de`` API on Python 2 and Python 3: ``from textblob_de.packages import pattern_de as pd``
- tox passed on all major platforms (Win/Linux/OSX)

0.2.3 (26/07/2014)
++++++++++++++++++

- Lemmatizer: ``PatternParserLemmatizer()`` extracts lemmata from parser output
- Improved polarity analysis through look-up of lemmatised word forms

0.2.2 (22/07/2014)
++++++++++++++++++

- Option: include punctuation in ``tags``/``pos_tags`` properties (``b = TextBlobDE(text, tagger=PatternTagger(include_punc=True))``)
- Added ``BlobberDE()`` class initialized with German models
- ``TextBlobDE()``, ``Sentence()``, ``WordList()`` and ``Word()`` classes are now all initialized with German models
- Restored complete API compatibility with the ``textblob.tokenizers`` module of the main `TextBlob <http://textblob.readthedocs.org/en/dev/>`_ library

0.2.1 (20/07/2014)
++++++++++++++++++

- Noun phrase extraction: ``PatternParserNPExtractor()`` extracts NPs from parser output
- Refactored the way ``TextBlobDE()`` passes on arguments and keyword arguments to individual tools
- Backwards-incompatible: deprecate ``parser_show_lemmata=True`` keyword in ``TextBlob()``. Use ``parser=PatternParser(lemmata=True)`` instead.

0.2.0 (18/07/2014)
++++++++++++++++++

- Vastly improved tokenization (``NLTKPunktTokenizer`` and ``PatternTokenizer`` with tests)
- Consistent use of the specified tokenizer for all tools
- ``TextBlobDE`` with initialized default models for German
- Parsing (``PatternParser``) plus ``test_parsers.py``
- EXPERIMENTAL implementation of polarity detection (``PatternAnalyzer``)
- First attempt at extracting German polarity clues into ``de-sentiment.xml``
- tox tests passing for py26, py27, py33 and py34

0.1.3 (09/07/2014)
++++++++++++++++++

- First release on PyPI

0.1.0 - 0.1.2 (09/07/2014)
++++++++++++++++++++++++++

- First release on GitHub
- A number of experimental releases for testing purposes
- Adapted version badges, tests & Travis-CI config
- Code adapted from sample extension `textblob-fr <https://github.com/sloria/textblob-fr>`_
- Language-specific linguistic resources copied from `pattern-de <https://github.com/clips/pattern/tree/master/pattern/text/de>`_
