============
LanguageFlow
============

.. image:: https://img.shields.io/pypi/v/languageflow.svg
    :target: https://pypi.python.org/pypi/languageflow

.. image:: https://img.shields.io/pypi/pyversions/languageflow.svg
    :target: https://pypi.python.org/pypi/languageflow

.. image:: https://img.shields.io/badge/license-GNU%20General%20Public%20License%20v3-brightgreen.svg
    :target: https://pypi.python.org/pypi/languageflow

.. image:: https://img.shields.io/travis/undertheseanlp/languageflow.svg
    :target: https://travis-ci.org/undertheseanlp/languageflow

.. image:: https://readthedocs.org/projects/languageflow/badge/?version=latest
    :target: http://languageflow.readthedocs.io/en/latest/
    :alt: Documentation Status

Data loaders and abstractions for text and NLP

Requirements
------------

Install the dependencies:

.. code-block:: bash

    $ pip install future tox
    $ pip install python-crfsuite==0.9.5
    $ pip install Cython
    $ pip install -U fasttext --no-cache-dir --no-deps --force-reinstall
    $ pip install xgboost==0.82

Installation
------------

.. code-block:: bash

    $ pip install languageflow

Components
----------

- Transformers: NumberRemover, CountVectorizer, TfidfVectorizer
- Models: SGDClassifier, XGBoostClassifier, KimCNNClassifier, FastTextClassifier, CRF
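
As a rough, library-agnostic illustration of what a transformer such as NumberRemover followed by a count-based vectorizer does conceptually, the sketch below uses only the Python standard library. The helper names ``remove_numbers`` and ``count_tokens`` are hypothetical and are not part of the LanguageFlow API; the actual classes may tokenize and normalize text differently.

.. code-block:: python

    import re
    from collections import Counter


    def remove_numbers(text):
        """Drop runs of digits and collapse the leftover whitespace (hypothetical helper)."""
        without_digits = re.sub(r"\d+", " ", text)
        return " ".join(without_digits.split())


    def count_tokens(text):
        """Lowercase the text, split on whitespace, and count each token (bag of words)."""
        return Counter(text.lower().split())


    cleaned = remove_numbers("Transfer 500 USD to account 12345 today")
    counts = count_tokens(cleaned)
    print(cleaned)
    print(counts)

A model such as a linear classifier would then be trained on count vectors of this kind, one per document.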

Data
----

Download a dataset with the ``download`` command:

.. code-block:: bash

    $ languageflow download DATASET

List all available datasets:

.. code-block:: bash

    $ languageflow list

Datasets
--------

The datasets module currently contains:

* Tagged: VLSP2018-NER, VTB-CHUNK*, VLSP2016-NER*, VLSP2013-POS*, VLSP2013-WTK*
* Categorized: AIVIVN2019_SA*, VLSP2018_SA*, UTS2017_BANK, VLSP2016_SA*, VNTC
* Plaintext: VNESES, VNTQ_SMALL, VNTQ_BIG

Caution (*): For datasets under a closed license, you must provide the download URL yourself.

Example
-------

Download the UTS2017_BANK dataset:

.. code-block:: bash

    $ languageflow download UTS2017_BANK
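
When several datasets are needed, the same command can be scripted. The following is a minimal sketch, assuming the ``languageflow`` executable is on ``PATH`` after installation and that dataset names are passed exactly as listed above; ``build_download_command`` and ``download_all`` are hypothetical helpers, not part of LanguageFlow. By default it performs a dry run and only prints the commands.

.. code-block:: python

    import shutil
    import subprocess


    def build_download_command(dataset):
        """Return the argument list for ``languageflow download DATASET``."""
        return ["languageflow", "download", dataset]


    def download_all(datasets, dry_run=True):
        """Build one download command per dataset; run them unless dry_run is set."""
        commands = [build_download_command(name) for name in datasets]
        if not dry_run:
            # Assumption: the languageflow executable is on PATH after installation.
            if shutil.which("languageflow") is None:
                raise RuntimeError("languageflow command not found on PATH")
            for command in commands:
                subprocess.run(command, check=True)  # raises if a download fails
        return commands


    # Dry run: only shows what would be executed.
    commands = download_all(["UTS2017_BANK", "VNTC"])
    for command in commands:
        print(" ".join(command))

Calling ``download_all(..., dry_run=False)`` would actually invoke the CLI once per dataset.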

Use the UTS2017_BANK dataset:

.. code-block:: python

    >>> from languageflow.data_fetcher import DataFetcher, NLPData
    >>> corpus = DataFetcher.load_corpus(NLPData.UTS2017_BANK_SA)
    >>> print(corpus)
    CategorizedCorpus: 1780 train + 197 dev + 494 test sentences
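
The split sizes in that summary line can be turned into proportions with a little arithmetic. The sketch below works on the printed string only, so it does not rely on any attribute of the corpus object; the variable names are illustrative.

.. code-block:: python

    import re

    # Summary line exactly as printed for the UTS2017_BANK corpus above.
    summary = "CategorizedCorpus: 1780 train + 197 dev + 494 test sentences"

    # Extract "<count> <split>" pairs and convert the counts to integers.
    sizes = {name: int(count) for count, name in re.findall(r"(\d+) (train|dev|test)", summary)}
    total = sum(sizes.values())

    # Share of each split as a percentage, rounded to one decimal place.
    shares = {name: round(100 * size / total, 1) for name, size in sizes.items()}

    print(total)   # 2471
    print(shares)  # {'train': 72.0, 'dev': 8.0, 'test': 20.0}

In other words, the corpus is split roughly 72/8/20 between train, dev, and test.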

=======
History
=======

1.1.7 (2018-04-12)
------------------

- Automatic deployment with Travis CI and PyPI
- Fix dependency hell

1.1.6 (2017-12-26)
------------------

- Add data module to handle data downloading and data preprocessing
- Add many new models: SGDClassifier, XGBoostClassifier, FastTextClassifier, CRF
- Add new feature: LanguageBoard
- Automatic continuous integration with travis-ci
- Build docs with readthedocs.org

1.1.5 (2017-12-11)
------------------

- Refactor project to integrate with underthesea experiment

0.1.0 (2017-09-18)
------------------