============
LanguageFlow
============

.. image:: https://img.shields.io/pypi/v/languageflow.svg
    :target: https://pypi.python.org/pypi/languageflow

.. image:: https://img.shields.io/pypi/pyversions/languageflow.svg
    :target: https://pypi.python.org/pypi/languageflow

.. image:: https://img.shields.io/badge/license-GNU%20General%20Public%20License%20v3-brightgreen.svg
    :target: https://pypi.python.org/pypi/languageflow

.. image:: https://img.shields.io/travis/undertheseanlp/languageflow.svg
    :target: https://travis-ci.org/undertheseanlp/languageflow

.. image:: https://readthedocs.org/projects/languageflow/badge/?version=latest
    :target: http://languageflow.readthedocs.io/en/latest/
    :alt: Documentation Status

Data loaders and abstractions for text and NLP

Requirements
------------

Install the dependencies:

.. code-block:: bash

  $ pip install future tox
  $ pip install python-crfsuite==0.9.5
  $ pip install Cython
  $ pip install -U fasttext --no-cache-dir --no-deps --force-reinstall
  $ pip install xgboost==0.82

Installation
------------

.. code-block:: bash

  $ pip install languageflow

Components
----------

* Transformers: NumberRemover, CountVectorizer, TfidfVectorizer
* Models: SGDClassifier, XGBoostClassifier, KimCNNClassifier, FastTextClassifier, CRF
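
Text transformers like these are commonly written in the scikit-learn ``fit``/``transform`` style. The sketch below is a minimal, pure-Python illustration of a number-removal step followed by a bag-of-words counter. It assumes that ``NumberRemover`` strips numeric tokens and that ``CountVectorizer`` maps each document to per-word counts; ``NumberRemoverSketch`` and ``CountVectorizerSketch`` are hypothetical names, not languageflow's actual classes or API.

.. code-block:: python

    import re
    from collections import Counter


    class NumberRemoverSketch:
        """Hypothetical transformer: strips numbers such as "42" or "2.5" from text.

        Assumes a scikit-learn style fit/transform interface; not languageflow code.
        """

        _number = re.compile(r"\b\d+(?:[.,]\d+)*\b")

        def fit(self, docs, y=None):
            return self  # stateless: there is nothing to learn

        def transform(self, docs):
            # Replace each number with a space, then collapse repeated whitespace.
            return [" ".join(self._number.sub(" ", doc).split()) for doc in docs]


    class CountVectorizerSketch:
        """Hypothetical bag-of-words counter in the spirit of CountVectorizer."""

        def fit(self, docs, y=None):
            # Build a sorted vocabulary of lowercased, whitespace-separated tokens.
            vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
            self.vocabulary_ = {tok: i for i, tok in enumerate(vocab)}
            return self

        def transform(self, docs):
            # One row per document; column i counts occurrences of vocabulary word i.
            rows = []
            for doc in docs:
                counts = Counter(doc.lower().split())
                rows.append([counts[tok] for tok in self.vocabulary_])
            return rows


    docs = ["Rates rose 2.5 percent", "rates fell"]
    cleaned = NumberRemoverSketch().fit(docs).transform(docs)
    vectorizer = CountVectorizerSketch().fit(cleaned)
    matrix = vectorizer.transform(cleaned)
    print(cleaned)  # ['Rates rose percent', 'rates fell']
    print(matrix)   # [[0, 1, 1, 1], [1, 0, 1, 0]]

Keeping each step as a separate object with ``fit`` and ``transform`` lets the steps be chained, so a TF-IDF weighting step could be appended after the counter in the same way.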

Data
----

Download a dataset using the ``download`` command:

.. code-block:: bash

  $ languageflow download DATASET

List all available datasets:

.. code-block:: bash

  $ languageflow list

Datasets
~~~~~~~~

The datasets module currently contains:

* Tagged: VLSP2018-NER, VTB-CHUNK*, VLSP2016-NER*, VLSP2013-POS*, VLSP2013-WTK*
* Categorized: AIVIVN2019_SA*, VLSP2018_SA*, UTS2017_BANK, VLSP2016_SA*, VNTC
* Plaintext: VNESES, VNTQ_SMALL, VNTQ_BIG

Caution (*): these datasets have closed licenses, so you must provide the URL to download them.
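
The licensing rule amounts to a lookup over the list above: an entry marked with an asterisk needs a user-supplied URL. A minimal sketch, assuming the dataset names exactly as listed; ``DATASETS`` and ``requires_url`` are hypothetical helpers, not part of languageflow.

.. code-block:: python

    # Dataset names copied from the list above; a trailing "*" marks a closed license.
    DATASETS = {
        "Tagged": ["VLSP2018-NER", "VTB-CHUNK*", "VLSP2016-NER*",
                   "VLSP2013-POS*", "VLSP2013-WTK*"],
        "Categorized": ["AIVIVN2019_SA*", "VLSP2018_SA*", "UTS2017_BANK",
                        "VLSP2016_SA*", "VNTC"],
        "Plaintext": ["VNESES", "VNTQ_SMALL", "VNTQ_BIG"],
    }


    def requires_url(name):
        """Return True if the dataset has a closed license (hypothetical helper)."""
        for entries in DATASETS.values():
            for entry in entries:
                if entry.rstrip("*") == name:
                    return entry.endswith("*")
        raise KeyError(f"unknown dataset: {name}")


    print(requires_url("UTS2017_BANK"))  # False: open license
    print(requires_url("VLSP2016_SA"))   # True: you must supply the URL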

Example
~~~~~~~

Download the UTS2017_BANK dataset:

.. code-block:: bash

  $ languageflow download UTS2017_BANK

Load the UTS2017_BANK dataset:

.. code-block:: python

  >>> from languageflow.data_fetcher import DataFetcher, NLPData
  >>> corpus = DataFetcher.load_corpus(NLPData.UTS2017_BANK_SA)
  >>> print(corpus)
  CategorizedCorpus: 1780 train + 197 dev + 494 test sentences
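
The printed summary counts the sentences in each split, so UTS2017_BANK holds 1780 + 197 + 494 = 2471 sentences in total. The sketch below is a hypothetical stand-in showing one way such a train/dev/test container and its summary line could be organized. ``CategorizedCorpusSketch`` and its attributes are assumptions, not the real ``CategorizedCorpus`` API, and the toy sentences and labels are invented for illustration.

.. code-block:: python

    from dataclasses import dataclass, field


    @dataclass
    class CategorizedCorpusSketch:
        """Hypothetical container of (sentence, label) pairs split into train/dev/test."""

        train: list = field(default_factory=list)
        dev: list = field(default_factory=list)
        test: list = field(default_factory=list)

        def __len__(self):
            # Total number of sentences across all three splits.
            return len(self.train) + len(self.dev) + len(self.test)

        def labels(self):
            # Distinct category labels seen in the training split.
            return sorted({label for _, label in self.train})

        def __str__(self):
            return (f"CategorizedCorpus: {len(self.train)} train + "
                    f"{len(self.dev)} dev + {len(self.test)} test sentences")


    # Toy data, invented for illustration only.
    corpus = CategorizedCorpusSketch(
        train=[("fast transfer", "positive"), ("long queue", "negative")],
        dev=[("helpful staff", "positive")],
        test=[("card was blocked", "negative")],
    )
    print(corpus)           # CategorizedCorpus: 2 train + 1 dev + 1 test sentences
    print(len(corpus))      # 4
    print(corpus.labels())  # ['negative', 'positive']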

=======
History
=======

1.1.7 (2018-04-12)
------------------

* Automatic deployment with Travis CI and PyPI
* Fix dependency hell

1.1.6 (2017-12-26)
------------------

* Add data module to handle data downloading and data preprocessing
* Add many new models: SGDClassifier, XGBoostClassifier, FastTextClassifier, CRF
* Add new feature: LanguageBoard
* Automatic continuous integration with Travis CI
* Build docs with readthedocs.org

1.1.5 (2017-12-11)
------------------

* Refactor project to integrate with underthesea experiment

0.1.0 (2017-09-18)
------------------

* First release on PyPI.
