This repository has been archived!
You can find the latest version of the source code inside the CMTT repository, where it will continue to be developed.
CMTT (Code-Mixed Text Toolkit) is a wrapper library for processing code-mixed text. More documentation is on the way.
Installation
pip install code-mixed-text-toolkit
Get started
How to use this library:
import code_mixed_text_toolkit.data as cmtt_data
import code_mixed_text_toolkit.preprocessing as cmtt_pp

# Load JSON and CSV data from URLs
result_json = cmtt_data.load('https://world.openfoodfacts.org/api/v0/product/5060292302201.json')
result_csv = cmtt_data.load('https://gist.githubusercontent.com/rnirmal/e01acfdaf54a6f9b24e91ba4cae63518/raw/b589a5c5a851711e20c5eb28f9d54742d1fe2dc/datasets.csv')

# List the dataset keys and the datasets that CMTT provides
keys = cmtt_data.list_dataset_keys()
data = cmtt_data.list_cmtt_datasets()
print(data)

# Download several CMTT datasets by key
lst = cmtt_data.download_cmtt_datasets(["linc_ner_hineng", "L3Cube_HingLID_all", "linc_lid_spaeng"])

# Download a dataset from a URL
path = cmtt_data.download_dataset_url('https://world.openfoodfacts.org/api/v0/product/5060292302201.json')

# Load a text file, tokenize it into words, and search it for a word
result_txt = cmtt_data.load('https://www.w3.org/TR/PNG/iso_8859-1.txt')
result_txt_tokenized = cmtt_pp.tokenizer.word_tokenize(result_txt)
cmtt_pp.search.search_word(result_txt, 'with', tokenize=True, width=3)
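The `width` argument of the search call above is not described here. As a rough illustration only, the standalone sketch below assumes it means the number of context tokens shown on each side of a match. It does not use CMTT and is not CMTT's implementation; `simple_word_tokenize` and `search_in_context` are hypothetical names invented for this example.

```python
import re


def simple_word_tokenize(text):
    """Split text into word tokens and single punctuation marks (illustrative only)."""
    return re.findall(r"\w+|[^\w\s]", text)


def search_in_context(text, word, width=3):
    """Return (left, match, right) for each case-insensitive match of `word`.

    Assumption: `width` is the maximum number of tokens kept on each side.
    """
    tokens = simple_word_tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    return hits


# A small Hindi-English code-mixed sentence, made up for this example
sample = "I went to the market with my friend, aur phir we came back with lots of sabzi."
for left, hit, right in search_in_context(sample, "with", width=3):
    print(" ".join(left), f"[{hit}]", " ".join(right))
```

Matching on tokens rather than raw substrings keeps a search for `with` from also hitting words such as `without`.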
Contributors