Code Switch
CodeSwitch is an NLP tool for language identification, POS tagging, named entity recognition, and sentiment analysis of code-mixed data.
Supported Code-Mixed Languages
We used the LinCE dataset to train a multilingual BERT model with Hugging Face Transformers. LinCE contains data for four code-mixed language pairs. We used three of them: spanish-english, hindi-english, and nepali-english. We hope to train and add other languages and tasks as well.
- Spanish-English (spa-eng)
- Hindi-English (hin-eng)
- Nepali-English (nep-eng)
Language Code
- spa-eng for spanish-english
- hin-eng for hindi-english
- nep-eng for nepali-english
Installation
pip install codeswitch
Dependency
Training Details
- All three sequence tagging models (lid, ner, pos) were trained with Hugging Face token classification
- The sentiment analysis model was trained with Hugging Face text classification
- You can find every model and its evaluation results here
Features & Supported Language
- Language Identification
  - spanish-english
  - hindi-english
  - nepali-english
- POS
  - spanish-english
  - hindi-english
- NER
  - spanish-english
  - hindi-english
- Sentiment Analysis
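The support list above can be written down as a small lookup table, which is handy for checking a task and language code before loading a model. This is only a sketch: the `SUPPORTED` dictionary and the `is_supported` helper are hypothetical names and not part of the codeswitch package. Sentiment analysis is left out because the list above does not name its languages.

```python
# Tasks and language codes copied from the list in this README.
# SUPPORTED and is_supported are illustrative names, not codeswitch APIs.
SUPPORTED = {
    "lid": {"spa-eng", "hin-eng", "nep-eng"},
    "pos": {"spa-eng", "hin-eng"},
    "ner": {"spa-eng", "hin-eng"},
}


def is_supported(task: str, language_code: str) -> bool:
    """Return True if this README lists `language_code` under `task`."""
    return language_code in SUPPORTED.get(task, set())


print(is_supported("lid", "nep-eng"))
print(is_supported("pos", "nep-eng"))
```

A check like this could run before constructing a tagger, so that an unsupported pair fails early with a clear message.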
Language Identification
from codeswitch.codeswitch import LanguageIdentification
# Load the language identification model for the spanish-english pair
lid = LanguageIdentification('spa-eng')
text = ""  # your code-mixed sentence
result = lid.identify(text)
print(result)
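Because the models are BERT-based, the printed result may contain WordPiece fragments such as `##as` instead of whole words. The sketch below joins those fragments back together. It assumes the result is a list of dictionaries with `word` and `entity` keys, as Hugging Face token classification pipelines usually return. That format is an assumption about codeswitch, `merge_wordpieces` is a hypothetical helper, and the sample data and label strings are made up for illustration.

```python
from typing import Dict, List


def merge_wordpieces(tokens: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Join '##'-prefixed WordPiece continuations onto the preceding word.

    The merged word keeps the label of its first piece.
    """
    merged: List[Dict[str, str]] = []
    for tok in tokens:
        word, label = tok["word"], tok["entity"]
        if word.startswith("##") and merged:
            # Continuation piece: append it without the '##' marker.
            merged[-1]["word"] += word[2:]
        else:
            merged.append({"word": word, "entity": label})
    return merged


# Made-up sample in the assumed format; not real model output.
sample = [
    {"word": "play", "entity": "lang2"},
    {"word": "##as", "entity": "lang2"},
    {"word": "day", "entity": "lang1"},
]
print(merge_wordpieces(sample))
```

Keeping the first piece's label is one simple choice. Other strategies, such as a majority vote over the pieces, are equally possible.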
POS Tagging
from codeswitch.codeswitch import POS
# Load the POS tagging model for the spanish-english pair
pos = POS('spa-eng')
text = ""  # your code-mixed sentence
result = pos.tag(text)
print(result)
NER Tagging
from codeswitch.codeswitch import NER
# Load the NER model for the spanish-english pair
ner = NER('spa-eng')
text = ""  # your code-mixed sentence
result = ner.tag(text)
print(result)
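Token-level NER tags are often easier to use once they are grouped into entity spans. The sketch below does that for `(word, tag)` pairs, assuming the tags follow the common BIO scheme (`B-X` starts an entity, `I-X` continues it, `O` is outside). Whether the codeswitch models emit exactly this scheme is an assumption, `group_bio` is a hypothetical helper, and the sample tags are made up for illustration.

```python
from typing import List, Optional, Tuple


def group_bio(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collapse (word, BIO tag) pairs into (entity text, entity type) spans."""
    spans: List[Tuple[str, str]] = []
    words: List[str] = []
    etype: Optional[str] = None  # type of the span being built, if any
    for word, tag in tagged:
        prefix, _, label = tag.partition("-")
        if prefix == "I" and words and label == etype:
            words.append(word)  # continue the current span
            continue
        if words:
            # Any other tag closes the span that was open.
            spans.append((" ".join(words), etype))
            words, etype = [], None
        if prefix in ("B", "I") and label:
            # B- starts a span; a stray I- is treated as a start too.
            words, etype = [word], label
    if words:
        spans.append((" ".join(words), etype))
    return spans


# Made-up tags for illustration; not real model output.
sample = [
    ("Key", "B-LOC"),
    ("Biscayne", "I-LOC"),
    ("este", "O"),
    ("Memorial", "B-EVENT"),
    ("day", "I-EVENT"),
]
print(group_bio(sample))
```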
Sentiment Analysis
from codeswitch.codeswitch import SentimentAnalysis
# Load the sentiment analysis model for the spanish-english pair
sa = SentimentAnalysis('spa-eng')
sentence = "El perro le ladraba a La Gatita .. .. lol #teamlagatita en las playas de Key Biscayne este Memorial day"
result = sa.analyze(sentence)
print(result)
Acknowledgement