
Code Switch


CodeSwitch is an NLP tool for language identification, POS tagging, named entity recognition and sentiment analysis of code-mixed data.

Supported Code-Mixed Languages

We used the LinCE dataset to train multilingual BERT models with Hugging Face transformers. LinCE contains data for four language pairs; we took three of them: Spanish-English, Hindi-English and Nepali-English. We hope to train and add other languages and tasks in the future.

  • Spanish-English (spa-eng)
  • Hindi-English (hin-eng)
  • Nepali-English (nep-eng)

Language Codes

  • spa-eng for Spanish-English
  • hin-eng for Hindi-English
  • nep-eng for Nepali-English

Installation

pip install codeswitch

Dependency

  • PyTorch >= 1.6.0
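The README lists PyTorch 1.6.0 as the minimum version. If you want to fail early on an older install, a small standard-library check can compare a version string against that minimum. This is a sketch; `meets_minimum` is a hypothetical helper, not part of codeswitch.

```python
import re


def meets_minimum(version: str, minimum: tuple = (1, 6, 0)) -> bool:
    """Return True if a version string such as '1.13.1+cu117' is >= minimum.

    Only the leading numeric release components are compared; local tags
    (e.g. '+cu117') and pre-release suffixes (e.g. 'rc1') are ignored.
    """
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(1).split("."))
    # Pad both tuples to the same length so (1, 6) compares equal to (1, 6, 0).
    width = max(len(parts), len(minimum))
    parts = parts + (0,) * (width - len(parts))
    minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return parts >= minimum
```

With PyTorch installed, `meets_minimum(torch.__version__)` after `import torch` tells you whether the installed version satisfies the requirement.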

Training Details

  • All three sequence tagging models (LID, NER, POS) were trained with Hugging Face token classification
  • The sentiment analysis model was trained with Hugging Face text classification
  • You can find every model and its evaluation results here
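Token classification assigns one label per token, while text classification assigns one label per sentence. With BERT-style models a single word is often split into several subword tokens, and the common Hugging Face recipe keeps the word's label only on its first subword and marks the rest with -100 so the loss skips them. The sketch below illustrates that alignment step; it assumes a `word_ids` list like the one a fast tokenizer's `word_ids()` method returns, and it is not taken from the codeswitch training code.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # label id that PyTorch's CrossEntropyLoss ignores by default


def align_labels(word_labels: List[int], word_ids: List[Optional[int]]) -> List[int]:
    """Spread word-level label ids over subword tokens.

    word_ids maps each subword token to the index of the word it came from,
    with None for special tokens such as [CLS] and [SEP]. Only the first
    subword of each word keeps that word's label; the others get IGNORE_INDEX.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)          # continuation subword
        previous = word_id
    return aligned
```

For example, two words with labels `[0, 1]`, where the second word is split into three pieces, give `[-100, 0, 1, -100, -100, -100]` for the token sequence `[CLS] w0 w1a w1b w1c [SEP]`.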

Features & Supported Languages

  • Language Identification
    • Spanish-English
    • Hindi-English
    • Nepali-English
  • POS
    • Spanish-English
    • Hindi-English
  • NER
    • Spanish-English
    • Hindi-English
  • Sentiment Analysis
    • Spanish-English
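Not every task is available for every language pair, so it can help to validate the pair before constructing a model. The table below mirrors the feature list above for this version; the task keys (`lid`, `pos`, `ner`, `sentiment`) and the `check_supported` helper are hypothetical names for illustration.

```python
# Task -> supported language codes, copied from the README's feature list.
SUPPORTED = {
    "lid": {"spa-eng", "hin-eng", "nep-eng"},
    "pos": {"spa-eng", "hin-eng"},
    "ner": {"spa-eng", "hin-eng"},
    "sentiment": {"spa-eng"},
}


def check_supported(task: str, language: str) -> None:
    """Raise ValueError if no model is listed for this task and language pair."""
    if task not in SUPPORTED:
        raise ValueError(f"unknown task {task!r}; choose from {sorted(SUPPORTED)}")
    if language not in SUPPORTED[task]:
        raise ValueError(
            f"{task!r} is not available for {language!r}; "
            f"supported: {sorted(SUPPORTED[task])}"
        )
```

Calling `check_supported("pos", "nep-eng")`, for instance, raises an error because POS tagging is only listed for Spanish-English and Hindi-English.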

Language Identification

from codeswitch.codeswitch import LanguageIdentification
lid = LanguageIdentification('spa-eng')
# for Hindi-English use 'hin-eng'
# for Nepali-English use 'nep-eng'
text = "" # your code-mixed sentence
result = lid.identify(text)
print(result)
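The README does not show what `identify` returns. Hugging Face token-classification pipelines typically return a list of dicts with keys such as `word` and `entity`, where WordPiece continuation pieces start with `##`. Assuming that shape (unverified for codeswitch), the sketch below rejoins subword pieces and groups consecutive words with the same tag into language spans, which makes the switch points easy to see. The helper names and the `en`/`es` tags used in the example are illustrative, not codeswitch's API or label set.

```python
from typing import Dict, List, Tuple


def merge_subwords(tokens: List[Dict]) -> List[Tuple[str, str]]:
    """Rejoin '##'-prefixed WordPiece pieces into whole words.

    Each word keeps the tag of its first piece.
    """
    words: List[Tuple[str, str]] = []
    for tok in tokens:
        piece, tag = tok["word"], tok["entity"]
        if piece.startswith("##") and words:
            text, first_tag = words[-1]
            words[-1] = (text + piece[2:], first_tag)
        else:
            words.append((piece, tag))
    return words


def language_spans(tokens: List[Dict]) -> List[Tuple[str, str]]:
    """Group consecutive words with the same language tag into (tag, text) spans."""
    spans: List[Tuple[str, str]] = []
    for word, tag in merge_subwords(tokens):
        if spans and spans[-1][0] == tag:
            spans[-1] = (tag, spans[-1][1] + " " + word)
        else:
            spans.append((tag, word))
    return spans
```

For instance, tokens for "El perro lol" tagged `es`, `es`, `en` would yield two spans: `('es', 'El perro')` and `('en', 'lol')`.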

POS Tagging

from codeswitch.codeswitch import POS
pos = POS('spa-eng')
# for hindi-english use 'hin-eng'
text = "" # your code-mixed sentence
result = pos.tag(text)
print(result)
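Because of subword tokenization, the tagger can return more items than there are words in the input. Assuming each returned item carries an integer character offset under `start` and a tag under `entity`, in text order (as Hugging Face token-classification pipeline output usually does with a fast tokenizer; unverified for codeswitch), one tag per original word can be recovered by matching offsets against the input text. `tags_per_word` is a hypothetical helper, and the tags in the example are illustrative.

```python
import re
from typing import Dict, List, Tuple


def tags_per_word(text: str, tokens: List[Dict]) -> List[Tuple[str, str]]:
    """Attach one tag to each whitespace-separated word of `text`.

    A word takes the tag of the first token whose 'start' offset falls
    inside it; words with no matching token get 'UNK'.
    """
    pairs = []
    for match in re.finditer(r"\S+", text):
        tag = "UNK"
        for tok in tokens:
            if match.start() <= tok["start"] < match.end():
                tag = tok["entity"]
                break
        pairs.append((match.group(), tag))
    return pairs
```

This keeps the words exactly as they appear in your input, instead of the tokenizer's reconstructed pieces.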

NER Tagging

from codeswitch.codeswitch import NER
ner = NER('spa-eng')
# for hindi-english use 'hin-eng'
text = "" # your code-mixed sentence
result = ner.tag(text)
print(result)
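NER taggers commonly emit BIO tags, where `B-PER` begins a person name, `I-PER` continues it and `O` marks tokens outside any entity. Assuming the codeswitch NER models follow that scheme (check the labels the model actually returns), word-level tags can be collapsed into entity spans as sketched below. The function expects tags already aligned to words; `bio_to_entities` and the example labels are hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse BIO tags (B-PER, I-PER, O, ...) into (entity_type, text) spans.

    An I- tag that does not continue an open entity of the same type
    starts a new entity.
    """
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []
    for word, tag in zip(words, tags):
        prefix, _, ent_type = tag.partition("-")
        if prefix == "I" and ent_type == current_type:
            current_words.append(word)  # continue the open entity
            continue
        # Any other tag closes the open entity, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
        if prefix in ("B", "I"):
            current_type, current_words = ent_type, [word]
        else:  # "O" or an unrecognized tag
            current_type, current_words = None, []
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities
```

For example, `["Key", "Biscayne", "este"]` tagged `["B-LOC", "I-LOC", "O"]` gives `[("LOC", "Key Biscayne")]`.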

Sentiment Analysis

from codeswitch.codeswitch import SentimentAnalysis
sa = SentimentAnalysis('spa-eng')
sentence = "El perro le ladraba a La Gatita .. .. lol #teamlagatita en las playas de Key Biscayne este Memorial day"
result = sa.analyze(sentence)
print(result)
# [{'label': 'LABEL_1', 'score': 0.9587041735649109}]
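The analyzer returns raw label ids such as `LABEL_1` together with a confidence score. The README does not say which sentiment each id stands for, so the mapping has to come from the model's configuration (its `id2label` entry). A small wrapper can translate the top label and drop low-confidence predictions; `readable_sentiment`, the 0.5 default threshold and the caller-supplied mapping are all assumptions for illustration.

```python
from typing import Dict, List, Optional


def readable_sentiment(
    result: List[Dict],
    names: Dict[str, str],
    min_score: float = 0.5,
) -> Optional[str]:
    """Return a readable name for the top prediction, or None below min_score.

    `result` has the shape printed above: [{'label': 'LABEL_1', 'score': 0.95}].
    `names` maps raw labels to readable names and must be supplied by the
    caller; labels missing from it are returned unchanged.
    """
    best = max(result, key=lambda r: r["score"])
    if best["score"] < min_score:
        return None
    return names.get(best["label"], best["label"])
```

Raising `min_score` trades coverage for precision: tweets the model is unsure about come back as `None` instead of a possibly wrong label.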


Acknowledgement
