
pysentimiento
A Transformer-based library for SocialNLP tasks.
Currently supports:
| Task | Languages |
|---|---|
| Sentiment Analysis | es, en, it, pt |
| Hate Speech Detection | es, en, it, pt |
| Irony Detection | es, en, it, pt |
| Emotion Analysis | es, en, it, pt |
| NER & POS tagging | es, en |
| Contextualized Hate Speech Detection | es |
| Targeted Sentiment Analysis | es |
Just do pip install pysentimiento and start using it:
from pysentimiento import create_analyzer
analyzer = create_analyzer(task="sentiment", lang="es")
analyzer.predict("Qué gran jugador es Messi")
# returns AnalyzerOutput(output=POS, probas={POS: 0.998, NEG: 0.002, NEU: 0.000})
analyzer.predict("Esto es pésimo")
# returns AnalyzerOutput(output=NEG, probas={NEG: 0.999, POS: 0.001, NEU: 0.000})
analyzer.predict("Qué es esto?")
# returns AnalyzerOutput(output=NEU, probas={NEU: 0.993, NEG: 0.005, POS: 0.002})
analyzer.predict("jejeje no te creo mucho")
# returns AnalyzerOutput(output=NEG, probas={NEG: 0.587, NEU: 0.408, POS: 0.005})
"""
Emotion Analysis in English
"""
emotion_analyzer = create_analyzer(task="emotion", lang="en")
emotion_analyzer.predict("yayyy")
# returns AnalyzerOutput(output=joy, probas={joy: 0.723, others: 0.198, surprise: 0.038, disgust: 0.011, sadness: 0.011, fear: 0.010, anger: 0.009})
emotion_analyzer.predict("fuck off")
# returns AnalyzerOutput(output=anger, probas={anger: 0.798, surprise: 0.055, fear: 0.040, disgust: 0.036, joy: 0.028, others: 0.023, sadness: 0.019})
"""
Hate Speech (misogyny & racism)
"""
hate_speech_analyzer = create_analyzer(task="hate_speech", lang="es")
hate_speech_analyzer.predict("Esto es una mierda pero no es odio")
# returns AnalyzerOutput(output=[], probas={hateful: 0.022, targeted: 0.009, aggressive: 0.018})
hate_speech_analyzer.predict("Esto es odio porque los inmigrantes deben ser aniquilados")
# returns AnalyzerOutput(output=['hateful'], probas={hateful: 0.835, targeted: 0.008, aggressive: 0.476})
hate_speech_analyzer.predict("Vaya guarra barata y de poca monta es XXXX!")
# returns AnalyzerOutput(output=['hateful', 'targeted', 'aggressive'], probas={hateful: 0.987, targeted: 0.978, aggressive: 0.969})
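The outputs above suggest a simple reading of `probas`: single-label tasks (sentiment, emotion) report the most probable class, while the multi-label hate speech task reports every label whose probability passes a cutoff. The sketch below illustrates that reading on plain dictionaries. The helper names `top_label` and `labels_above` are hypothetical, and the 0.5 threshold is an assumption inferred from the examples, not a documented library setting.

```python
def top_label(probas):
    """Single-label reading: return the class with the highest probability."""
    return max(probas, key=probas.get)


def labels_above(probas, threshold=0.5):
    """Multi-label reading: return every label whose probability exceeds the threshold.

    The 0.5 default is an assumption that matches the examples above.
    """
    return [label for label, p in probas.items() if p > threshold]


# Probabilities copied from the examples above
sentiment = {"NEG": 0.587, "NEU": 0.408, "POS": 0.005}
hate = {"hateful": 0.835, "targeted": 0.008, "aggressive": 0.476}

print(top_label(sentiment))
print(labels_above(hate))
```

Under these assumptions, `aggressive: 0.476` falls just below the cutoff, which is why only `hateful` appears in the second hate speech output.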
See TASKS for more details on the supported tasks and languages, and also for reported performance for each benchmarked model.
Also, check these notebooks with examples of how to use pysentimiento for each language:
pysentimiento features a tweet preprocessor specially suited for tweet classification with transformer-based models.
from pysentimiento.preprocessing import preprocess_tweet
# Replaces user handles and URLs with special tokens
preprocess_tweet("@perezjotaeme debería cambiar esto http://bit.ly/sarasa") # "@usuario debería cambiar esto url"
# Shortens repeated characters
preprocess_tweet("no entiendo naaaaaaaadaaaaaaaa", shorten=2) # "no entiendo naadaa"
# Normalizes laughter
preprocess_tweet("jajajajaajjajaajajaja no lo puedo creer ajajaj") # "jaja no lo puedo creer jaja"
# Handles hashtags
preprocess_tweet("esto es #UnaGenialidad")
# "esto es una genialidad"
# Handles emojis
preprocess_tweet("🎉🎉", lang="en")
# 'emoji party popper emoji emoji party popper emoji'
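To make the first three normalization steps concrete, here is a minimal standard-library sketch that reproduces them with regular expressions. It is not pysentimiento's implementation: the function name `simple_preprocess`, the patterns, and the defaults are assumptions chosen to match the example outputs above, and hashtag and emoji handling are left out.

```python
import re

# Illustrative patterns only; the library's real rules may differ.
USER_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")
# A whole word made only of "a"/"j", at least 4 characters, containing a "j"
LAUGH_RE = re.compile(r"\b(?=\w*j)[aj]{4,}\b", re.IGNORECASE)


def simple_preprocess(text, user_token="@usuario", url_token="url", shorten=2):
    """Hypothetical, simplified stand-in for a tweet preprocessor."""
    # Replace user handles and URLs with fixed tokens
    text = USER_RE.sub(user_token, text)
    text = URL_RE.sub(url_token, text)
    # Collapse Spanish-style laughter ("jajajaja", "ajajaj") to "jaja"
    text = LAUGH_RE.sub("jaja", text)
    # Cut any character repeated more than `shorten` times down to `shorten`
    repeat_re = re.compile(r"(.)\1{" + str(shorten) + r",}")
    text = repeat_re.sub(lambda m: m.group(1) * shorten, text)
    return text


print(simple_preprocess("@perezjotaeme debería cambiar esto http://bit.ly/sarasa"))
print(simple_preprocess("no entiendo naaaaaaaadaaaaaaaa", shorten=2))
print(simple_preprocess("jajajajaajjajaajajaja no lo puedo creer ajajaj"))
```

For real use, prefer `preprocess_tweet`, which also covers hashtags, emojis, and language-specific tokens.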
git clone https://github.com/pysentimiento/pysentimiento
pip install poetry
poetry shell
poetry install
Check TRAIN.md for further information on how to train your models.
Note: you need access to the datasets, which are not public for the time being. Send us an email to get access to them.
Check "Model sharing and upload" instructions in huggingface docs.
pysentimiento is an open-source library. However, please be aware that the models are trained on third-party datasets and are subject to their respective licenses, many of which allow non-commercial use only:
TASS Dataset license (License for Sentiment Analysis in Spanish, Emotion Analysis in Spanish & English)
SemEval 2017 Dataset license (Sentiment Analysis in English)
LinCE Datasets (License for NER & POS tagging)
Please use the repository issue tracker to report bugs and make suggestions (new models, other datasets, other languages, etc.).
If you use pysentimiento in your work, please cite this paper:
@misc{perez2021pysentimiento,
title={pysentimiento: A Python Toolkit for Opinion Mining and Social NLP tasks},
author={Juan Manuel Pérez and Mariela Rajngewerc and Juan Carlos Giudici and Damián A. Furman and Franco Luque and Laura Alonso Alemany and María Vanina Martínez},
year={2023},
eprint={2106.09462},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Also, please cite the related pre-trained models and datasets for the specific models you use. Check REFERENCES for details.