Pretraining and sentiment analysis of student-to-instructor review corpora in Albanian. This repository contains the code base for the paper RoBERTa Low Resource Fine Tuning for Sentiment Analysis in Albanian. To reproduce the results, see the paper reproduction repository. If you use our model or API, please cite our paper.
The library can be installed with pip from the PyPI repository:
pip3 install zensols.edusenti
The models are downloaded on the first use of the command-line or API.
Command line:
$ edusenti predict sq.txt
(+): <Per shkak të gjendjes së krijuar si pasojë e pandemisë edhe ne sikur [...]>
(-): <Fillimisht isha e shqetësuar se si do ti mbanim kuizet, si do të [...]>
(+): <Kjo gjendje ka vazhduar edhe në kohën e provimeve>
...
Use the `csv` action to write all predictions to a comma-delimited file (see `edusenti --help`).
API:
>>> from zensols.edusenti import (
...     ApplicationFactory, Application, SentimentFeatureDocument
... )
>>> app: Application = ApplicationFactory.get_application()
>>> doc: SentimentFeatureDocument
>>> for doc in app.predict(['Kjo gjendje ka vazhduar edhe në kohën e provimeve']):
...     print(f'sentence: {doc.text}')
...     print(f'prediction: {doc.pred}')
...     print(f'logits: {doc.softmax_logit}')
sentence: Kjo gjendje ka vazhduar edhe në kohën e provimeve
prediction: +
logits: {'+': 0.70292175, '-': 0.17432323, 'n': 0.12275504}
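The per-label scores printed above can also be used to judge how decisive a prediction is. The following is a minimal sketch, assuming `doc.softmax_logit` is a plain dictionary mapping each label to a score, as the printed output suggests. The helper `top_label` is hypothetical and not part of the package; it works on any such dictionary, so it runs without the model installed.

```python
from typing import Dict, Tuple


def top_label(scores: Dict[str, float]) -> Tuple[str, float, float]:
    """Return the best label, its score, and its margin over the runner-up.

    Hypothetical helper (not part of zensols.edusenti); assumes ``scores``
    has at least two entries shaped like the dictionary printed above.
    """
    # Sort (label, score) pairs from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    return best, best_score, best_score - second_score


# Scores copied from the example output above.
scores = {'+': 0.70292175, '-': 0.17432323, 'n': 0.12275504}
label, score, margin = top_label(scores)
print(f'{label} score={score:.3f} margin={margin:.3f}')
```

A small margin may indicate a sentence worth reviewing by hand. Any threshold for "small" is a choice left to the application.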
The models are downloaded the first time the API is used. To change the model (by default `xlm-roberta-base` is used) on the command-line, use `--override esi_default.model_namel=xlm-roberta-large`. You can also create a `~/.edusentirc` file with the following:
[esi_default]
model_namel = xlm-roberta-large
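If the file should be created from a script rather than by hand, the standard library's `configparser` can write the same section and key. This is a sketch under stated assumptions: the helper `set_model` is hypothetical, the section and key names are copied verbatim from the snippet above, and it assumes the file contains only plain INI syntax. Rewriting an existing file this way drops any comments it contains.

```python
import configparser
import tempfile
from pathlib import Path


def set_model(model_name: str, rc_path: Path) -> None:
    """Write or update the model setting in an INI-style rc file.

    Hypothetical helper (not part of zensols.edusenti).
    """
    config = configparser.ConfigParser()
    # read() silently skips a missing file, so this also handles first use.
    config.read(rc_path)
    if not config.has_section('esi_default'):
        config.add_section('esi_default')
    config.set('esi_default', 'model_namel', model_name)
    with open(rc_path, 'w') as f:
        config.write(f)


# Demonstrate on a temporary file so no real settings are overwritten.
with tempfile.TemporaryDirectory() as tmp:
    rc = Path(tmp) / '.edusentirc'
    set_model('xlm-roberta-large', rc)
    print(rc.read_text())
```

To apply the setting for real, pass `Path.home() / '.edusentirc'` as `rc_path`.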
Model performance on the test set, after training and validation, is shown below.
Model | F1 | Precision | Recall |
---|---|---|---|
xlm-roberta-base | 78.1 | 80.7 | 79.7 |
xlm-roberta-large | 83.5 | 84.9 | 84.7 |
However, the distributed models were trained on the training and test sets combined. The validation metrics of those trained models are available on the command line with `edusenti info`.
The paper reproduction repository differs from this one in several ways, mostly around reproducibility. This repository, by contrast, is designed as a package for research that applies the model. To reproduce the results of the paper, refer to the reproduction repository. To use the best performing model (XLM-RoBERTa Large) from that paper, use this repository.
The primary difference is that this repository has significantly better performance in Albanian, which climbed from an F1 of 71.9 to 83.5 (see models). However, this repository has no English sentiment model, since that model was only used for comparing methods.
Changes include:
See the full documentation. The API reference is also available.
An extensive changelog is available here.
If you use this project in your research, please use the following BibTeX entry:
@inproceedings{nuci-etal-2024-roberta-low,
    title = "{R}o{BERT}a Low Resource Fine Tuning for Sentiment Analysis in {A}lbanian",
    author = "Nuci, Krenare Pireva and
      Landes, Paul and
      Di Eugenio, Barbara",
    editor = "Calzolari, Nicoletta and
      Kan, Min-Yen and
      Hoste, Veronique and
      Lenci, Alessandro and
      Sakti, Sakriani and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italy",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1233",
    pages = "14146--14151"
}
Copyright (c) 2023 - 2024 Paul Landes and Krenare Pireva Nuci