
State of the art toolkit for Natural Language Processing in French based on CamemBERT/FlauBERT.
Citation:
@misc{hadoop,
author={Wang Xiaoou},
title={frenchnlp: state of the art toolkit for Natural Language Processing in French based on CamemBERT/FlauBERT},
year={2021},
howpublished={\url{https://github.com/xiaoouwang/frenchnlp}},
}
Wang Xiaoou. (2021). frenchnlp: state of the art toolkit for Natural Language Processing in French based on CamemBERT/FlauBERT. https://github.com/xiaoouwang/frenchnlp.
sentence similarity measure
Reimers, Nils, and Iryna Gurevych. “Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks.” ArXiv:1908.10084 [Cs], August 27, 2019. http://arxiv.org/abs/1908.10084.
Xiaoou Wang, Xingyu Liu, Yimei Yue. “Mesure de similarité textuelle pour l’évaluation automatique de copies d’étudiants.” TALN-RECITAL 2021.
To do: text classification pipelines
from frenchnlp import *
from transformers import AutoTokenizer, AutoModel
import torch
bert_compare_cls(model,tokenizer,sentence1,sentence2)
fr_tokenizer = AutoTokenizer.from_pretrained('camembert-base')
fr_model = AutoModel.from_pretrained('camembert-base')
sentences = [
    "J'aime les chats.",
    "Je déteste les chats.",
    "J'adore les chats."
]
for i in range(1,3):
    print(f"similarité sémantique entre\n{sentences[0]}\n{sentences[i]}")
    print(bert_compare_cls(fr_model,fr_tokenizer,sentences[0],sentences[i]))
Output:
similarité sémantique entre
J'aime les chats.
Je déteste les chats.
0.9145417
similarité sémantique entre
J'aime les chats.
J'adore les chats.
0.9809468
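The source of frenchnlp is not shown here, so the following is only a minimal sketch of what a CLS-based comparison usually means: take the hidden state of the first token of each sentence and compute the cosine similarity between the two vectors. The helper names `cosine_similarity` and `cls_vector` and the toy numbers are assumptions made for illustration; they are not part of frenchnlp.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cls_vector(token_embeddings):
    """Return the embedding at the first position (the <s>/[CLS] token)."""
    return token_embeddings[0]

# Toy 3-dimensional "hidden states", one row per token (made-up values,
# standing in for what a model such as camembert-base would return).
sent_a = [[1.0, 0.0, 1.0], [0.2, 0.9, 0.1], [0.5, 0.5, 0.0]]
sent_b = [[1.0, 0.0, 0.0], [0.1, 0.8, 0.3]]

# Only the first row of each sentence contributes to the score.
cls_score = cosine_similarity(cls_vector(sent_a), cls_vector(sent_b))
print(cls_score)
```

Because only the first position is used, the two sentences may have different lengths without any padding or masking.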
compare_bert_average(model,tokenizer,sent1,sent2)
fr_tokenizer = AutoTokenizer.from_pretrained('camembert-base')
fr_model = AutoModel.from_pretrained('camembert-base')
# Compare the first sentence with each of the other two.
for i in range(1,3):
    print(f"similarité sémantique entre\n{sentences[0]}\n{sentences[i]}")
    print(compare_bert_average(fr_model,fr_tokenizer,sentences[0],sentences[i]))
Output:
similarité sémantique entre
J'aime les chats.
Je déteste les chats.
0.9145417
similarité sémantique entre
J'aime les chats.
J'adore les chats.
0.9809468
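As a rough illustration of the averaging approach, the sketch below mean-pools token vectors while skipping padding positions, then compares the pooled vectors with cosine similarity. This is an assumption about what "average" refers to, not the library's confirmed implementation; `mean_pool`, `cosine_similarity`, and the toy values are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [t / count for t in total]

# Toy 2-dimensional token embeddings (made-up values).
sent_a = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]  # last row is a padding position
mask_a = [1, 1, 0]
sent_b = [[2.0, 2.0]]
mask_b = [1]

pooled_a = mean_pool(sent_a, mask_a)
pooled_b = mean_pool(sent_b, mask_b)
avg_score = cosine_similarity(pooled_a, pooled_b)
print(pooled_a, avg_score)
```

Masking matters when sentences are batched: without it, padding vectors would be averaged into the sentence representation.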
See above for the reference on multilingual sentence embeddings.
compare_sent_transformer(model,sent1,sent2)
from sentence_transformers import SentenceTransformer
sent_model = SentenceTransformer('stsb-xlm-r-multilingual')
# Compare the first sentence with each of the other two.
for i in range(1,3):
    print(f"similarité sémantique entre\n{sentences[0]}\n{sentences[i]}")
    print(compare_sent_transformer(sent_model,sentences[0],sentences[i]))
Output:
similarité sémantique entre
J'aime les chats.
Je déteste les chats.
0.46124768
similarité sémantique entre
J'aime les chats.
J'adore les chats.
0.9557947
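The examples in this section all loop over the same sentences with a different scoring function. A small helper can factor that pattern out. The sketch below is not part of frenchnlp: `rank_by_similarity` is a hypothetical helper, and `token_overlap` is a deliberately crude stand-in scorer so the block runs without downloading any model.

```python
def rank_by_similarity(score_fn, reference, candidates):
    """Score each candidate against the reference and sort from most to least similar."""
    scored = [(cand, score_fn(reference, cand)) for cand in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def token_overlap(a, b):
    """Hypothetical stand-in scorer: Jaccard overlap of whitespace-separated tokens."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

sentences = [
    "J'aime les chats.",
    "Je déteste les chats.",
    "J'adore les chats.",
]

ranking = rank_by_similarity(token_overlap, sentences[0], sentences[1:])
for sentence, score in ranking:
    print(f"{score:.2f}  {sentence}")
```

With the library installed, the stand-in could be swapped for a wrapper around one of the documented functions, for example `lambda a, b: compare_sent_transformer(sent_model, a, b)`.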
frenchnlp is licensed under Apache License 2.0. You can use frenchnlp in your commercial products for free. We would appreciate it if you add a link to frenchnlp on your website.
Unless otherwise specified, all models in frenchnlp are licensed under CC BY-NC-SA 4.0.