frenchnlp

State-of-the-art toolchain for natural language processing in French

Version 0.2.3 on PyPI

Maintainers: 1

French NLP Toolkit

A state-of-the-art toolkit for natural language processing (NLP) in French, based on CamemBERT/FlauBERT.

Citation:

@misc{wang2021frenchnlp,
  author={Wang Xiaoou},
  title={frenchnlp: state of the art toolkit for Natural Language Processing in French based on CamemBERT/FlauBERT},
  year={2021},
  howpublished={\url{https://github.com/xiaoouwang/frenchnlp}},
}

Wang Xiaoou. (2021). frenchnlp: state of the art toolkit for Natural Language Processing in French based on CamemBERT/FlauBERT. https://github.com/xiaoouwang/frenchnlp.

- Sentence similarity measure
  - For why average pooling or the [cls] token should not be used to represent a sentence, see Reimers, Nils, and Iryna Gurevych. “Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks.” arXiv:1908.10084 [cs], August 27, 2019. http://arxiv.org/abs/1908.10084.
  - For a real-life application of sentence similarity, see Wang, Xiaoou, Xingyu Liu, and Yimei Yue. “Mesure de similarité textuelle pour l’évaluation automatique de copies d’étudiants” (Textual similarity measurement for the automatic assessment of student papers). TALN-RECITAL 2021.
- To do: text classification pipelines

How to use the package

from frenchnlp import *
from transformers import AutoTokenizer, AutoModel
import torch

Transformer-based sentence similarity measure (using CamemBERT as an example)

Using the [cls] token

bert_compare_cls(model, tokenizer, sentence1, sentence2)

fr_tokenizer = AutoTokenizer.from_pretrained('camembert-base')
fr_model = AutoModel.from_pretrained('camembert-base')

sentences = [
    "J'aime les chats.",
    "Je déteste les chats.",
    "J'adore les chats."
]

for i in range(1,3):
    print(f"similarité sémantique entre\n{sentences[0]}\n{sentences[i]}")
    print(bert_compare_cls(fr_model,fr_tokenizer,sentences[0],sentences[i]))

Output:

similarité sémantique entre
J'aime les chats.
Je déteste les chats.
0.9145417
similarité sémantique entre
J'aime les chats.
J'adore les chats.
0.9809468
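The README does not say how `bert_compare_cls` turns two sentences into a score. The sketch below shows one common approach, assuming the score is the cosine similarity between the first-token vectors of the model's last hidden layer; it is not taken from frenchnlp's source. CamemBERT, like RoBERTa, uses `<s>` rather than `[CLS]` as its first token, but it plays the same role. Toy arrays stand in for real model output so the example runs without downloading weights, and `cosine_similarity`, `cls_embedding` and `compare_cls` are hypothetical helper names.

```python
import numpy as np


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def cls_embedding(last_hidden_state: np.ndarray) -> np.ndarray:
    """Return the vector of the first token (<s> for CamemBERT, [CLS] for BERT).

    last_hidden_state has shape (seq_len, hidden_dim) for a single sentence.
    """
    return last_hidden_state[0]


def compare_cls(hidden1: np.ndarray, hidden2: np.ndarray) -> float:
    """Hypothetical helper: cosine similarity of the two first-token vectors."""
    return cosine_similarity(cls_embedding(hidden1), cls_embedding(hidden2))


# Toy hidden states (3 tokens x 2 dimensions) standing in for model output.
h1 = np.array([[1.0, 0.0], [0.3, 0.7], [0.2, 0.2]])
h2 = np.array([[1.0, 1.0], [0.9, 0.1], [0.5, 0.5]])
print(round(compare_cls(h1, h2), 4))  # 0.7071
```

With the real model, the per-sentence hidden states could be obtained with `fr_model(**fr_tokenizer(sentence, return_tensors="pt")).last_hidden_state[0].detach().numpy()`.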

Average pooling

compare_bert_average(model,tokenizer,sent1,sent2)

fr_tokenizer = AutoTokenizer.from_pretrained('camembert-base')
fr_model = AutoModel.from_pretrained('camembert-base')

for i in range(1,3):
    print(f"similarité sémantique entre\n{sentences[0]}\n{sentences[i]}")
    print(compare_bert_average(fr_model,fr_tokenizer,sentences[0],sentences[i]))

Output:

similarité sémantique entre
J'aime les chats.
Je déteste les chats.
0.9145417
similarité sémantique entre
J'aime les chats.
J'adore les chats.
0.9809468
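A minimal sketch of average pooling, assuming `compare_bert_average` averages the last-layer token vectors of each sentence and compares the two averages by cosine similarity; frenchnlp's actual implementation may differ, for example in whether special tokens are included. The attention mask keeps padding tokens out of the average, which matters when sentences are encoded together as a padded batch. `mean_pool` and `compare_average` are hypothetical names, and toy arrays replace real model output.

```python
import numpy as np


def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the token vectors of one sentence, ignoring padding positions.

    last_hidden_state: (seq_len, hidden_dim); attention_mask: (seq_len,) of 0/1.
    """
    mask = attention_mask[:, None].astype(last_hidden_state.dtype)  # (seq_len, 1)
    summed = (last_hidden_state * mask).sum(axis=0)
    count = max(mask.sum(), 1e-9)  # guard against an all-padding input
    return summed / count


def compare_average(h1: np.ndarray, m1: np.ndarray,
                    h2: np.ndarray, m2: np.ndarray) -> float:
    """Hypothetical helper: cosine similarity of the two mean-pooled vectors."""
    u, v = mean_pool(h1, m1), mean_pool(h2, m2)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


h1 = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 0.0]])  # last row is padding
m1 = np.array([1, 1, 0])
print(mean_pool(h1, m1))  # [0.5 0.5]
```

Without the mask, the padding row would pull the average towards `[9.0, 0.0]`. With the real tokenizer, the mask for one sentence is `fr_tokenizer(sentence, return_tensors="pt")["attention_mask"][0].numpy()`.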

Using multilingual sentence embeddings

For background on sentence embeddings, see the Sentence-BERT reference above.

compare_sent_transformer(model,sent1,sent2)

from sentence_transformers import SentenceTransformer

sent_model = SentenceTransformer('stsb-xlm-r-multilingual')

for i in range(1,3):
    print(f"similarité sémantique entre\n{sentences[0]}\n{sentences[i]}")
    print(compare_sent_transformer(sent_model,sentences[0],sentences[i]))

Output:

similarité sémantique entre
J'aime les chats.
Je déteste les chats.
0.46124768
similarité sémantique entre
J'aime les chats.
J'adore les chats.
0.9557947
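A sketch of the sentence-embedding route, assuming `compare_sent_transformer` embeds both sentences and returns the cosine similarity of the two embeddings. It relies only on the model having an `encode` method that returns one vector per input sentence, which `SentenceTransformer.encode` does (as a NumPy array by default). `compare_with_encoder` and `ToyEncoder` are hypothetical; the toy encoder merely counts two words so the example runs offline.

```python
import numpy as np


def compare_with_encoder(model, sent1: str, sent2: str) -> float:
    """Cosine similarity of two sentences embedded by `model.encode`.

    Assumes `model.encode(list_of_str)` returns one vector per sentence,
    as sentence_transformers.SentenceTransformer.encode does by default.
    """
    emb1, emb2 = model.encode([sent1, sent2])  # one batched call for both
    return float(np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2)))


class ToyEncoder:
    """Hypothetical stand-in encoder: counts of 'chat' and 'déteste'."""

    def encode(self, sentences):
        return np.array(
            [[s.count("chat"), s.count("déteste")] for s in sentences], dtype=float
        )


enc = ToyEncoder()
print(round(compare_with_encoder(enc, "J'aime les chats.", "Je déteste les chats."), 4))  # 0.7071
print(round(compare_with_encoder(enc, "J'aime les chats.", "J'adore les chats."), 4))  # 1.0
```

With the real model, pass `SentenceTransformer('stsb-xlm-r-multilingual')` in place of the toy encoder. Encoding both sentences in a single `encode` call lets the library batch them instead of running the model twice.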

License

Code

frenchnlp is licensed under the Apache License 2.0. You can use frenchnlp in your commercial products for free. We would appreciate it if you added a link to frenchnlp on your website.

Models

Unless otherwise specified, all models in frenchnlp are licensed under CC BY-NC-SA 4.0.
