Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

SCBert

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

SCBert

A small package to do Sentence Clustering with BERT (SCBert)

  • 0.2
  • PyPI
  • Socket score

Maintainers
1

Sentence Clustering with BERT (SCB)

Sentence Clustering with BERT project which aim to use state-of-the-art BERT models to compute vectors for sentences. A few tools are also implemented to explore those vectors and how sentences are related to each others in the latent space.

Demonstration

  • Create vectors from raw data :
#How to transform raw french texts into vectors using BERT model. 
from SCBert.SCBert import Vectorizer

vectorizer = Vectorizer("flaubert")
text_vectors = vectorizer.vectorize(data)
  • Explore the embedded space :
#How to explore the relation in your data. 
from SCBert.SCBert import EmbeddingExplorer

ee = EmbeddingExplorer(data,text_vectors)
labels = ee.cluster(k=3)                     #Cluster with k-means 
ee.extract_keywords()                        #Extract keywords using Rake algorithm, then accessible with ee.keywords
ee.explore(color = labels)                   #Generate a plot with PCA of the embedded vectors with colors corresponding to the labels 

Built-in example

There is a built-in example that you can find in the example folder. It comes with it's own data which is the CLS-fr composed of Amazon reviews from different sources (DVD, CD, Livres)

Installation

You can either download the zip file or use the Pypi package that you can install with the following command :

> pip install SCBert

If you encounter problems during the installation it may be because of the multi-rake dependy with cld2-cffi. I will try to address this later on. To bypass, just follow the instructions :

> export CFLAGS="-Wno-narrowing"
> pip install cld2-cffi
> pip install multi-rake

Keywords

FAQs


Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc