shorttextanalyzer

Short-Text Analyzer helps analyze open-ended survey responses, which are usually fewer than three sentences long. The analysis includes topic modeling, sentiment analysis, and visualization.

0.1.1
PyPI
Maintainers
1

Short-text-analyzer

ShortTextAnalyzer was created to help analyze open-ended survey responses, which are usually fewer than three sentences long. The analysis includes topic modeling, sentiment analysis, and visualization. Topic modeling uses pre-trained language representations, namely BERT, combined with a clustering algorithm.

Documentation Page: https://thisisphume.github.io/short-text-analyzer/

Install

pip install short-text-analyzer

Install all the required packages from the requirements.txt file.

pip install -r requirements.txt

from shorttextanalyzer.core import *

How to use

analyzer = shortTextAnalyzer(comments_series, 4)
output_result = analyzer.analyze_getResult()
Embedding Method for Visualization is  2AE  with MSE of 0.6560611658549391
Embedding Method for Clustering is  2AE  with MSE of 0.4782262679093038
Number of clusters via HDBSCAN is:  5.0
Number of clusters via KMeans is:   4

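Here the second argument, 4, specifies the number of clusters (topics) we want from the data. In the output above, KMeans returns the requested 4 clusters, while HDBSCAN reports 5.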
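The log lines above report an MSE for each embedding method. The source does not say how that value is computed; assuming it is the usual mean squared error between an input vector and its reconstruction, the calculation looks like the sketch below. The function name and the toy vectors are made up for illustration and are not part of shorttextanalyzer.

```python
def mean_squared_error(original, reconstructed):
    """Average of the squared element-wise differences between two vectors."""
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    if not original:
        raise ValueError("vectors must not be empty")
    squared_diffs = [(a - b) ** 2 for a, b in zip(original, reconstructed)]
    return sum(squared_diffs) / len(squared_diffs)


# Toy data (hypothetical): an input vector and an imperfect reconstruction.
original = [1.0, 2.0, 3.0]
reconstructed = [1.0, 2.5, 2.0]
print(mean_squared_error(original, reconstructed))
```

A lower value means the reconstruction is closer to the input, so under this assumption the clustering embedding (MSE about 0.48) reconstructs its input more closely than the visualization embedding (MSE about 0.66).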

Output: result

  • sentimentScore: polarity score in the range [-1, 1], where 1 is a positive statement and -1 is a negative statement.
  • subjectiveScore: subjectivity score in the range [0, 1], where 1 indicates personal opinion, emotion, or judgment and 0 indicates factual information.
  • clusterByKMeans: cluster number assigned to each comment by KMeans.
  • clusterByHDBSCAN: cluster number assigned to each comment by HDBSCAN.
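As a reading aid for the two score ranges above, the following sketch maps a polarity value and a subjectivity value to coarse labels. The function names and the cutoffs (a neutral band of ±0.1 for polarity, 0.5 for subjectivity) are hypothetical choices for illustration; they are not part of shorttextanalyzer.

```python
def polarity_label(score: float, neutral_band: float = 0.1) -> str:
    """Map a polarity score in [-1, 1] to 'negative', 'neutral', or 'positive'."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def subjectivity_label(score: float, cutoff: float = 0.5) -> str:
    """Map a subjectivity score in [0, 1] to 'factual' or 'opinion'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("subjectivity must lie in [0, 1]")
    return "opinion" if score >= cutoff else "factual"


# Example scores in the documented ranges.
print(polarity_label(0.19), subjectivity_label(0.415833))
```

The cutoffs are worth tuning per survey: short responses often sit near 0 on polarity, so a wider neutral band may be more appropriate.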
output_result.sample(2)
     comments                                           comment_lang  comments_clean                                     sentimentScore  subjectiveScore  clusterByKMeans  clusterByHDBSCAN
50   sondage parfait                                    fr            perfect poll                                       1.00            1.000000         2                1
875  it wasn't very clear what the purpose of the f...  en            it wasn't very clear what the purpose of the f...  0.19            0.415833         1                1
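One way to read the result is to summarize sentiment per cluster. The sketch below does this in plain Python on the two sample rows shown above, copied into dictionaries by hand; the helper name `mean_sentiment_by_cluster` is hypothetical and not part of shorttextanalyzer.

```python
from collections import defaultdict
from statistics import mean

# Two rows copied from the sample output; a real run has one row per comment.
rows = [
    {"comments_clean": "perfect poll",
     "sentimentScore": 1.00, "clusterByKMeans": 2},
    {"comments_clean": "it wasn't very clear what the purpose of the f...",
     "sentimentScore": 0.19, "clusterByKMeans": 1},
]


def mean_sentiment_by_cluster(rows, cluster_key="clusterByKMeans"):
    """Return {cluster label: mean sentimentScore} for the given rows."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[cluster_key]].append(row["sentimentScore"])
    return {cluster: mean(scores) for cluster, scores in sorted(grouped.items())}


summary = mean_sentiment_by_cluster(rows)
print(summary)
```

Assuming output_result is a pandas DataFrame (it supports .sample), the same summary should be available directly with output_result.groupby("clusterByKMeans")["sentimentScore"].mean().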

Visualization: how good are our clusters? HDBSCAN and KMeans

analyzer.plot_output()

[Two figures produced by analyzer.plot_output(); images not preserved]

Keywords

BERT NLP short-text topic-modeling clustering
