short-text-analyzer

This Short-Text Analyzer was created to help analyze open-ended survey responses, which usually contain fewer than three sentences. The analysis includes topic modeling, sentiment analysis, and visualization.

Version: 0.1 (PyPI, pip)
Maintainers: 1

Short-text-analyzer

This ShortTextAnalyzer was created to help analyze open-ended survey responses, which usually contain fewer than three sentences. The analysis includes topic modeling, sentiment analysis, and visualization. The topic modeling uses pre-trained language representations, namely BERT, combined with clustering algorithms (KMeans and HDBSCAN).
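To make the "embeddings plus clustering" idea concrete, here is a minimal sketch of the clustering half using plain k-means (Lloyd's algorithm). It is not the package's implementation: the 2-D points are made-up stand-ins for BERT sentence embeddings, and the function name `kmeans` and its parameters are hypothetical, chosen only for illustration.

```python
# Toy sketch: cluster small vectors with k-means (Lloyd's algorithm).
# The 2-D points below are invented stand-ins for BERT sentence embeddings.
from math import dist


def kmeans(points, centroids, iterations=10):
    """Return (labels, centroids) after a fixed number of Lloyd iterations."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Two well-separated groups of "embeddings" (hypothetical data).
embeddings = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]

# Start from one point in each group so the run is deterministic.
labels, centers = kmeans(embeddings, centroids=[embeddings[0], embeddings[3]])
print(labels)
```

In a real pipeline the vectors would have hundreds of dimensions and come from a BERT-style encoder; the assignment and update steps stay the same.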

Documentation Page: https://thisisphume.github.io/short-text-analyzer/

Install

pip install short-text-analyzer

Then install all the required packages listed in the requirements file:

pip install -r requirements.txt

How to use

from shorttextanalyzer.core import *

analyzer = shortTextAnalyzer(comments_series, 4)
output_result = analyzer.analyze_getResult()

Here, comments_series holds the survey responses, and the second argument (4) specifies that we want 4 clusters/topics from this data.

Output: result

  • sentimentScore: polarity score in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement.
  • subjectiveScore: subjectivity score in the range [0, 1], where 1 refers to personal opinion, emotion, or judgment and 0 means factual information.
  • clusterByKMeans: cluster number assigned to each comment by KMeans.
  • clusterByHDBSCAN: cluster number assigned to each comment by HDBSCAN.
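The score ranges above can be turned into rough labels. The sketch below is an illustration only: the helper name `describe_scores` and the cut-offs (0 for polarity, 0.5 for subjectivity) are assumptions, not thresholds defined by the package.

```python
def describe_scores(sentiment_score, subjective_score):
    """Map the two scores to rough labels.

    The cut-offs used here are illustrative assumptions, not part of the package.
    """
    if not -1.0 <= sentiment_score <= 1.0:
        raise ValueError("sentimentScore is expected in [-1, 1]")
    if not 0.0 <= subjective_score <= 1.0:
        raise ValueError("subjectiveScore is expected in [0, 1]")

    # Polarity: sign of the score decides the label.
    if sentiment_score > 0:
        polarity = "positive"
    elif sentiment_score < 0:
        polarity = "negative"
    else:
        polarity = "neutral"

    # Subjectivity: closer to 1 means opinion, closer to 0 means fact.
    tone = "subjective" if subjective_score >= 0.5 else "objective"
    return polarity, tone


print(describe_scores(1.00, 1.000000))
print(describe_scores(0.19, 0.415833))
```

A mildly positive score such as 0.19 still lands in the "positive" bucket under these cut-offs; a wider neutral band around 0 may suit noisy survey data better.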
output_result.sample(2)
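|     | comments | comment_lang | comments_clean | sentimentScore | subjectiveScore | clusterByKMeans | clusterByHDBSCAN |
|-----|----------|--------------|----------------|----------------|-----------------|-----------------|------------------|
| 50  | sondage parfait | fr | perfect poll | 1.00 | 1.000000 | 2 | 1 |
| 875 | it wasn't very clear what the purpose of the f... | en | it wasn't very clear what the purpose of the f... | 0.19 | 0.415833 | 1 | 1 |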
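A common next step is to count how many comments fall into each cluster. The sketch below does this in plain Python, using only the two sample rows shown above, copied into dictionaries by hand. If output_result is a pandas DataFrame (which the `.sample(2)` call suggests, but which is an assumption here), `output_result["clusterByKMeans"].value_counts()` would give the same tally over the full data.

```python
from collections import Counter

# The two sample rows from the output above, copied by hand.
rows = [
    {"index": 50, "comments_clean": "perfect poll",
     "clusterByKMeans": 2, "clusterByHDBSCAN": 1},
    {"index": 875,
     "comments_clean": "it wasn't very clear what the purpose of the f...",
     "clusterByKMeans": 1, "clusterByHDBSCAN": 1},
]

# Count comments per cluster for each clustering method.
kmeans_counts = Counter(row["clusterByKMeans"] for row in rows)
hdbscan_counts = Counter(row["clusterByHDBSCAN"] for row in rows)

print("KMeans: ", dict(kmeans_counts))
print("HDBSCAN:", dict(hdbscan_counts))
```

The cluster numbers from the two methods are independent labels, so cluster 1 under KMeans need not correspond to cluster 1 under HDBSCAN.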

Visualization: how good are our clusters? HDBSCAN and KMeans

analyzer.plot_output()

[Figures: cluster visualizations produced by plot_output()]

Keywords

BERT NLP short-text topic-modeling clustering
