📝 nlplot
nlplot: Analysis and visualization module for Natural Language Processing 📈
Description
nlplot makes it easier to visualize text data for natural language processing and speeds up exploratory analysis.
You can draw the following graphs:
- N-gram bar chart
- N-gram treemap
- Histogram of word counts
- Word cloud
- Co-occurrence network
- Sunburst chart
(Tested in English and Japanese)
Requirement
Installation
```shell
pip install nlplot
```
I've described specific usage on this blog (Japanese).
Sample code is also available as a Kaggle kernel (English).
Quick start - Data Preparation
Each value in the column to be analyzed must be a space-delimited string of words. Japanese text, which is not space-separated, has to be tokenized first.
```python
import pandas as pd

target_col = "text"
texts = [
    "Think rich look poor",
    "When you come to a roadblock, take a detour",
    "When it is dark enough, you can see the stars",
    "Never let your memories be greater than your dreams",
    "Victory is sweetest when you’ve known defeat"
]
df = pd.DataFrame({target_col: texts})
df.head()
```
|   | text |
|---|------|
| 0 | Think rich look poor |
| 1 | When you come to a roadblock, take a detour |
| 2 | When it is dark enough, you can see the stars |
| 3 | Never let your memories be greater than your dreams |
| 4 | Victory is sweetest when you’ve known defeat |
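If the column is split on spaces, punctuation stays attached to words, so `roadblock,` and `roadblock` would count as different tokens. A minimal sketch for normalizing simple English text before building the DataFrame is shown here; `to_space_delimited` is a hypothetical helper (not part of nlplot), and the regex tokenization is an assumption that only suits plain English (Japanese needs a real tokenizer).

```python
import re


def to_space_delimited(text: str) -> str:
    """Hypothetical helper: lowercase, drop punctuation, join tokens with single spaces."""
    # Keep runs of word characters and apostrophes (straight or curly) as tokens
    tokens = re.findall(r"[\w'’]+", text.lower())
    return " ".join(tokens)


texts = [
    "When you come to a roadblock, take a detour",
    "Victory is sweetest when you’ve known defeat",
]
cleaned = [to_space_delimited(t) for t in texts]
print(cleaned)
```

The cleaned list can then be passed to `pd.DataFrame` in place of the raw sentences.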
Quick start - Python API
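```python
import nlplot
import pandas as pd
from plotly.offline import iplot
import matplotlib.pyplot as plt
# Jupyter magic: render matplotlib figures inline (omit outside a notebook)
%matplotlib inline

# df is the DataFrame from the data-preparation step
npt = nlplot.NLPlot(df, target_col='text')

# Stopword list derived from word frequencies in the corpus
stopwords = npt.get_stopword(top_n=30, min_freq=0)
```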
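`get_stopword` builds stopwords from the corpus itself rather than from a fixed list. As a rough sketch of that idea, assuming it combines the `top_n` most frequent words with words that occur at most `min_freq` times (check the nlplot source for the exact behaviour), a plain-Python equivalent could look like this; `frequency_stopwords` is hypothetical.

```python
from collections import Counter


def frequency_stopwords(docs, top_n=30, min_freq=0):
    """Hypothetical sketch: most frequent words plus words with count <= min_freq."""
    # Count every space-delimited word across all documents
    counts = Counter(word for doc in docs for word in doc.split())
    common = {word for word, _ in counts.most_common(top_n)}
    rare = {word for word, freq in counts.items() if freq <= min_freq}
    return sorted(common | rare)


docs = ["the cat sat", "the dog sat", "the cat ran"]
print(frequency_stopwords(docs, top_n=1))
print(frequency_stopwords(docs, top_n=1, min_freq=1))
```

With `min_freq=0` no word qualifies as rare, so only the most frequent words are returned.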
```python
fig_unigram = npt.bar_ngram(
    title='uni-gram',
    xaxis_label='word_count',
    yaxis_label='word',
    ngram=1,
    top_n=50,
    width=800,
    height=1100,
    color=None,
    horizon=True,
    stopwords=stopwords,
    verbose=False,
    save=False,
)
fig_unigram.show()

fig_bigram = npt.bar_ngram(
    title='bi-gram',
    xaxis_label='word_count',
    yaxis_label='word',
    ngram=2,
    top_n=50,
    width=800,
    height=1100,
    color=None,
    horizon=True,
    stopwords=stopwords,
    verbose=False,
    save=False,
)
fig_bigram.show()
```
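The bar charts plot n-gram frequencies. The sketch below shows how such counts can be computed in plain Python for space-delimited text; `count_ngrams` is a hypothetical helper, and removing stopwords before forming n-grams is an assumption that may differ from how nlplot handles them.

```python
from collections import Counter


def count_ngrams(docs, n, stopwords=()):
    """Hypothetical sketch: count n-grams in space-delimited documents."""
    counts = Counter()
    for doc in docs:
        # Drop stopwords first, then form n-grams from the remaining words
        words = [w for w in doc.split() if w not in stopwords]
        # zip over n shifted copies of the word list yields consecutive n-tuples
        for gram in zip(*(words[i:] for i in range(n))):
            counts[" ".join(gram)] += 1
    return counts


docs = ["think rich look poor", "think rich act rich"]
print(count_ngrams(docs, 2).most_common(3))
print(count_ngrams(docs, 1, stopwords={"rich"}))
```

Setting `n=1` gives the unigram counts and `n=2` the bigram counts, mirroring the `ngram` parameter above.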
```python
fig_treemap = npt.treemap(
    title='Tree map',
    ngram=1,
    top_n=50,
    width=1300,
    height=600,
    stopwords=stopwords,
    verbose=False,
    save=False
)
fig_treemap.show()
```
```python
fig_histogram = npt.word_distribution(
    title='word distribution',
    xaxis_label='count',
    yaxis_label='',
    width=1000,
    height=500,
    color=None,
    template='plotly',
    bins=None,
    save=False,
)
fig_histogram.show()
```
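Assuming the histogram shows how many words each document contains, the underlying numbers can be checked without plotting. This sketch uses the sample sentences from the data-preparation step and prints a text histogram; it illustrates the idea and is not nlplot's code.

```python
from collections import Counter

docs = [
    "Think rich look poor",
    "When you come to a roadblock, take a detour",
    "When it is dark enough, you can see the stars",
    "Never let your memories be greater than your dreams",
    "Victory is sweetest when you’ve known defeat",
]

# Number of space-delimited words in each document
word_counts = [len(doc.split()) for doc in docs]

# How many documents have each word count (bar heights of an unbinned histogram)
histogram = Counter(word_counts)
for length in sorted(histogram):
    print(f"{length:>2} words: {'#' * histogram[length]}")
```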
```python
fig_wc = npt.wordcloud(
    width=1000,
    height=600,
    max_words=100,
    max_font_size=100,
    colormap='tab20_r',
    stopwords=stopwords,
    mask_file=None,
    save=False
)
plt.figure(figsize=(15, 25))
plt.imshow(fig_wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```
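```python
# Build the co-occurrence graph; only word pairs that co-occur often enough
# (per min_edge_frequency) are kept, so lower it for small datasets such as the sample above
npt.build_graph(stopwords=stopwords, min_edge_frequency=10)

fig_co_network = npt.co_network(
    title='Co-occurrence network',
    sizing=100,
    node_size='adjacency_frequency',
    color_palette='hls',
    width=1100,
    height=700,
    save=False
)
iplot(fig_co_network)
```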
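A co-occurrence network links words that appear in the same text. The sketch below assumes document-level co-occurrence (every pair of distinct words in one document counts once) and a `>=` threshold; nlplot's exact windowing and comparison may differ. `co_occurrence_edges` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_edges(docs, min_edge_frequency=1, stopwords=()):
    """Hypothetical sketch: count how many documents each word pair shares."""
    pair_counts = Counter()
    for doc in docs:
        # Deduplicate and sort so each pair is counted once per document, in a fixed order
        words = sorted({w for w in doc.split() if w not in stopwords})
        pair_counts.update(combinations(words, 2))
    # Keep only pairs that co-occur at least min_edge_frequency times
    return {pair: n for pair, n in pair_counts.items() if n >= min_edge_frequency}


docs = ["dark night stars", "stars night sky", "bright stars"]
print(co_occurrence_edges(docs, min_edge_frequency=2))
```

Raising the threshold prunes rare pairs, which is why a high value can leave a tiny corpus with no edges at all.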
```python
fig_sunburst = npt.sunburst(
    title='sunburst chart',
    colorscale=True,
    color_continuous_scale='Oryel',
    width=1000,
    height=800,
    save=False
)
fig_sunburst.show()
```
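```python
from IPython.display import display

# Inspect the node and edge tables computed by build_graph
display(
    npt.node_df.head(), npt.node_df.shape,
    npt.edge_df.head(), npt.edge_df.shape
)
```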
Document
TBD
Test
```shell
cd tests
pytest
```
Other
- Plotly is used to plot the figures.
- The co-occurrence network calculation follows the reference "co-occurrence networks".
- wordcloud uses the following fonts: