swordcloud: A semantic word cloud generator that uses t-SNE and k-means clustering to visualize words in high-dimensional semantic space. Based on A. Mueller's wordcloud module, swordcloud can generate semantic word clouds from Thai and English texts based on any word vector model.
swordcloud can be installed using pip:
pip install swordcloud
Optionally, if you want to be able to embed fonts directly into the generated SVGs, an embedfont extra can also be specified:
pip install swordcloud[embedfont]
As of version 0.0.10, the exact list of dependencies is as follows:
python >= 3.8
numpy >= 1.21.0
pillow
matplotlib >= 1.5.3
gensim >= 4.0.0
pandas
pythainlp >= 3.1.0
k-means-constrained
scikit-learn
fonttools
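As a rough aid for the list above, the following sketch checks installed packages against the stated minimum versions. The helper names (`parse_version`, `check_minimums`) and the `MINIMUMS` table are hypothetical and not part of swordcloud; the version parsing is deliberately simplistic and only looks at the first three numeric components.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list above.
# Packages listed without a version bound are omitted.
MINIMUMS = {
    "numpy": (1, 21, 0),
    "matplotlib": (1, 5, 3),
    "gensim": (4, 0, 0),
    "pythainlp": (3, 1, 0),
}


def parse_version(text):
    """Turn '1.21.0' or '3.1.0rc1' into a 3-tuple of ints (simplified)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_minimums(minimums, get_version=version):
    """Return a dict of package name -> problem description (empty if all fine)."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if parse_version(installed) < minimum:
            problems[name] = f"{installed} is older than required"
    return problems
```

Calling `check_minimums(MINIMUMS)` in the target environment returns an empty dict when every listed package meets its minimum.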
All code below can also be found in the example folder.
SemanticWordCloud instance
For most use cases, the SemanticWordCloud class is the main API users will interact with.
from swordcloud import SemanticWordCloud
# See the `Color "Functions"` section for detail about these color functions
from swordcloud.color_func import SingleColorFunc
wordcloud = SemanticWordCloud(
    language = 'TH',
    width = 1600,
    height = 800,
    max_font_size = 150,
    prefer_horizontal = 1,
    color_func = SingleColorFunc('black')
)
Please refer to the documentation in src/swordcloud/wordcloud.py or in your IDE for more detail about various options available for customizing the word cloud.
# Can also be one large string instead of a list of strings
with open('raw_text.txt', encoding='utf-8') as f:
    raw_text = list(map(str.strip, f))
wordcloud.generate_from_text(raw_text, random_state=42)
freq = {}
# Each line of the file holds one tab-separated word and count
with open("word_frequencies.tsv", encoding="utf-8") as f:
    for line in f:
        word, count = line.strip().split('\t')
        freq[word] = int(count)
wordcloud.generate_from_frequencies(freq, random_state=42)
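The frequency-loading code above reads one tab-separated word and count per line. A minimal sketch of producing data in that shape is shown below, assuming plain whitespace tokenization; `count_words` and `to_tsv` are hypothetical helpers, not part of swordcloud, and Thai text would need a real tokenizer (for example from pythainlp) instead of `str.split`.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated tokens across an iterable of strings."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def to_tsv(counts):
    """Render counts as 'word<TAB>count' lines, most frequent first."""
    return "".join(f"{word}\t{count}\n" for word, count in counts.most_common())
```

Writing the string returned by `to_tsv` to `word_frequencies.tsv` with UTF-8 encoding gives a file in the layout that the loop above parses.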
from swordcloud.color_func import FrequencyColorFunc
wordcloud = SemanticWordCloud(
    language = 'TH',
    # make sure the canvas is appropriately large for the number of clusters
    width = 2400,
    height = 1200,
    max_font_size = 150,
    prefer_horizontal = 1
)
wordcloud.generate_from_text(raw_text, kmeans=6, random_state=42, plot_now=False)
# Or directly from `generate_kmeans_cloud` if you already have word frequencies
wordcloud.generate_kmeans_cloud(freq, n_clusters=6, random_state=42, plot_now=False)
# Each sub cloud can then be individually interacted with
# by accessing the individual clouds in the `sub_clouds` attribute
for cloud, color in zip(wordcloud.sub_clouds, ["red", "blue", "brown", "green", "black", "orange"]):
    cloud.recolor(FrequencyColorFunc(color), plot_now=False)
    cloud.show()
# If the generated colors are not to your liking,
# you can recolor them instead of re-generating the whole cloud
from swordcloud.color_func import RandomColorFunc
wordcloud.recolor(RandomColorFunc, random_state=42)
pillow's Image
img = wordcloud.to_image()
wordcloud.to_file('wordcloud.png')
# Without embedded font
svg = wordcloud.to_svg()
# With embedded font
svg = wordcloud.to_svg(embed_font=True)
# Note that in order to be able to embed fonts
# the `fonttools` package needs to be installed
numpy's image array
array = wordcloud.to_array()
A number of built-in color "functions" can be accessed from swordcloud.color_func:
from swordcloud.color_func import <your_color_function_here>
The list of available functions is as follows:
RandomColorFunc (Default)
ColorMapFunc (based on matplotlib's colormap)
ImageColorFunc
SingleColorFunc
ExactColorFunc
FrequencyColorFunc
All the above functions, except RandomColorFunc, which cannot be customized further, must be initialized before passing them to the SemanticWordCloud class. For example:
from swordcloud.color_func import ColorMapFunc
color_func = ColorMapFunc("magma")
wordcloud = SemanticWordCloud(
    ...
    color_func = color_func
    ...
)
Users can also implement their own color functions, provided that they are callable with the following signature:
Input:
word: str
frequency: float
font_size: int
position: tuple[int, int]
orientation: PIL.Image.Transpose | None (pillow's orientation)
font_path: str
random_state: random.Random (a random.Random object)
Return:
Any object that can be interpreted as a color by pillow. See pillow's documentation for more detail.
Internally, arguments to color functions are always passed as keyword arguments, so they can be in any order. However, if your functions only use some of them, make sure to include **kwargs at the end of your function headers so that other arguments do not cause an error.
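To illustrate the signature described above, here is a minimal sketch of a custom color function. The name `size_based_color` and its gray-level mapping are hypothetical; the ceiling of 150 simply mirrors the `max_font_size` used in the earlier examples. It uses only `font_size` and absorbs the remaining keyword arguments through `**kwargs`.

```python
def size_based_color(word, font_size, **kwargs):
    """Hypothetical color function: larger words get a darker gray.

    Only `font_size` is used; `frequency`, `position`, `orientation`,
    `font_path` and `random_state` are absorbed by **kwargs.
    """
    # Clamp the font size into the range 0..150
    clamped = max(0, min(int(font_size), 150))
    # Map size to a gray level: smallest -> 200 (light), largest -> 40 (dark)
    level = 200 - round(160 * clamped / 150)
    # pillow understands "rgb(r, g, b)" color strings
    return f"rgb({level}, {level}, {level})"
```

Under the assumptions above, such a function would be passed as `color_func = size_based_color` when constructing the SemanticWordCloud, without being initialized first, since it is already a plain callable.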