A Python tool for generating a word cloud for a Facebook chat conversation.

.. image:: https://travis-ci.org/mjmeli/facebook-chat-word-cloud.svg?branch=master
   :target: https://travis-ci.org/mjmeli/facebook-chat-word-cloud


Requirements
============

This uses lxml to parse the messages file provided by Facebook. lxml requires libxml2 and libxslt to be installed.

For Debian/Ubuntu::

    sudo apt-get install libxml2-dev libxslt-dev python-dev

This also uses Pillow to handle image manipulation. Pillow requires libjpeg, zlib, and libfreetype. For Debian/Ubuntu::

    sudo apt-get install libjpeg-dev zlib1g-dev libfreetype6-dev

Install from PyPI::

    pip install facebook_wordcloud

Or install from source and run the tests::

    git clone https://github.com/mjmeli/facebook-chat-word-cloud.git
    cd facebook-chat-word-cloud
    pip install -e .
    python setup.py test

Request your Facebook data archive and get the ``messages.htm`` file.

Generate the default word cloud::

    facebook_wordcloud examples/messages_sample.htm "Foo Bar"

Use a configuration file for customization::

    facebook_wordcloud examples/messages_sample.htm "Foo Bar" -c config.json

Use the sample conversation file for quick testing::

    facebook_wordcloud examples/messages_sample.htm "Foo Bar" -sample

Output the word cloud to an image::

    facebook_wordcloud examples/messages_sample.htm "Foo Bar" -o output.png

This is essentially a command-line wrapper around `Andreas Muller's <https://github.com/amueller>`_ (amueller) `word_cloud Python library <https://github.com/amueller/word_cloud>`_. It simply parses Facebook messages and passes the data to that library.

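The hand-off to word_cloud amounts to turning messages into word counts. The sketch below is a hypothetical illustration, not this package's code: it assumes the messages have already been parsed into ``(sender, text)`` pairs, counts words with the standard library, and returns a dict of the kind word_cloud's ``WordCloud.generate_from_frequencies`` accepts. The name ``word_frequencies`` and the tiny stopword set are made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical minimal stopword set; a real tool would use a fuller list.
DEFAULT_STOPWORDS: Set[str] = {"the", "a", "an", "and", "to", "is", "i", "you"}

# Lowercase words, keeping inner apostrophes such as "don't".
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def word_frequencies(
    messages: Iterable[Tuple[str, str]],
    sender: Optional[str] = None,
    stopwords: Set[str] = DEFAULT_STOPWORDS,
) -> Dict[str, int]:
    """Count words across (sender, text) pairs, optionally for one sender only."""
    counts = Counter()
    for who, text in messages:
        if sender is not None and who != sender:
            continue  # skip messages from other participants
        for word in WORD_RE.findall(text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        ("Foo Bar", "Pizza tonight? I love pizza!"),
        ("Baz Qux", "The pizza place is closed."),
    ]
    freqs = word_frequencies(sample)
    print(freqs)
    # With the wordcloud package installed, the dict could then be rendered:
    # WordCloud().generate_from_frequencies(freqs).to_file("output.png")
```

Filtering by ``sender`` is optional here; whether the real tool counts one participant or the whole conversation is not stated in this section.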
The first step is to get your Facebook messages archive.

The script is easy to use::

    facebook_wordcloud [messages_file] [users] {optional arguments}

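Where ``messages_file`` is the path to the messages file downloaded from Facebook and ``users`` identifies the conversation by its participant name(s) (for example, ``"Foo Bar"`` for ``examples/messages_sample.htm``).
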
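There are a few important optional arguments:

- ``-c config.json``: read settings from a JSON configuration file.
- ``-o output.png``: write the word cloud to an image file.
- ``-sample``: use the provided sample conversation for quick testing.
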
Many more arguments exist, mostly for changing the configuration of the word cloud. All of them can also be set in the JSON configuration file, which is much easier to work with. If you prefer the command line anyway, use the ``-h`` or ``--help`` option to list every argument.

**IMPORTANT:** Command-line arguments override settings in the config file!

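The precedence rule can be pictured with a small sketch. This is an assumed approach, not code from facebook_wordcloud: the JSON config is loaded first, then every argument actually given on the command line is written over it. The parser is hypothetical and only mirrors the ``-c``, ``-o`` and ``-sample`` options from the usage examples; ``merge_settings``, ``load_settings``, and the assumption that config keys share the argument names are illustrative.

```python
import argparse
import json
from typing import Any, Dict, List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented usage; the real one has more options.
    parser = argparse.ArgumentParser(prog="facebook_wordcloud")
    parser.add_argument("messages_file")
    parser.add_argument("users")
    parser.add_argument("-c", dest="config", help="JSON configuration file")
    parser.add_argument("-o", dest="output", help="output image file")
    # default=None (instead of False) distinguishes "not given" from "given".
    parser.add_argument("-sample", action="store_true", default=None)
    return parser


def merge_settings(config: Dict[str, Any], args: argparse.Namespace) -> Dict[str, Any]:
    """Start from the config file values, then let any given CLI argument win."""
    settings = dict(config)
    for key, value in vars(args).items():
        if value is not None:  # None means the argument was not passed
            settings[key] = value
    return settings


def load_settings(argv: List[str]) -> Dict[str, Any]:
    """Parse argv, load the optional JSON config, and merge the two."""
    args = build_parser().parse_args(argv)
    config: Dict[str, Any] = {}
    if args.config is not None:
        with open(args.config, encoding="utf-8") as f:
            config = json.load(f)
    return merge_settings(config, args)
```

Using ``None`` as the "not passed" marker is the key design choice: with argparse's usual ``False`` default for ``store_true``, an omitted ``-sample`` would silently overwrite a ``true`` value from the config.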
The messages file downloaded from Facebook will probably be quite large (mine was 60 MB), so it may take a while to parse. That gets annoying when you are making small changes to get a nice-looking word cloud. I highly recommend using the sample conversation I provide instead: it parses in seconds and has very high word density. You can either reference this file directly (``examples/messages_sample.htm`` with user ``"Foo Bar"``) or just add the ``-sample`` option to the command.

See the ``examples`` directory for examples of what you can do and more details on customization.

.. image:: http://i.imgur.com/cKP4nJB.png
.. image:: http://i.imgur.com/7Q4bjdY.png
.. image:: http://i.imgur.com/2E9HRF5.png
.. image:: http://i.imgur.com/JDYoVxm.png
.. image:: http://i.imgur.com/UXIGvLW.png
I originally used BeautifulSoup and then switched to the lxml parser. This is slightly annoying because lxml requires system libraries, but its performance is significantly better. The benchmarks below come from building the document tree for a 60 MB messages file:

+---------------+-------------------------+-------------------+
| Parser        | Build Tree Runtime (ms) | Max Memory Usage  |
+===============+=========================+===================+
| BeautifulSoup | 90750                   | 3450 MB (3.45 GB) |
+---------------+-------------------------+-------------------+
| lxml          | 1945                    | 910 MB (0.91 GB)  |
+---------------+-------------------------+-------------------+

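Numbers like these can be gathered with a small timing harness. The sketch below is an assumption about method, not the script that produced the table: it times a tree-building callable with ``time.perf_counter`` and records peak memory with ``tracemalloc``. Be aware that ``tracemalloc`` only sees memory allocated through Python's allocators, so it undercounts what lxml allocates inside libxml2's C code; a process-level measure such as peak RSS would be fairer for an lxml comparison.

```python
import time
import tracemalloc
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def benchmark(build_tree: Callable[[], T]) -> Tuple[T, float, float]:
    """Call build_tree once; return (result, runtime in ms, peak traced memory in MB)."""
    tracemalloc.start()
    try:
        start = time.perf_counter()
        result = build_tree()
        elapsed_ms = (time.perf_counter() - start) * 1000
        # Peak of Python-level allocations since tracing started.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    # Note: tracing itself slows execution, so the runtime is somewhat inflated.
    return result, elapsed_ms, peak_bytes / 1_000_000


if __name__ == "__main__":
    from html.parser import HTMLParser

    # Stand-in workload using the stdlib HTML parser on synthetic markup.
    html = "<div><p>hello</p></div>" * 10_000
    parser = HTMLParser()
    _, ms, mb = benchmark(lambda: parser.feed(html))
    print(f"{ms:.1f} ms, {mb:.2f} MB peak")
```

To compare real parsers, the callable could be ``lambda: lxml.html.parse("messages.htm")`` or ``lambda: BeautifulSoup(open("messages.htm").read(), "html.parser")``, ideally each run in a fresh process.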
``ImportError: The _imagingft C module is not installed``

This means you don't have libfreetype installed. See the Requirements section. If installing it does not work, you may have to uninstall and reinstall Pillow via pip.

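To check whether your Pillow build has FreeType support without triggering the error, Pillow's ``PIL.features.check`` can be queried with the ``"freetype2"`` feature name. The helper below is a small sketch around that call; the name ``pillow_has_freetype`` and returning ``None`` when Pillow (or its ``features`` module) is unavailable are our own conventions.

```python
from typing import Optional


def pillow_has_freetype() -> Optional[bool]:
    """True/False for FreeType support, or None if PIL.features can't be imported."""
    try:
        from PIL import features
    except ImportError:
        return None
    # "freetype2" is the feature Pillow backs with its _imagingft module.
    return bool(features.check("freetype2"))


if __name__ == "__main__":
    status = pillow_has_freetype()
    if status is None:
        print("Pillow is not installed")
    elif status:
        print("Pillow has FreeType support")
    else:
        print("No FreeType: install libfreetype, then reinstall Pillow")
```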
``IOError: Couldn't locate mask file...did you make sure to specify the URL relative to where you are running the script?``

This error is self-explanatory. In ``masked/config.json``, the mask file is specified with a relative path, and that path is resolved relative to the directory where you run the script. I wrote the config file assuming that you run ``facebook_wordcloud`` from the ``examples`` directory. If that is not the case, either ``cd`` into that directory or adjust the path in ``masked/config.json``.

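The behaviour above is ordinary relative-path resolution: a relative path is looked up against the current working directory, not the config file's location. The sketch below contrasts the two options; ``resolve_mask_path`` and its ``relative_to_config`` switch are hypothetical and not part of facebook_wordcloud, which, per the error above, uses the working directory.

```python
from pathlib import Path
from typing import Union


def resolve_mask_path(
    mask: str,
    config_file: Union[str, Path],
    relative_to_config: bool = False,
) -> Path:
    """Resolve a mask path as written in a config file.

    relative_to_config=False mirrors the documented behaviour (relative to the
    current working directory); True anchors it to the config file's folder.
    """
    path = Path(mask)
    if path.is_absolute():
        return path  # absolute paths work from anywhere
    base = Path(config_file).resolve().parent if relative_to_config else Path.cwd()
    return (base / path).resolve()
```

Anchoring to the config file's folder would make the examples runnable from any directory; the trade-off is that paths typed on the command line and paths in config files would then follow different rules.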
**The mask doesn't seem to be working?**

I ran into this issue a few times. Make sure the mask image is in RGB or grayscale mode. Note that only pure white (#FFFFFF) areas are masked out; words can be drawn everywhere else.

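Because only exactly-white pixels are masked out, a mask whose background is slightly off-white (for example, from JPEG compression or anti-aliasing) will not behave as expected. One workaround, sketched here as an assumption rather than a feature of this tool, is to snap near-white pixels to pure white before using the image. The function name and the threshold of 250 are illustrative, and the code works on plain grayscale values so it runs without Pillow or NumPy.

```python
from typing import List, Sequence

WHITE = 255


def snap_near_white(gray_pixels: Sequence[int], threshold: int = 250) -> List[int]:
    """Set every grayscale value >= threshold to pure white (255).

    Only pure white mask entries are masked out, so this turns an
    off-white background into a usable mask background.
    """
    return [WHITE if value >= threshold else value for value in gray_pixels]


if __name__ == "__main__":
    row = [0, 120, 249, 250, 254, 255]
    print(snap_near_white(row))  # [0, 120, 249, 255, 255, 255]
```

With Pillow, the same idea applied to a whole image could look like ``Image.open(path).convert("L").point(lambda v: 255 if v >= 250 else v)``, which also guarantees a grayscale mask.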