A Python package that uses the common location-based social network (LBSN) data structure (ProtoBuf) to import, transform, and export social media data from services such as Twitter and Flickr.
The goal is to provide a common interface for handling social media data, without the need to adapt individually to the myriad API endpoints available. As an example, consider the ProtoBuf spec lbsn.Post, which can be a Tweet on Twitter, a photo shared on Flickr, or a post on Reddit. Despite their different origins, all of these objects share a common set of attributes, which is reflected in the lbsnstructure.
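The idea of a common interface can be illustrated with a minimal sketch. Note that `CommonPost`, its field names, and the Flickr input keys below are hypothetical and chosen only for illustration; they are not the actual lbsn.Post ProtoBuf spec, and the converter functions are not part of lbsntransform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommonPost:
    """Hypothetical stand-in for a shared post structure (not lbsn.Post)."""
    origin: str
    post_guid: str
    user_guid: str
    body: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def from_tweet(tweet: dict) -> CommonPost:
    """Map a Tweet-like dict (assumed v1.1-style keys) to the common structure."""
    # GeoJSON points are ordered [longitude, latitude]
    point = (tweet.get("coordinates") or {}).get("coordinates")
    lng, lat = point if point else (None, None)
    return CommonPost(
        origin="twitter",
        post_guid=tweet["id_str"],
        user_guid=tweet["user"]["id_str"],
        body=tweet.get("text", ""),
        latitude=lat,
        longitude=lng,
    )


def from_flickr_photo(photo: dict) -> CommonPost:
    """Map a Flickr-like dict (hypothetical keys) to the common structure."""
    return CommonPost(
        origin="flickr",
        post_guid=photo["photo_id"],
        user_guid=photo["owner"],
        body=photo.get("title", ""),
        latitude=photo.get("latitude"),
        longitude=photo.get("longitude"),
    )


tweet = {
    "id_str": "1",
    "user": {"id_str": "42"},
    "text": "Hello from Dresden",
    "coordinates": {"type": "Point", "coordinates": [13.74, 51.05]},
}
photo = {"photo_id": "9", "owner": "77", "title": "Elbe", "latitude": 51.05, "longitude": 13.74}

# Both records can now be handled by the same downstream code
for post in (from_tweet(tweet), from_flickr_photo(photo)):
    print(post.origin, post.post_guid, post.latitude, post.longitude)
```

Once every source is converted to one structure, downstream code needs only a single code path, which is the design choice the lbsnstructure is built around.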
The tool is based on a 4-Facet conceptual framework for LBSN, introduced in a paper by Dunkel et al. (2018).
The GDPR directly requests Social Media Network operators to allow users to transfer accounts and data between services. While there are attempts by Google, Facebook, etc. (e.g. see the data-transfer-project), this is not currently possible. A primary motivation for the lbsnstructure is to systematically characterize LBSN data aspects in a common, cross-network data scheme that enables privacy-by-design for connected software, data handling, and database design.
This tool enables data import from a Postgres database, JSON, or CSV, and export to CSV, LBSN ProtoBuf, or the hll and raw versions of the LBSN-prepared Postgres databases. The tool maps social media endpoints (e.g. Twitter tweets) to a common LBSN Interchange Structure format in ProtoBuf. lbsntransform can be run from the command line (CLI) or imported into other Python projects with import lbsntransform for on-the-fly conversion.
The recommended way to install lbsntransform, for both Linux and Windows, is through the conda package manager.
First, create an environment with the dependencies for lbsntransform using the environment.yml file that is provided in the root of the repository.
git clone https://github.com/Sieboldianus/lbsntransform.git
cd lbsntransform
# not necessary, but recommended:
conda config --env --set channel_priority strict
conda env create -f environment.yml
Afterwards, install lbsntransform using pip, without dependencies.
conda activate lbsntransform
pip install lbsntransform --no-deps --upgrade
# or locally, from the latest commits on master
# pip install . --no-deps --upgrade
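To confirm that the installation succeeded, the installed version can be queried from inside the activated environment. This is a minimal sketch using only the standard library; the helper name `installed_version` is an assumption introduced here and is not part of lbsntransform.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("lbsntransform")
if version:
    print(f"lbsntransform {version} is installed")
else:
    print("lbsntransform is not installed in this environment")
```

Because pip was run with --no-deps, a successful check here only confirms that lbsntransform itself is present; the dependencies are expected to come from the conda environment.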
For each data source, a mapping must be provided that defines how data is mapped to the lbsnstructure.
The default mapping is lbsnraw.
Additional mappings can be dynamically loaded from a folder.
We have provided two example mappings for the Flickr YFCC100M dataset (CSV) and Twitter (JSON).
For example, to import the first 1000 records of Twitter JSON data into the lbsn raw database, copy field_mapping_twitter.py to a local folder ./resources/mappings/, start up the Docker rawdb container, and use:
lbsntransform --origin 3 \
--mappings_path ./resources/mappings/ \
--file_input \
--file_type "json" \
--dbpassword_output "sample-key" \
--dbuser_output "postgres" \
--dbserveraddress_output "127.0.0.1:5432" \
--dbname_output "rawdb" \
--dbformat_output "lbsn" \
--transferlimit 1000
With the above input args, the tool will read local JSON files from ./01_Input/ and store up to 1000 lbsn records in the rawdb database.
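When the import is driven from a script, the same command line can be assembled programmatically. The following is a sketch only: `build_command` is a hypothetical helper, not part of lbsntransform, and it uses no flags beyond those shown in the command above.

```python
import shlex
import subprocess  # only needed if the command is actually executed


def build_command(options: dict, flags=()) -> list:
    """Assemble an lbsntransform argument list from switches and key/value options."""
    cmd = ["lbsntransform"]
    for flag in flags:  # switches without a value, e.g. --file_input
        cmd.append(f"--{flag}")
    for key, value in options.items():  # dict keys guarantee each option appears once
        cmd.extend([f"--{key}", str(value)])
    return cmd


rawdb_options = {
    "origin": 3,
    "mappings_path": "./resources/mappings/",
    "file_type": "json",
    "dbpassword_output": "sample-key",
    "dbuser_output": "postgres",
    "dbserveraddress_output": "127.0.0.1:5432",
    "dbname_output": "rawdb",
    "dbformat_output": "lbsn",
    "transferlimit": 1000,
}

command = build_command(rawdb_options, flags=["file_input"])
print(shlex.join(command))  # inspect the command before running it

# Uncomment to execute; requires lbsntransform and a running rawdb container.
# subprocess.run(command, check=True)
```

Keeping the options in a dictionary prevents an argument such as --mappings_path from being passed twice by accident.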
Alternatively, to import data directly into the privacy-aware version of lbsnstructure, called hlldb, start up the Docker hlldb container, and use:
lbsntransform --origin 3 \
--mappings_path ./resources/mappings/ \
--file_input \
--file_type "json" \
--dbpassword_output "sample-key" \
--dbuser_output "postgres" \
--dbserveraddress_output "127.0.0.1:25432" \
--dbname_output "hlldb" \
--dbformat_output "hll" \
--dbpassword_hllworker "sample-key" \
--dbuser_hllworker "postgres" \
--dbserveraddress_hllworker "127.0.0.1:25432" \
--dbname_hllworker "hlldb" \
--include_lbsn_objects "origin,post" \
--include_lbsn_bases hashtag,place,date,community \
--transferlimit 1000
With the above input args, the tool will read local JSON files from ./01_Input/ and, for up to 1000 records, update the hlldb bases hashtag, place, date, and community.
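The hll format refers to HyperLogLog, a probabilistic structure that estimates how many distinct items were seen without storing the items themselves. The sketch below is a generic, simplified HyperLogLog written for illustration; it is not the implementation used by hlldb or lbsntransform, and the class name, hash function, and precision parameter are assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog: estimates distinct counts without keeping raw items."""

    def __init__(self, p: int = 10):
        self.p = p                      # number of bits used to select a register
        self.m = 1 << p                 # number of registers (1024 for p=10)
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Derive a 64-bit hash from the item
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # first p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias correction for m >= 128
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:     # small-range correction
            estimate = self.m * math.log(self.m / zeros)
        return estimate


# Count distinct users per hashtag without retaining any user id
hll = HyperLogLog()
for i in range(5000):
    hll.add(f"user-{i}")
print(f"estimated distinct users: {hll.count():.0f} (true value: 5000)")
```

Only the register values are kept, so the individual identifiers cannot be listed back from the stored structure. In exchange, the count is approximate; with 1024 registers the typical error is a few percent.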
A full list of possible input and output args is available in the documentation.
See also the list of contributors.
This project is licensed under the GNU GPLv3 or any later version. See the LICENSE.md file for details.
FAQs
Location based social network (LBSN) data structure format & transfer tool
We found that lbsntransform demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.