github.com/josh9191/gcp-twitter-sentiment-analysis
Create a Twitter sentiment inference pipeline from scratch using Google Cloud Platform.
This project mimics data ingestion and builds an inference pipeline with GCP (Google Cloud Platform).
The project contains three modules: train/ (model training), cloud-functions/ (data ingestion), and dataflow/ (the inference pipeline).
This project has been tested with Python 3.8.5 (training, local Dataflow runs) and Go 1.13 (Cloud Functions).
Other environments have not been tested yet.
You can train your own model using the train/train.ipynb notebook.
This requires tensorflow 2.3.1, tweet-preprocessor 0.6.0, and pandas 1.1.4; the installation commands are included in the notebook.
The output model is written to the train-model/model/ directory in TensorFlow SavedModel format. The Dataflow pipeline loads the model from Cloud Storage, so you need to upload the model to a bucket.
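For reference, the export step boils down to saving a Keras model to a directory. The snippet below is a minimal sketch with a hypothetical stand-in model; the real architecture and preprocessing live in train/train.ipynb.
import tensorflow as tf

# hypothetical stand-in model; the real sentiment model is defined in train/train.ipynb
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation="sigmoid", input_shape=(16,)),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# saving to a directory (no .h5 extension) produces the TensorFlow SavedModel format
# that the Dataflow pipeline later loads from Cloud Storage
model.save("train-model/model/")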
You can easily create a bucket and upload your model directory to the bucket as follows.
# you may need to authenticate to GCP
gcloud auth login
# create a bucket "some-bucket"
gsutil mb -c standard gs://some-bucket
# copy directory "train-model/model/" to the bucket
gsutil cp -R train-model/model/ gs://some-bucket
# check the directory has been uploaded to Cloud Storage
gsutil ls gs://some-bucket/train/model
You can create a cron job using Cloud Scheduler. Cloud Scheduler publishes to a Pub/Sub topic, and the Cloud Function subscribes to that topic.
You can create the Pub/Sub topic and the Cloud Scheduler job using the gcloud tool.
# create a topic named "cron-topic"
gcloud pubsub topics create cron-topic
# create a scheduler named "cron-pubsub-trigger" which publishes to the topic "cron-topic" every minute.
gcloud scheduler jobs create pubsub cron-pubsub-trigger --schedule="* * * * *" --topic=cron-topic
# check the scheduler has been created
gcloud scheduler jobs list
cloud-functions/functions.go calls the Twitter Recent Search API and retrieves one minute of data (from 2 minutes ago to 1 minute ago). The query set in the function is "corona" and the maximum number of results is 10. You can modify the "searchingQuery" and "maxResults" variables to change this behavior, or add environment variables to change these settings dynamically.
There are several environment variables you should set for this function:
BEARER_TOKEN
GCP_PROJECT
TOPIC_ID
You can create the Cloud Function as follows.
# create a topic named "twitter-data-topic" which the data will be sent to
gcloud pubsub topics create twitter-data-topic
# create a function named "twitter-data-ingestion"
gcloud functions deploy twitter-data-ingestion --source=cloud-functions/functions.go --runtime=go113
# you may check the function has been created
gcloud functions list
You can run the Dataflow pipeline on your local machine or in GCP. Before running the pipeline, you should create a BigQuery table. The table contains an integer column "id" (the Twitter post ID) and a float column "prob" (the probability that the tweet is positive).
A BigQuery table can be created using the bq command-line tool.
# create dataset "twitter_data_set" in project "myproject"
bq mk -d \
--description "Twitter data" \
myproject:twitter_data_set
# create table "output" in "twitter_data_set" in project "myproject"
bq mk --table \
--description "Output table" \
myproject:twitter_data_set.output \
id:INTEGER,prob:FLOAT
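For reference, the sketch below shows roughly what a streaming pipeline like dataflow/predict.py does: read messages from the Pub/Sub topic, score each tweet with the SavedModel, and append id/prob rows to the BigQuery table. It assumes each message is a JSON object with "id" and "text" fields and that the model accepts raw strings; the real message format, preprocessing, and option parsing are in the repository.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class PredictDoFn(beam.DoFn):
    """Lazily loads the SavedModel and scores each tweet (hypothetical sketch)."""
    def __init__(self, model_dir):
        self._model_dir = model_dir
        self._model = None

    def process(self, message):
        import tensorflow as tf
        import preprocessor  # tweet-preprocessor
        if self._model is None:
            self._model = tf.keras.models.load_model(self._model_dir)
        tweet = json.loads(message.decode("utf-8"))      # assumed message format
        text = preprocessor.clean(tweet["text"])         # strip URLs, mentions, etc.
        prob = float(self._model.predict([text])[0][0])  # assumes a string-input model
        yield {"id": int(tweet["id"]), "prob": prob}

def run(argv=None):
    options = PipelineOptions(argv, streaming=True, save_main_session=True)
    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         | "ReadTweets" >> beam.io.ReadFromPubSub(
               topic="projects/myproject/topics/twitter-data-topic")
         | "Predict" >> beam.ParDo(PredictDoFn("gs://some-bucket/train/model"))
         | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
               "myproject:twitter_data_set.output",
               schema="id:INTEGER,prob:FLOAT",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))

if __name__ == "__main__":
    run()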
The Dataflow pipeline needs a few arguments: the GCP project, the model directory on Cloud Storage, the input Pub/Sub topic, and the output BigQuery table.
You can run Dataflow locally as follows. Note that Python 3.8.5 (or another 3.8.x release) must be installed on your machine.
# install Python libraries
pip install apache-beam[gcp]==2.25.0 \
tensorflow==2.3.1 \
tweet-preprocessor==0.6.0
# run Dataflow locally
python dataflow/predict.py \
--project myproject \
--model_dir gs://some-bucket/train/model --input_topic twitter-data-topic \
--output_bigquery_table myproject:twitter_data_set.output
The Dataflow pipeline can also be run in the cloud.
# create Dataflow job in GCP (using region asia-northeast1; you can modify --region argument to change the region)
python dataflow/predict.py --runner DataflowRunner \
--project myproject \
--model_dir gs://some-bucket/train/model --input_topic twitter-data-topic \
--output_bigquery_table myproject:twitter_data_set.output \
--region asia-northeast1 \
--setup_file dataflow/setup.py
Finally, run the scheduler job to start streaming data through the pipeline.
# run Cloud Scheduler named "cron-pubsub-trigger"
gcloud scheduler jobs run cron-pubsub-trigger
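Once the pipeline is running, you can check that rows are arriving in the output table. Below is a minimal check using the google-cloud-bigquery client library (an assumption; any BigQuery client or the Cloud Console works equally well).
from google.cloud import bigquery

# query the output table in project "myproject"
client = bigquery.Client(project="myproject")
query = """
    SELECT id, prob
    FROM `myproject.twitter_data_set.output`
    ORDER BY prob DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.id, row.prob)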