Richard Wen
rrwen.dev@gmail.com
Command line tool for extracting Twitter data to PostgreSQL databases
Install
- Install Node.js
- Install twitter2pg-cli via npm
npm install -g twitter2pg-cli
For the latest developer version, see Developer Install.
Usage
Get help:
twitter2pg --help
Open documentation in web browser:
twitter2pg doc twitter2pg
twitter2pg doc twitter
twitter2pg doc pg
See twitter2pg for programmatic usage.
Environment File
Create a template .env file for Twitter and PostgreSQL details:
twitter2pg file path/to/.env
Set the default .env file:
- Every twitter2pg command will now use the designated .env file
twitter2pg set file path/to/.env
PostgreSQL Query
Send a query to a PostgreSQL database after defining and setting the default Environment File.
The usage examples require a table named twitter_data, which can be created with the command below:
twitter2pg query "CREATE TABLE twitter_data(tweets jsonb);"
row | tweets |
---|---|
1 | {...} |
2 | {...} |
3 | {...} |
... | ... |
REST API
Set up default Twitter options:
- Set Twitter REST method (one of get, post, delete or stream)
- Set Twitter path
- Set Twitter parameters for path
twitter2pg set twitter.method get
twitter2pg set twitter.path search/tweets
twitter2pg set twitter.params "{\"q\":\"twitter\"}"
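The escaped double quotes in the twitter.params value above can be hard to read. As a minimal sketch, assuming a POSIX shell such as bash or sh (not Windows cmd.exe, where single quotes are not quoting characters), the same JSON string can be written with single quotes. The variable names params_escaped and params_single are illustrative only and are not part of twitter2pg-cli.

```shell
# Two ways to write the same JSON parameter string in a POSIX shell.
params_escaped="{\"q\":\"twitter\"}"   # escaped double quotes, as in the command above
params_single='{"q":"twitter"}'        # single quotes, no escaping needed

# Both variables hold the identical string.
printf '%s\n' "$params_escaped"
printf '%s\n' "$params_single"
```

Either variable could then be passed on, for example with twitter2pg set twitter.params "$params_single".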
Set up default PostgreSQL options:
- Set table to store received Twitter data
- Set column to store received Twitter data
- Set insert query for received Twitter data
- Set jsonata filter before inserting
twitter2pg set pg.table twitter_data
twitter2pg set pg.column tweets
twitter2pg set pg.query "INSERT INTO $options.pg.table($options.pg.column) SELECT * FROM json_array_elements($1);"
twitter2pg set jsonata statuses
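The pg.query value above contains $options.pg.table, $options.pg.column and $1, which appear to be placeholders meant for twitter2pg and PostgreSQL rather than for the shell. In a POSIX shell such as bash, double quotes expand these as shell variables before the command runs, so single quotes may be needed there. The sketch below only illustrates that shell behaviour, assuming options and $1 are empty as they typically would be in a fresh session; the variable names double_quoted and single_quoted are illustrative.

```shell
# Simulate a shell session where $options and $1 expand to nothing.
options=''
set -- ''

# Double quotes: the shell substitutes $options and $1 before the command runs.
double_quoted="INSERT INTO $options.pg.table($options.pg.column) SELECT * FROM json_array_elements($1);"

# Single quotes: the text is passed through unchanged.
single_quoted='INSERT INTO $options.pg.table($options.pg.column) SELECT * FROM json_array_elements($1);'

printf 'double: %s\n' "$double_quoted"
printf 'single: %s\n' "$single_quoted"
```

On such shells, the single-quoted form is likely the one to pass to twitter2pg set pg.query. The same consideration applies to any other option value that contains a $ character.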
Extract Twitter data into PostgreSQL table given setup options:
twitter2pg > log.csv
Stream API
Set up default Twitter options:
- Set Twitter stream method
- Set Twitter path
- Set Twitter stream parameters
twitter2pg set twitter.method stream
twitter2pg set twitter.path statuses/filter
twitter2pg set twitter.params "{\"track\":\"twitter\"}"
Set up default PostgreSQL options:
- Set table to store streamed Twitter data
- Set column to store streamed Twitter data
- Set insert query for streamed Twitter data
- Set jsonata filter before inserting
twitter2pg set pg.table twitter_data
twitter2pg set pg.column tweets
twitter2pg set pg.query "INSERT INTO $options.pg.table($options.pg.column) VALUES($1);"
twitter2pg set jsonata statuses
Stream Twitter data into PostgreSQL table given setup options:
twitter2pg > log.csv
Stream Twitter data into a PostgreSQL table as a service:
- Save a Node.js runnable script of the current options
- Install pm2 (npm install pm2 -g)
- Use pm2 to run the saved script as a service
twitter2pg save path/to/script.js
pm2 start path/to/script.js
pm2 save
Logs
The logs are in the following Comma-Separated Values (CSV) format:
- time_iso8601: Time and date in ISO 8601 format
- status: Status of the log
- message: Relevant messages
- json: JSON object containing relevant debugging information
time_iso8601 | status | message | json |
---|---|---|---|
... | ... | ... | ... |
Contributions
Report Contributions
Reports for issues and suggestions can be made using the issue submission interface.
When possible, ensure that your submission is:
- Descriptive: has an informative title, explanations, and screenshots
- Specific: has details of environment (such as operating system and hardware) and software used
- Reproducible: has steps, code, and examples to reproduce the issue
Code Contributions
Code contributions are submitted via pull requests:
- Ensure that you pass the Tests
- Create a new pull request
- Provide an explanation of the changes
A template of the code contribution explanation is provided below:
## Purpose
The purpose can mention goals that include fixes to bugs, addition of features, and other improvements, etc.
## Description
The description is a short summary of the changes made such as improved speeds or features, and implementation details.
## Changes
The changes are a list of general edits made to the files and their respective components.
* `file_path1`:
* `function_module_etc`: changed loop to map
* `function_module_etc`: changed variable value
* `file_path2`:
* `function_module_etc`: changed loop to map
* `function_module_etc`: changed variable value
## Notes
The notes provide any additional text that does not fit into the above sections.
For more information, see Developer Install and Implementation.
Developer Notes
Developer Install
Install the latest developer version with npm from GitHub:
npm install -g git+https://github.com/rrwen/twitter2pg-cli
Install from git cloned source:
- Ensure git is installed
- Clone into current path
- Install via npm
git clone https://github.com/rrwen/twitter2pg-cli
cd twitter2pg-cli
npm -g install
Tests
- Clone into current path
git clone https://github.com/rrwen/twitter2pg-cli
- Enter into folder
cd twitter2pg-cli
- Ensure devDependencies are installed and available
- Run tests with a .env file (see tests/README.md)
- Results are saved to tests/log with each file corresponding to a version tested
npm install
npm test
Upload to GitHub
- Ensure git is installed
- Inside the twitter2pg-cli folder, add all files and commit changes
- Push to GitHub
git add .
git commit -a -m "Generic update"
git push
Upload to npm
- Update the version in package.json
- Run tests and check for OK status
- Login to npm
- Publish to npm
npm test
npm login
npm publish
Implementation
The module twitter2pg-cli uses the following npm packages for its implementation:
npm | Purpose |
---|---|
yargs | Command line builder and parser |
twitter2pg | Extracts Twitter data to PostgreSQL |
dotenv | Load environment variables from a file |
opn | Open online browser documentation |
pg | Send queries to PostgreSQL database |
yargs
|--- twitter2pg <-- default command
|--- dotenv <-- file
|--- opn <-- doc
|--- pg <-- query