.. image:: https://img.shields.io/badge/License-Apache_2.0-blue.svg
   :target: https://www.apache.org/licenses/LICENSE-2.0
   :alt: Apache 2.0 License

.. image:: https://badge.fury.io/py/twclient.svg
   :target: https://pypi.python.org/pypi/twclient/
   :alt: PyPi version

.. image:: https://img.shields.io/pypi/pyversions/twclient.svg
   :target: https://pypi.python.org/pypi/twclient/
   :alt: Python versions

.. .. image:: https://readthedocs.org/projects/twclient/badge/?version=latest :target: http://twclient.readthedocs.io/?badge=latest :alt: Documentation Status

twclient

This package provides a high-level command-line client for the Twitter API, with a focus on loading data into a database for analysis or bulk use.

Documentation: `mit-ccc.github.io/twclient <https://mit-ccc.github.io/twclient>`__

Why use this project?

This project offers high-level primitives for researchers who want to get data out of Twitter, without worrying about the API details. The client can handle multiple sets of API credentials seamlessly, helping avoid rate limit issues. [1]_ There's also support for exporting bulk datasets from the fetched raw data.

Installation

Install the package from PyPI:

.. code-block:: bash

    pip3 install twclient

or, if you want to use the development version, clone this repo and install:

.. code-block:: bash

    git clone git@github.com:mit-ccc/twclient.git && cd twclient
    pip3 install .

You can also use the -e flag to install in editable mode:

.. code-block:: bash

    pip3 install -e .

To install all development dependencies, replace ``.`` with ``.[dev]`` in the arguments to ``pip3 install``.

Usage

First, you need to tell twclient about your database backend and Twitter credentials. On the database side, we've only tested with Postgres and SQLite. While the package may well work with other DB engines, be aware that you may encounter issues.

Setup: Database

The database backend can be either SQLite or an arbitrary database specified by a sqlalchemy connection string.

You can set up the database in one of two ways. Both create a persistent profile in your .twclientrc file (or whatever other file you specify), so there's no need to type the database details repeatedly.

First, you can specify the DB with a sqlalchemy connection URL:

.. code-block:: bash

    # Postgres -- this becomes the default DB because you've created it first
    twclient config add-db -u "postgresql+psycopg2://username@hostname:5432/dbname" my_postgres_db

    # Or you could use SQLite (four slashes, because the path is absolute)
    twclient config add-db -u "sqlite:////home/user/twitter.db" my_sqlite_db
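
The slash count in a SQLite connection URL is easy to get wrong: sqlalchemy expects ``sqlite:///`` followed by the file path, so a relative path yields three slashes and an absolute Unix path yields four. The helper below is a small illustration of that rule; ``sqlite_url`` is a hypothetical name and is not part of twclient.

```python
def sqlite_url(path):
    """Build a sqlalchemy-style SQLite URL for a database file path.

    The scheme prefix is always "sqlite:///"; an absolute Unix path adds
    its own leading slash, which gives four slashes in total.
    """
    return "sqlite:///" + path


print(sqlite_url("twitter.db"))             # sqlite:///twitter.db
print(sqlite_url("/home/user/twitter.db"))  # sqlite:////home/user/twitter.db
```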

There's also support for using SQLite without having to think about sqlalchemy and connection URLs:

.. code-block:: bash

    twclient config add-db -f ./twitter2.db my_sqlite_db2

If you specify a file-backed SQLite DB, as in the examples above, it'll be created if it doesn't exist. Other databases (Postgres, for example) will need to be set up separately.

Finally, you have to install our database schema into your database to receive Twitter data:

.. code-block:: bash

    # You have to specify the -y to say you're aware all data will be dropped
    twclient initialize -d my_postgres_db -y

Be aware that doing this will DROP ALL EXISTING TWCLIENT DATA!!! (Or other tables with the same names.) If you're not just getting started, check to make sure you're using a new or empty database, don't care about the contents, and/or have backups before running this.
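
If the database is a SQLite file, one way to take a backup before running ``initialize`` is Python's built-in ``sqlite3`` backup API. This is a minimal sketch, not a twclient feature; ``backup_sqlite`` and the file names in the usage note are hypothetical.

```python
import os
import sqlite3


def backup_sqlite(src_path, dest_path):
    """Copy a SQLite database file to dest_path with SQLite's online backup API."""
    # sqlite3.connect() would silently create an empty file for a missing
    # source, so check first to avoid "backing up" nothing.
    if not os.path.exists(src_path):
        raise FileNotFoundError(src_path)
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # copies every page of the source database
    finally:
        dest.close()
        src.close()
```

For example, ``backup_sqlite("twitter.db", "twitter.backup.db")`` would copy a database file named ``twitter.db``. For Postgres, a tool such as ``pg_dump`` serves the same purpose.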

Setup: Twitter

You'll also need to set up your Twitter API credentials. [1]_ As with the database setup, doing this stores the credentials in a config file (the same config file as for database info) for ease of use. Only two sets of credentials are shown, but you can add as many as you want.

Here's an example of adding two API keys:

.. code-block:: bash

    twclient config add-api -n twitter1 \
        --consumer-key XXXXX \
        --consumer-secret XXXXXX \
        --token XXXXXX \
        --token-secret XXXXXX

    twclient config add-api -n twitter2 \
        --consumer-key XXXXX \
        --consumer-secret XXXXXX \
        --token XXXXXX \
        --token-secret XXXXXX

Here's an example of adding credentials that use `app-only auth <https://developer.twitter.com/en/docs/authentication/oauth-2-0/application-only>`_:

.. code-block:: bash

    twclient config add-api -n twitter3 \
        --consumer-key XXXXX \
        --consumer-secret XXXXXX
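
To illustrate why several credential sets help with rate limits, here is a conceptual sketch of rotating between named credentials. It is not twclient's implementation: the ``CredentialPool`` class and its methods are hypothetical, and only the names ``twitter1`` through ``twitter3`` come from the examples above.

```python
import itertools


class CredentialPool:
    """Round-robin over credential names, skipping rate-limited ones."""

    def __init__(self, names):
        self._names = list(names)
        if not self._names:
            raise ValueError("at least one credential name is required")
        self._limited = set()
        self._cycle = itertools.cycle(self._names)

    def mark_rate_limited(self, name):
        """Record that this credential has hit its rate limit."""
        self._limited.add(name)

    def reset(self, name):
        """Record that this credential's rate-limit window has passed."""
        self._limited.discard(name)

    def next_available(self):
        """Return the next usable credential name, or None if all are limited."""
        for _ in range(len(self._names)):
            name = next(self._cycle)
            if name not in self._limited:
                return name
        return None
```

With a single credential, a rate limit means waiting; with several, the caller can move on to the next one and only wait once all of them are exhausted.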

Pulling data

To actually pull data, use the twclient fetch command. We'll pull information about three specific users and a Twitter list here. Note that you can refer to lists either by their "slug" (username/listname) or by the ID at the end of a URL of the form https://twitter.com/i/lists/53603015.
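
When list references come from mixed sources (URLs, bare IDs, slugs), it can help to normalize them before passing them to ``-l``. The sketch below only illustrates the two forms described above; ``list_ref`` is a hypothetical helper, not something twclient provides.

```python
import re

# Matches list URLs of the form https://twitter.com/i/lists/<numeric id>
LIST_URL = re.compile(r"^https?://(?:www\.)?twitter\.com/i/lists/(\d+)/?$")


def list_ref(value):
    """Classify a list reference as ("id", ...) or ("slug", ...)."""
    match = LIST_URL.match(value)
    if match:
        return ("id", match.group(1))
    if value.isdigit():
        return ("id", value)
    owner, sep, name = value.partition("/")
    if sep and owner and name and "/" not in name:
        return ("slug", value)
    raise ValueError(f"not a list ID, list URL, or username/listname slug: {value!r}")


print(list_ref("https://twitter.com/i/lists/53603015"))  # ('id', '53603015')
print(list_ref("MIT/peers1"))                            # ('slug', 'MIT/peers1')
```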

First, let's load some users and their basic info:

.. code-block:: bash

    # you could instead also end this with "-l 53603015"; it's the same list
    twclient fetch users -n wwbrannon CCCatMIT MIT -l MIT/peers1

Now, to save typing, let's use the twclient tag command to apply a tag we can use to keep track of these users later:

.. code-block:: bash

    twclient tag create subjects
    twclient tag apply subjects -n wwbrannon CCCatMIT MIT -l MIT/peers1

We can now use this tag in specifying users, such as which users we'd like to fetch tweets for:

.. code-block:: bash

    twclient fetch tweets -g subjects

And if we also want their follow-graph info (note that a "friend" is Twitter's term for a follow-ee, an account you follow):

.. code-block:: bash

    twclient fetch friends -g subjects
    twclient fetch followers -g subjects

At this point, the loaded data is in the database configured with config add-db. Useful features have been normalized out to save processing time. The raw API responses are also saved for later analysis.
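
A quick way to check what was loaded, without assuming anything about the schema's table names, is to list the tables in a SQLite database together with their row counts. This is a minimal sketch using only Python's standard library; ``table_row_counts`` is a hypothetical helper and the path you pass is whatever file you configured with ``config add-db``.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Table names come from sqlite_master itself, so quoting them
            # as identifiers is sufficient here.
            counts[name] = conn.execute(
                f'SELECT COUNT(*) FROM "{name}"'
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```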

Exporting data

You can query the data with the usual database tools (psql for Postgres, sqlite3 for SQLite, ODBC clients, etc.) or export certain pre-defined bulk datasets with the twclient export command. For example, here are the follow graph and mention graph over users:

.. code-block:: bash

    twclient export follow-graph -o follow-graph.csv
    twclient export mention-graph -o mention-graph.csv

If you want to restrict the export to only the users specified above:

.. code-block:: bash

    twclient export follow-graph -g subjects -o follow-graph.csv
    twclient export mention-graph -g subjects -o mention-graph.csv
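
Once exported, an edge list can be summarized with a few lines of Python. The sketch below counts outgoing edges per user. It rests on assumptions that this README does not confirm: that the CSV has a header row and that its first two columns are the source and target of each edge. Check the actual file and adjust ``has_header`` or the column indexes if it differs.

```python
import csv
from collections import Counter


def out_degrees(lines, has_header=True):
    """Count edges per source node from an iterable of CSV lines.

    Assumes (unverified) that column 0 is the edge source and column 1
    is the edge target.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the assumed header row
    counts = Counter()
    for row in reader:
        if len(row) >= 2:
            counts[row[0]] += 1
    return counts
```

For example, ``with open("follow-graph.csv", newline="") as f: print(out_degrees(f).most_common(10))`` would print the ten sources with the most outgoing edges.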

For other exports and other options, see the documentation.

Feedback or Contributions

If you come across a bug, please report it on the GitHub issue tracker. If you want to contribute, reach out! Extensions and improvements are welcome.

Copyright

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

.. [1] Of course, you'll need to make sure you have the right to use all of your credentials and are complying with Twitter's terms of use.

Changelog

All notable changes to this project will be documented here.

The format is based on `Keep a Changelog <https://keepachangelog.com/en/1.0.0/>`_, and this project follows `Semantic Versioning <https://semver.org/spec/v2.0.0.html>`_.

v0.2.0

Added


New features:

*  First stable version.
*  API documentation.
*  A command-line interface for easy automated fetching of data.
*  A stable relational data model, to make further analysis or data processing
   independent of the details of data ingest.
*  Support for fetching follow-graph edges, tweets, lists and user info.
*  Support for a variety of database backends via sqlalchemy.
*  Type-2 SCD for tracking the follow graph and list membership over time.
*  Many sanity checks in the data model to ensure correctness of loaded data.
*  Loaded data is extensively normalized: mentions, replies, retweets,
   and entities like hashtags are extracted into first-class objects for
   more convenient and accessible analysis.
*  Users can be tagged into arbitrary groups for greater convenience in
   analysis or data collection.
*  Support for both app-only and user authentication to the Twitter API.
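
The "Type-2 SCD" item above refers to a type-2 slowly changing dimension: rather than overwriting a row when a relationship changes, the old row is closed with an end timestamp and a new row is opened, so the state at any past time can be reconstructed. Here is a minimal sketch with ``sqlite3``, assuming a hypothetical ``follow`` table and integer timestamps; it is not twclient's actual schema or code.

```python
import sqlite3


def create_table(conn):
    """Create the hypothetical follow table; valid_end is NULL for current rows."""
    conn.execute(
        "CREATE TABLE follow ("
        "source INTEGER, target INTEGER, "
        "valid_start INTEGER NOT NULL, valid_end INTEGER)"
    )


def apply_snapshot(conn, source, targets, now):
    """Record that, as of `now`, `source` follows exactly `targets`."""
    current = {
        row[0]
        for row in conn.execute(
            "SELECT target FROM follow WHERE source = ? AND valid_end IS NULL",
            (source,),
        )
    }
    targets = set(targets)
    for gone in current - targets:  # close edges that disappeared
        conn.execute(
            "UPDATE follow SET valid_end = ? "
            "WHERE source = ? AND target = ? AND valid_end IS NULL",
            (now, source, gone),
        )
    for new in targets - current:  # open edges that appeared
        conn.execute(
            "INSERT INTO follow (source, target, valid_start, valid_end) "
            "VALUES (?, ?, ?, NULL)",
            (source, new, now),
        )
    conn.commit()


def friends_as_of(conn, source, when):
    """Return the sorted targets that `source` followed at time `when`."""
    rows = conn.execute(
        "SELECT target FROM follow WHERE source = ? AND valid_start <= ? "
        "AND (valid_end IS NULL OR valid_end > ?) ORDER BY target",
        (source, when, when),
    )
    return [row[0] for row in rows]
```

Unchanged edges are left untouched between snapshots, so storage grows with the number of changes rather than with the number of fetches.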
