|Author| |PyPi| |Python 3.7| |license|

Python Twitter data collector built on
`Tweepy <https://github.com/tweepy/tweepy>`__ that allows users to
dynamically collect accounts and statuses from Twitter during streaming
and automatically generate datasets from the collected data, which you can
save as CSV.

This library provides a framework for building your own data collector: you specify which features should be extracted from Twitter accounts and statuses.

Creating your Twitter dataset:

1. Instantiate an ``AccountCollector`` and/or ``StatusCollector`` class,
   depending on whether you want to collect accounts, statuses or both.
   At this step you can redefine the features to be extracted from
   Twitter data by passing dict-like parameters of the form
   ``<feature_name, function>``, where the function is applied to the
   user or status object. Please refer to the
   `documentation <https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/user-object.html>`__
   for more details about Twitter objects.
2. Instantiate the ``OnlineStreamer``, passing the collector as a
   parameter.
3. Start streaming on some topics.
4. Save the created dataset at the specified location.

NB: You do not have to use both the collectors and the streamer. You can use the collectors alone, for instance if you already have the users and/or statuses to collect and don't need to stream anything.

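The ``<feature_name, function>`` mapping mentioned in step 1 can be pictured with a small, library-independent sketch. The ``extract_features`` helper and the stand-in user object below are hypothetical and only illustrate how such a mapping might be applied. They are not part of ptdc, and the constructor parameter that accepts the mapping should be checked in the ptdc source.

```python
from types import SimpleNamespace

# Hypothetical feature mapping: feature name -> function applied to a user object.
# The attribute names follow Twitter's user object (screen_name, followers_count, ...).
features = {
    "screen_name": lambda user: user.screen_name,
    "followers": lambda user: user.followers_count,
    "ff_ratio": lambda user: user.followers_count / max(user.friends_count, 1),
}


def extract_features(obj, feature_map):
    """Apply every function in feature_map to obj and return one row of data."""
    return {name: func(obj) for name, func in feature_map.items()}


# Stand-in for a Tweepy user object, used here only for illustration.
fake_user = SimpleNamespace(screen_name="example", followers_count=120, friends_count=40)
row = extract_features(fake_user, features)
print(row)
```

Each key becomes a column of the dataset and each function computes that column's value for one account or status.
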
NEW FEATURES:

* Offline collection by name: lets the user query by name and collect users with similar names, extracting the features defined in the collector constructor.

The package is available on PyPI
`here <https://pypi.org/project/ptdc/>`__.

.. code:: bash

    $ pip install ptdc

Import modules
~~~~~~~~~~~~~~

::

    from ptdc import authenticate, AccountCollector, OnlineStreamer, StatusCollector

Define tokens
~~~~~~~~~~~~~
Replace the following tokens with your own; see the Twitter developers
`authentication <https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html>`__
guide for details on how to obtain them.

::

    consumer_key = "xxxxxxxxxxx"
    consumer_key_secret = "xxxxxxxxxxxxx"
    access_token = "xxxxxxxxxxxxxxxxxxxxxx"
    access_token_secret = "xxxxxxxxxxxxxxxxxx"

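If you would rather not hard-code the tokens in a script, one option is to read them from environment variables. The sketch below is only an illustration: the ``load_tokens`` helper and the environment variable names are assumptions, not something ptdc defines.

```python
import os


def load_tokens(env=os.environ):
    """Read the four Twitter tokens from environment variables.

    The variable names are assumptions made for this sketch.
    """
    names = {
        "consumer_key": "TWITTER_CONSUMER_KEY",
        "consumer_key_secret": "TWITTER_CONSUMER_KEY_SECRET",
        "access_token": "TWITTER_ACCESS_TOKEN",
        "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
    }
    # Fail early with a clear message if any variable is not set.
    missing = [var for var in names.values() if var not in env]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}
```

The keys of the returned dict match the keyword names that ``authenticate`` is called with in this README, so it should be possible to unpack it as ``authenticate(**load_tokens())``.
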
Create the default Tweepy API object

::

    api = authenticate(consumer_key=consumer_key, consumer_key_secret=consumer_key_secret, access_token=access_token, access_token_secret=access_token_secret)

Create your own Collectors for collecting data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create your own StatusCollector object

::

    s_collector = StatusCollector(api=api)

Create your own AccountCollector object, which will also collect
statuses

::

    collector = AccountCollector(api=api, statuses_collector=s_collector)

Create the Streamer
~~~~~~~~~~~~~~~~~~~
Create an OnlineStreamer that will collect data (in this case, only
5 accounts)

::

    streamer = OnlineStreamer(api=api, collector=collector, data_limit=5, n_statuses=400)

Start streaming
~~~~~~~~~~~~~~~
You can start streaming in any of the ways defined by Tweepy; see its
documentation for more details.

::

    streamer.stream(track=['famous', 'web', 'vip', 'holiday', 'pic', 'photo'], is_async=False)

Save dataset/s
~~~~~~~~~~~~~~
Once streaming has ended (according to the limits you defined), save
the generated DataFrame(s) to CSV file(s). You just need to access the
collector object and call the ``save_dataset`` method, providing the path.

::

    streamer.collector.save_dataset(path="../data/accounts.csv")

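This README does not say whether ``save_dataset`` creates missing directories, so it may be safer to make sure the target folder exists before saving. A minimal sketch using only the standard library follows; the ``ensure_parent_dir`` helper is hypothetical and not part of ptdc.

```python
from pathlib import Path


def ensure_parent_dir(path):
    """Create the parent directory of path if needed and return path as a string."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return str(target)


# Possible usage with the streamer created above (left as a comment here):
# streamer.collector.save_dataset(path=ensure_parent_dir("../data/accounts.csv"))
```
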
Questions and Contributing
--------------------------
Feel free to post questions and problems on the issue tracker. Pull
requests are welcome!

Feel free to fork the library and modify it or add new features and
functionality.

.. |Author| image:: http://img.shields.io/badge/author-lampajr-blue.svg?style=flat-square
.. |PyPi| image:: https://img.shields.io/pypi/v/ptdc.svg?style=flat-square
   :target: https://pypi.org/project/ptdc/
.. |Python 3.7| image:: https://img.shields.io/pypi/pyversions/ptdc.svg
.. |license| image:: https://img.shields.io/github/license/mashape/apistatus.svg?style=flat-square
   :target: https://github.com/lampajr/PTDC/blob/master/LICENSE