Tools for accessing the Twitter API v1.1 with paranoid timeouts and de-pagination.
Node.js
- twitter-curl, for querying the streaming API (/sample.json and /filter.json).
- rtcount, for pulling out the retweets from a stream of JSON tweets and counting them.
Python
- json2ttv2, which converts directories full of Twitter .json files into .ttv2 files, bzip2's them, and ensures that each result is within a reasonable size of the source (greater than 2%, but less than 6%) before deleting the original JSON.
- twitter-user, which pulls down the ~3,200 (max) tweets that are accessible for a given user (it also depends on the ~/.twitter auth file).
Install from npm:
npm install -g twilight
Or from GitHub (to make sure you're getting the most up-to-date version):
npm install -g git://github.com/chbrown/twilight
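The rtcount tool listed above pulls the retweets out of a stream of JSON tweets and counts them. Here is a minimal sketch of that idea, not rtcount's actual code. It assumes newline-delimited Twitter API v1.1 tweet JSON, where a retweet carries the original tweet in its retweeted_status field; countRetweets is a hypothetical name.

```javascript
// Hypothetical sketch of rtcount-style counting (not the actual rtcount source).
// Assumes one v1.1 tweet JSON object per line; a retweet embeds the original
// tweet under `retweeted_status`.
function countRetweets(lines) {
  const counts = new Map();
  for (const line of lines) {
    if (line.trim() === '') continue;
    let tweet;
    try {
      tweet = JSON.parse(line);
    } catch (err) {
      continue; // skip partial or malformed lines
    }
    const original = tweet.retweeted_status;
    if (!original) continue; // not a retweet
    counts.set(original.id_str, (counts.get(original.id_str) || 0) + 1);
  }
  // Most-retweeted originals first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const sample = [
  '{"id_str":"3","text":"RT @a: hi","retweeted_status":{"id_str":"1","text":"hi"}}',
  '{"id_str":"4","text":"RT @a: hi","retweeted_status":{"id_str":"1","text":"hi"}}',
  '{"id_str":"5","text":"RT @b: yo","retweeted_status":{"id_str":"2","text":"yo"}}',
  '{"id_str":"6","text":"just a tweet"}',
];
console.log(JSON.stringify(countRetweets(sample))); // prints [["1",2],["2",1]]
```

Keying on the original tweet's id_str rather than the numeric id avoids JavaScript's loss of precision on 64-bit tweet IDs.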
This app uses only OAuth 1.0a, which is mandatory: as of June 11, 2013, basic HTTP authentication is disabled in the Twitter Streaming API. So get some OAuth credentials together real quick and make a CSV file that looks like this:
consumer_key | consumer_secret | access_token | access_token_secret |
---|---|---|---|
ziurk0An7... | VKmTsGrk2JjH... | 91505165... | VcLOIzA0mkiCSbU... |
63Yp9EG4t... | DhrlIQBMUaoL... | 91401882... | XJa4HQKMgqfd7ee... |
... | ... | ... | ... |
There must be a header line with exactly the following values: consumer_key, consumer_secret, access_token, access_token_secret.
Tab- or space-separated is fine, and any other columns will simply be ignored, e.g., if you want to record the screen_name of each account. Also, order doesn't matter: your headers just have to line up with their values.
The twitter-curl script expects to find this file at ~/.twitter, but you can specify a different path with the --accounts command line argument.
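A minimal sketch of reading an accounts file under the rules above: a header row with the four required names in any order, extra columns ignored, and (per the --accounts option) one row picked at random. It assumes columns are separated by tabs, spaces, or commas; parseAccounts and pickAccount are hypothetical names, not twitter-curl's API.

```javascript
// Hypothetical sketch, not twitter-curl's implementation.
const REQUIRED = ['consumer_key', 'consumer_secret', 'access_token', 'access_token_secret'];

function parseAccounts(text) {
  // Split each non-blank line on tabs, spaces, or commas (assumed delimiters).
  const rows = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => line.split(/[\t ,]+/));
  const [header, ...records] = rows;
  if (!header) throw new Error('accounts file is empty');
  for (const name of REQUIRED) {
    if (!header.includes(name)) {
      throw new Error(`accounts file is missing the ${name} column`);
    }
  }
  // Look each required value up by its header position; other columns are ignored.
  return records.map((values) => {
    const account = {};
    for (const name of REQUIRED) account[name] = values[header.indexOf(name)];
    return account;
  });
}

function pickAccount(accounts) {
  // Use a random row, as the --accounts option describes.
  return accounts[Math.floor(Math.random() * accounts.length)];
}

const sample = [
  'screen_name\taccess_token\tconsumer_key\tconsumer_secret\taccess_token_secret',
  'alice\t111\tckA\tcsA\tatsA',
  'bob\t222\tckB\tcsB\tatsB',
].join('\n');

const accounts = parseAccounts(sample);
console.log(accounts[1].consumer_key); // prints "ckB"
console.log(pickAccount(accounts).consumer_key); // "ckA" or "ckB"
```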
Example supervisord program definition, placed in /etc/supervisor/conf.d/* (see http://supervisord.org/ to get that all set up):
[program:justinnnnnn]
user=chbrown
command=twitter-curl
    --filter "track=loveyabiebs,belieber,bietastrophe"
    --file /data/twitter/justin_TIMESTAMP.json
    --timeout 86400
    --interval 3600
    --ttv2
twitter-curl options:
- --accounts should point to a file with OAuth Twitter account credentials. Currently, the script will simply use a random row from this file.
- --filter can be any track=whatever or locations=-18,14,68,44, etc., i.e., any querystring-parsable string. If no filter is specified, it will use the spritzer at /sample.json.
- --file shouldn't require creating any directories, and the TIMESTAMP bit will be replaced by a filesystem-friendly ISO representation of whenever the program is started:
    // Specifically:
    var stamp = new Date().toISOString().replace(/:/g, '-').replace(/\..+/, '');
    // stamp == '2013-06-07T15-47-49'
- --timeout (seconds): the program will die with error code 1 after this amount of time. Don't specify a timeout if you don't want this.
- --interval (seconds): the amount of time to allow for silence from Twitter before dying. Also exits with code 1. Defaults to 600 (10 minutes).
- --ttv2 (boolean): output TTV2 normalized flat tweets instead of full JSON.
Because in most cases of error the script simply dies, this approach only really makes sense if you're putting it behind some process monitor. (By the way, I've tried most of them, including monit, daemontools's svc, god, bluepill, and node-forever, and supervisord is by far the best.)
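The --filter and --file handling described in the options above can be pictured with a short sketch. It is not twitter-curl's source: resolveStream and resolveFile are hypothetical names, and the standard URLSearchParams class stands in for whatever querystring parser the script actually uses.

```javascript
// Hypothetical sketch of option handling; not twitter-curl's actual code.
function resolveStream(filter) {
  // With no filter, fall back to the sample ("spritzer") stream.
  if (!filter) {
    return { path: '/sample.json', params: {} };
  }
  // The filter is a querystring, e.g. "track=a,b" or "locations=-18,14,68,44".
  const params = Object.fromEntries(new URLSearchParams(filter));
  return { path: '/filter.json', params };
}

function resolveFile(template, now = new Date()) {
  // Same filesystem-friendly stamp as the snippet in the --file description.
  const stamp = now.toISOString().replace(/:/g, '-').replace(/\..+/, '');
  return template.replace('TIMESTAMP', stamp);
}

console.log(resolveStream('track=loveyabiebs,belieber,bietastrophe'));
console.log(resolveFile('/data/twitter/justin_TIMESTAMP.json', new Date('2013-06-07T15:47:49Z')));
// prints "/data/twitter/justin_2013-06-07T15-47-49.json"
```

Note that URLSearchParams decodes "+" as a space, so a literal plus sign in a track term would need to be written as %2B.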
The script does not abide by any Twitter backoff requirement, but I've never had any trouble with backoff being enforced by Twitter. It's more than curl, though, because it checks that it's receiving data. Often, with curl and PycURL, my connection would be dropped by Twitter, but no end signal would be sent; my crawler would simply hang, expecting data, and would not try to reconnect. Beyond that, though, without --ttv2 it doesn't provide anything more than curl.
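The "paranoid timeouts" come down to two limits: an inactivity check for --interval and an overall deadline for --timeout, both exiting with code 1 so a process monitor can restart the crawler. A minimal sketch under those assumptions (checkWatchdog and watch are hypothetical helpers, not twilight's code), with the decision logic kept in a pure function so it can be checked without waiting on real timers:

```javascript
// Hypothetical sketch (not twilight's source): silence watchdog for --interval
// plus an overall deadline for --timeout; both lead to exit code 1.

// Pure decision function: returns a reason to die, or null to keep going.
function checkWatchdog(state, now, { interval = 600, timeout } = {}) {
  if (timeout !== undefined && now - state.startedAt >= timeout * 1000) {
    return `timeout: ran for ${timeout}s`;
  }
  if (now - state.lastDataAt >= interval * 1000) {
    return `silence: no data for ${interval}s`;
  }
  return null;
}

// Attach the watchdog to any readable stream (e.g. the HTTP response body).
function watch(stream, options = {}) {
  const state = { startedAt: Date.now(), lastDataAt: Date.now() };
  stream.on('data', () => { state.lastDataAt = Date.now(); });
  const timer = setInterval(() => {
    const reason = checkWatchdog(state, Date.now(), options);
    if (reason !== null) {
      clearInterval(timer);
      console.error(reason);
      process.exit(1); // let the process monitor restart us
    }
  }, 1000);
  return timer;
}

const state = { startedAt: 0, lastDataAt: 0 };
console.log(checkWatchdog(state, 599000)); // prints null
console.log(checkWatchdog(state, 600000)); // prints "silence: no data for 600s"
console.log(checkWatchdog(state, 90000, { interval: 600, timeout: 60 })); // prints "timeout: ran for 60s"
```

Checking once a second means the process may exit up to a second after a limit is reached, which seems harmless next to a 600-second default.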
TTV2 is the Tweet tab-separated format, version 2; the specification is below. Fields are 1-indexed for easy AWKing (see the Markdown source for 0-indexing).
This format is not the default; it is the output only when you use the --ttv2 option.
Install jq first; the examples below pipe the JSON output through it. It's awesome.
twilight stream --filter 'track=bootstrap' | jq -r -c .text
twilight stream --filter 'track=bootstrap' | jq -c '{screen_name: .user.screen_name, text}'
twilight stream --filter 'track=انتخابات' | jq -r -c .text
twilight stream --filter 'track=sarcmark,%F0%9F%91%8F' | jq -r -c .text
It supports Unicode: انتخابات is Arabic for "elections," and decodeURIComponent('%F0%9F%91%8F') is the "CLAPPING HANDS" (U+1F44F) character.
If you use a filter with URL-escaped characters in supervisord, note that supervisord applies Python-style % interpolation to its config values, so you'll need to escape each percent sign by doubling it, e.g.:
[program:slowclap]
command=twitter-curl --filter "track=%%F0%%9F%%91%%8F" --file /tmp/slowclap.json
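To see where the escaped strings above come from, here is a small sketch that percent-encodes the track terms, doubles the percent signs for supervisord, and decodes the result back. toSupervisorArg is a hypothetical helper, not part of twilight.

```javascript
// Sketch: building the escaped filter strings used in the examples above.
const terms = ['sarcmark', '\u{1F44F}']; // U+1F44F CLAPPING HANDS

// Percent-encode each term (as UTF-8 bytes), keeping the comma separator literal.
const filter = 'track=' + terms.map(encodeURIComponent).join(',');
console.log(filter); // prints "track=sarcmark,%F0%9F%91%8F"

// supervisord treats % as the start of an expansion, so double every percent sign.
const toSupervisorArg = (s) => s.replace(/%/g, '%%');
console.log(toSupervisorArg(filter)); // prints "track=sarcmark,%%F0%%9F%%91%%8F"

// Decoding recovers the original characters.
const decoded = new URLSearchParams(filter).get('track');
console.log(decoded.split(',')[1].codePointAt(0).toString(16)); // prints "1f44f"
```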
Instead of JSON, you can use AWK to look at the TTV2:
twitter-curl --filter 'track=data,science' --ttv2 | awk 'BEGIN{FS="\t"}{print $4,$3}'
Copyright 2011-2015 Christopher Brown. MIT Licensed.