A tool for exporting tweet data from Twitter by parsing GraphQL fetch requests made by the Twitter website.
Photo by Martin Vorel, libreshot.com
Problem: You were running some kind of project that used the Twitter API to load tweets from some number of feeds and process them in some way - for archiving, research, statistics, whatever. Now free API access has been shut down, all your API keys have been revoked and your project doesn't work anymore ☹️
Solution 1: sign up for paid access and pay more than all your streaming, media, internet, mobile and app subscriptions combined every month just to fetch some tweets 🤑💰💰💰
Solution 2: go the Chad Scraper route and scrape the data from the website with some scripts, playing a cat and mouse game and worrying that your account and/or IP will be blocked 😬
Solution 3: passively record the requests that the Twitter frontend is making to the API using Safari Web Inspector, then use some Ruby code to extract any data you want from the saved JSON responses 🤔
Note: one obvious drawback of this method is that the request recording part is somewhat manual, so it's (probably) not possible to completely automate it so that it runs on a server somewhere, unattended. However, it should be enough if you're ok with having to remember to periodically browse through a few timelines, save the export and run a script on it.
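To show what the saved .har file actually contains, here is a minimal Ruby sketch, using only the standard library, that pulls the JSON bodies of recorded GraphQL responses out of a HAR archive. It is not taken from bad_pigeon's source: the method name `graphql_responses` is hypothetical, and it assumes the HAR 1.2 layout (`log.entries[].request.url`, `response.content.text`, optional `content.encoding`) and that Twitter's GraphQL endpoints contain `/graphql/` in their URL.

```ruby
require 'json'

# Hypothetical helper (not part of bad_pigeon): returns the parsed JSON bodies
# of all responses in a HAR archive whose request URL looks like a GraphQL call.
# Assumes the HAR 1.2 structure and a "/graphql/" segment in the URL.
def graphql_responses(har_data)
  entries = JSON.parse(har_data).dig('log', 'entries') || []

  entries.filter_map do |entry|
    url = entry.dig('request', 'url').to_s
    next unless url.include?('/graphql/')

    content = entry.dig('response', 'content') || {}
    text = content['text']
    next if text.nil? || text.empty?

    # Some browsers store response bodies base64-encoded in the HAR file
    text = text.unpack1('m').force_encoding('UTF-8') if content['encoding'] == 'base64'

    begin
      JSON.parse(text)
    rescue JSON::ParserError
      nil # skip bodies that are not valid JSON
    end
  end
end

sample_har = {
  'log' => {
    'entries' => [
      { 'request' => { 'url' => 'https://twitter.com/i/api/graphql/abc/HomeTimeline' },
        'response' => { 'content' => { 'mimeType' => 'application/json', 'text' => '{"data":{"ok":true}}' } } },
      { 'request' => { 'url' => 'https://twitter.com/favicon.ico' },
        'response' => { 'content' => { 'mimeType' => 'image/x-icon' } } }
    ]
  }
}.to_json

responses = graphql_responses(sample_har)
puts responses.length # prints 1
```

Filtering on the request URL rather than the MIME type keeps unrelated JSON traffic (analytics, settings) out of the result; the exact URL pattern is an assumption worth checking against your own recordings.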
This is a very early version of this tool. The API *will* change between versions, possibly even between point releases. Don't be surprised if something breaks.
To install the tool, run:
gem install bad_pigeon
The TweetExtractor class is the entry point. Pass the contents of the .har file to the #get_tweets_from_har method to get an array of Tweet objects parsed from the whole archive:
require 'bad_pigeon'
# path_to_har: path to the .har file saved from Safari Web Inspector
data = File.read(path_to_har)
extractor = BadPigeon::TweetExtractor.new
tweets = extractor.get_tweets_from_har(data)
# print the tweets from newest to oldest
tweets.sort_by(&:created_at).reverse.each do |tweet|
  puts "#{tweet.created_at} @#{tweet.user.screen_name}: \"#{tweet.text}\""
end
The Tweet class is meant to be API-compatible with the one from the popular twitter gem, so you should be able to use it as a drop-in replacement if your project used that library (although only a subset of properties works right now; please report issues for any missing ones).
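Because Tweet objects are meant to respond to the same methods as the twitter gem's Tweet class, code written against that interface should work with either library. The sketch below uses hypothetical helpers, `format_tweet` and `newest_first`, that rely only on `created_at`, `user.screen_name` and `text`, the same properties used in the example above; the `FakeTweet`/`FakeUser` structs are stand-ins invented here so the snippet runs without either gem installed.

```ruby
require 'time'

# Hypothetical stand-ins for the Tweet/User interface; in real code these
# would be objects returned by bad_pigeon or the twitter gem.
FakeUser = Struct.new(:screen_name)
FakeTweet = Struct.new(:created_at, :user, :text)

# Formats any object that responds to created_at, user.screen_name and text
# (duck typing), so it does not care which library produced the tweet.
# getutc returns a copy, so the tweet's own Time object is left untouched.
def format_tweet(tweet)
  "#{tweet.created_at.getutc.iso8601} @#{tweet.user.screen_name}: #{tweet.text.inspect}"
end

# Returns the tweets ordered from newest to oldest.
def newest_first(tweets)
  tweets.sort_by(&:created_at).reverse
end

tweets = [
  FakeTweet.new(Time.utc(2023, 3, 1, 12, 0), FakeUser.new('alice'), 'first'),
  FakeTweet.new(Time.utc(2023, 3, 2, 9, 30), FakeUser.new('bob'), 'second')
]

newest_first(tweets).each { |t| puts format_tweet(t) }
```

Keeping such helpers free of any library-specific class checks is what lets you switch between the two gems without touching the processing code.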
The gem also installs a command-line script, pigeon. You can pass it the archive file on standard input and get a JSON array of tweet data on the output:
pigeon < tweets.har > tweets.json
At the moment, this is the only thing it does. In the future there will be options to, e.g., filter tweets to only those from selected sources. The format in which it exports the tweets is also meant to match the hashes returned from the #attrs method of the Tweet class in the twitter gem.
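Until pigeon gains its own filtering options, a similar effect can be approximated in plain Ruby on its JSON output. The sketch below is a hypothetical `merge_exports` helper that combines several exports (for example from periodic browsing sessions), drops duplicate tweets that appeared in more than one timeline, and optionally keeps only tweets by chosen accounts. It assumes the output uses Twitter API v1.1-style keys, `"id"` and `"user" => { "screen_name" }`, as the twitter gem's #attrs hashes do; check this against your actual output before relying on it.

```ruby
require 'json'
require 'set'

# Hypothetical helper: merges several JSON arrays produced by `pigeon`,
# drops duplicate tweets (the same tweet can show up in several timelines)
# and optionally keeps only tweets from the given accounts.
# Assumes v1.1-style keys: "id" and "user" => { "screen_name" }.
def merge_exports(json_strings, only_users: nil)
  wanted = only_users&.map(&:downcase)&.to_set
  seen = Set.new

  json_strings.flat_map { |s| JSON.parse(s) }.select do |tweet|
    name = tweet.dig('user', 'screen_name').to_s.downcase
    next false if wanted && !wanted.include?(name)

    seen.add?(tweet['id']) # add? returns nil if this id was already seen
  end
end

export1 = [
  { 'id' => 1, 'text' => 'hello', 'user' => { 'screen_name' => 'alice' } },
  { 'id' => 2, 'text' => 'hi', 'user' => { 'screen_name' => 'bob' } }
].to_json
export2 = [
  { 'id' => 2, 'text' => 'hi', 'user' => { 'screen_name' => 'bob' } },
  { 'id' => 3, 'text' => 'hey', 'user' => { 'screen_name' => 'Alice' } }
].to_json

archive = merge_exports([export1, export2], only_users: ['alice'])
puts archive.map { |t| t['id'] }.inspect # prints [1, 3]
```

With real files you would pass the file contents instead, e.g. `merge_exports(Dir['exports/*.json'].map { |f| File.read(f) })`, where the `exports/` directory is only an example. Screen names are compared case-insensitively because Twitter handles are not case-sensitive.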
Copyright © 2023 Kuba Suder (@mackuba.eu).
The code is available under the terms of the zlib license (permissive, similar to MIT).
Bug reports and pull requests are welcome 😎 (note: if you're having problems parsing some tweets, please send me links to the specific tweets that make it fail).
Because pigeons are generally bad :<