
mass-downloader-for-bluesky (mdfb) is a Python CLI application that can download large numbers of posts from any given Bluesky account.
You will need Python to be installed to use this CLI.
You can install it with pip:
pip install mdfb
To install manually, you will need Poetry installed.
Then clone the project, open a Poetry shell and install all dependencies.
git clone git@github.com:IbrahimHajiAbdi/mass-downloader-for-bluesky.git
cd mass-downloader-for-bluesky
poetry shell
poetry install
mdfb works by using the public API offered by Bluesky to retrieve posts liked, reposted or posted by the desired account.
mdfb will download the information for a post and the accompanying media (video or images). If there are no images or video, it will just download the information of the post. The information of the post is stored as a JSON file and includes plenty of accompanying data, such as the text of the post, its creation time and author details. Currently, posts are retrieved from the latest to the oldest.
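As a rough illustration of what a request to the public API can look like, the sketch below builds the URL for one page of an account's feed. The host `public.api.bsky.app` and the `app.bsky.feed.getAuthorFeed` endpoint are assumptions chosen for illustration; mdfb may well use different endpoints internally, and `author_feed_url` is a hypothetical helper, not part of mdfb.

```python
from urllib.parse import urlencode

# Assumption: Bluesky's public AppView host. mdfb may use other endpoints.
PUBLIC_API = "https://public.api.bsky.app/xrpc"


def author_feed_url(actor, limit=10, cursor=None):
    """Build the URL for one page of an account's feed (hypothetical helper)."""
    params = {"actor": actor, "limit": limit}
    if cursor is not None:
        # The cursor returned with one page is passed back to request the next,
        # older page.
        params["cursor"] = cursor
    return f"{PUBLIC_API}/app.bsky.feed.getAuthorFeed?{urlencode(params)}"


url = author_feed_url("bsky.app", limit=10)
print(url)
```

Paging with a cursor is one way to walk from the latest post back to the oldest.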
You will need to be inside a Poetry shell to use mdfb if installed manually.
Some example commands would be:
mdfb download --handle bsky.app -l 10 --like --threads 3 --format "{RKEY}_{HANDLE}" ./media/
mdfb download -d did:plc:z72i7hdynmk6r22z27h6tvur --archive --like --threads 3 --format "{DID}_{HANDLE}" ./media/
mdfb download --handle bsky.app --update --like --threads 3 --format "{RKEY}_{HANDLE}" ./media/
mdfb download --restore bsky.app --like --threads 3 --format "{RKEY}_{HANDLE}" ./media/
By default, mdfb's naming convention is "{rkey}_{handle}_{text}". If it is downloading a post with multiple images, the naming will be "{rkey}_{handle}_{text}_{i}", where "i" represents the order of the images in the post, ranging from 1 to 4. In addition, filenames are limited to 256 bytes and will be truncated down to that size.
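The default naming and the 256-byte limit can be sketched as follows. This is only an illustration: `default_name` and `truncate_to_bytes` are hypothetical helpers, and mdfb may truncate differently (for example, shortening only the text part so that the image index survives).

```python
MAX_FILENAME_BYTES = 256  # limit stated above


def truncate_to_bytes(name, limit=MAX_FILENAME_BYTES):
    """Cut a name down to at most `limit` bytes of UTF-8."""
    encoded = name.encode("utf-8")
    if len(encoded) <= limit:
        return name
    # errors="ignore" drops a multi-byte character that was cut in half.
    return encoded[:limit].decode("utf-8", errors="ignore")


def default_name(rkey, handle, text, index=None):
    """Build "{rkey}_{handle}_{text}", plus "_{i}" for multi-image posts."""
    base = f"{rkey}_{handle}_{text}"
    if index is not None:
        base = f"{base}_{index}"
    return truncate_to_bytes(base)


print(default_name("3kabc", "bsky.app", "hello", index=2))
```

The limit is measured in bytes rather than characters, so text with non-ASCII characters is shortened sooner than its character count suggests.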
However, you can specify the name of the files by using the --format flag and passing a valid format string, e.g. "{RKEY}_{DID}". You can put anything in the format string in between the keywords. The keywords are case-sensitive.
For --format, the valid keywords are:
RKEY
DID
HANDLE
TEXT
DISPLAY_NAME
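One way such a format string could be validated and expanded is sketched below. `render_name` and the sample post values are hypothetical, and this is not necessarily how mdfb implements --format.

```python
import string

VALID_KEYWORDS = {"RKEY", "DID", "HANDLE", "TEXT", "DISPLAY_NAME"}


def render_name(fmt, post):
    """Expand a --format style string, rejecting unknown keywords."""
    # string.Formatter().parse yields (literal, field_name, spec, conversion).
    used = {field for _, field, _, _ in string.Formatter().parse(fmt) if field}
    unknown = used - VALID_KEYWORDS
    if unknown:
        # Keywords are case-sensitive, so "{rkey}" ends up here.
        raise ValueError(f"invalid keyword(s): {sorted(unknown)}")
    return fmt.format(**post)


# Illustrative values only.
post = {
    "RKEY": "3kabc",
    "DID": "did:plc:z72i7hdynmk6r22z27h6tvur",
    "HANDLE": "bsky.app",
    "TEXT": "hello",
    "DISPLAY_NAME": "Bluesky",
}
print(render_name("{RKEY}_{DID}", post))
```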
The limit applies separately to each type of post downloaded. For example:
mdfb download --handle bsky.app -l 100 --like --repost --post ./media/
This would download 100 likes, 100 reposts and 100 posts, totalling 300 posts downloaded.
Furthermore, you can archive whole accounts. For example:
mdfb download --handle bsky.app --archive --like --repost --threads 3 --format "{DID}_{HANDLE}" ./media/
This would download all likes and reposts.
When downloading posts, mdfb inserts some post identifiers into the database. This allows you to download only the posts from an account that you haven't downloaded yet.
However, there are some constraints. If you delete a file, this is not reflected in the database, so the --update flag will not redownload it. Furthermore, the post identifiers are only committed to the database once all posts have been downloaded, so if mdfb crashes during downloading, none of the downloaded posts will be recorded in the database.
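The behaviour described above can be sketched as follows. The sketch assumes an SQLite database and a hypothetical `downloaded` table; the source does not say which database engine or schema mdfb uses, and `download_new` is an illustrative name.

```python
import sqlite3


def download_new(conn, user, post_ids, fetch):
    """Fetch only posts not yet recorded, committing once at the very end."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloaded (user TEXT, post_id TEXT)"
    )
    seen = {
        row[0]
        for row in conn.execute(
            "SELECT post_id FROM downloaded WHERE user = ?", (user,)
        )
    }
    new_ids = [p for p in post_ids if p not in seen]
    try:
        for post_id in new_ids:
            fetch(post_id)  # may raise part-way through the run
            conn.execute(
                "INSERT INTO downloaded VALUES (?, ?)", (user, post_id)
            )
        conn.commit()  # identifiers become visible only after every post succeeded
    except Exception:
        conn.rollback()  # a crash leaves the database unchanged
        raise
    return new_ids


conn = sqlite3.connect(":memory:")
first = download_new(conn, "bsky.app", ["a", "b"], lambda p: None)
second = download_new(conn, "bsky.app", ["a", "b", "c"], lambda p: None)
print(first, second)
```

Because the lookup only consults the database, a file deleted from disk still counts as downloaded, which matches the first constraint above.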
The database is stored in: (Linux) ~/.local/share/mdfb/, (Windows) C:\Users\$USER\AppData\Local\mdfb and (macOS) /Users/$USER/Library/Application Support/mdfb.
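For scripts that need to locate the database directory, the three locations above can be expressed as a small function. `data_dir` is a hypothetical helper written for this example, and how mdfb itself resolves the path is not stated here.

```python
import sys
from pathlib import Path, PurePosixPath, PureWindowsPath


def data_dir(platform, home):
    """Return the mdfb data directory for a sys.platform value and home path."""
    if platform == "win32":
        return str(PureWindowsPath(home) / "AppData" / "Local" / "mdfb")
    if platform == "darwin":
        return str(PurePosixPath(home) / "Library" / "Application Support" / "mdfb")
    # Treat everything else as Linux-like.
    return str(PurePosixPath(home) / ".local" / "share" / "mdfb")


print(data_dir(sys.platform, str(Path.home())))
```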
To delete a user from the database:
mdfb db --delete_user bsky.app
The maximum number of threads is currently 3; this can be changed in the mdfb/utils/constants.py file. That file holds other constants that can be changed as well, such as the delay between each request and the number of retries before marking a post as a failure and continuing.
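How a thread cap, a request delay and a retry count typically interact can be sketched as below. The constant names and values (`MAX_THREADS`, `DELAY`, `RETRIES`) and both functions are assumptions for illustration; the actual names in mdfb/utils/constants.py may differ.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical names and values; check mdfb/utils/constants.py for the real ones.
MAX_THREADS = 3
DELAY = 0.1  # seconds to wait after a failed attempt
RETRIES = 3


def fetch_with_retries(fetch, post_id, retries=RETRIES, delay=DELAY):
    """Try a download up to `retries` times; return None if every attempt fails."""
    for _ in range(retries):
        try:
            return fetch(post_id)
        except Exception:
            time.sleep(delay)
    return None  # the post is marked as a failure and the run continues


def download_all(fetch, post_ids, threads=MAX_THREADS):
    """Download posts concurrently, never exceeding MAX_THREADS workers."""
    workers = max(1, min(threads, MAX_THREADS))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: fetch_with_retries(fetch, p), post_ids))
    failed = [p for p, r in zip(post_ids, results) if r is None]
    return results, failed
```

In this sketch, requesting more threads than the cap simply falls back to the cap.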
download
    --handle
    --did, -d
    --limit, -l
    --archive
    --update, -u
    directory
    --threads, -t
    --format, -f
    --like
    --repost
    --post
    --media-types
    --include, -i
    --restore
db
    --delete_user
generic commands
    --resource, -r
At least one of the flags --like, --repost or --post is required (when using download).
The options (--did, -d and --handle) are mutually exclusive, as are (--archive, --limit, -l and --update); one option from each group is required as well (when using download).
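These rules map naturally onto argparse's mutually exclusive groups. The sketch below covers only the constraints stated above; it is not mdfb's actual parser, it omits the other flags (including --restore), and `build_parser` and `parse` are hypothetical names.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="mdfb-sketch")

    # Exactly one way of identifying the account.
    account = parser.add_mutually_exclusive_group(required=True)
    account.add_argument("--handle")
    account.add_argument("--did", "-d")

    # Exactly one download mode.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--limit", "-l", type=int)
    mode.add_argument("--archive", action="store_true")
    mode.add_argument("--update", "-u", action="store_true")

    parser.add_argument("--like", action="store_true")
    parser.add_argument("--repost", action="store_true")
    parser.add_argument("--post", action="store_true")
    parser.add_argument("directory")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # argparse has no built-in "at least one of" rule, so check it by hand.
    if not (args.like or args.repost or args.post):
        parser.error("at least one of --like, --repost, --post is required")
    return args


args = parse(["--handle", "bsky.app", "-l", "10", "--like", "./media/"])
print(args.handle, args.limit, args.directory)
```

Invalid combinations make argparse print a usage message and exit with a non-zero status.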
The argument --media-types needs to be either before or after any positional arguments.
E.g.
mdfb download --handle bsky.app --update --like --threads 3 --media-types image --format "{RKEY}_{HANDLE}" ./media/
Furthermore, if you filter by text together with image or media and then use --include with media, posts matched only by the text filter will not be included. E.g.
mdfb download --handle bsky.app --update --like --threads 3 --media-types image text -i media ./media/
This would download images only.
FAQs
A CLI for downloading posts in bulk from a specified Bluesky account
We found that mdfb demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.