This is a wrapper around publicly and privately available Reddit RSS feeds. It can be used to fetch content from the front page, a subreddit, all comments of a subreddit, all comments of a certain post, the comments of a certain Reddit user, search pages, and more. For details about the types of RSS feed Reddit provides, refer to these links: link1 and link2.
*Note: These feeds are rate limited, so they are suitable only for testing. For serious scraping, register your bot at apps to get client details and use it with PRAW.
Install via PyPI:
pip install reddit-rss-reader
Install from master branch (if you want to try the latest features):
git clone https://github.com/lalitpagaria/reddit-rss-reader
cd reddit-rss-reader
pip install --editable .
RedditRSSReader requires a feed URL; refer to the link to generate one. For example, to fetch all comments on the subreddit r/wallstreetbets:
https://www.reddit.com/r/wallstreetbets/comments/.rss?sort=new
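Feed URLs like the one above can also be assembled programmatically. The following is a minimal sketch, not part of reddit-rss-reader: the helper names are hypothetical, and only the subreddit-comments pattern is taken from this README. The subreddit, user, and search patterns are assumptions based on common Reddit URL conventions and should be checked against the links referenced earlier.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.reddit.com"


def subreddit_comments_feed(subreddit: str, sort: str = "new") -> str:
    """Feed of all comments in a subreddit (pattern shown in this README)."""
    return f"{BASE_URL}/r/{subreddit}/comments/.rss?{urlencode({'sort': sort})}"


def subreddit_feed(subreddit: str) -> str:
    """Feed of posts in a subreddit (assumed pattern)."""
    return f"{BASE_URL}/r/{subreddit}/.rss"


def user_feed(username: str) -> str:
    """Feed of a user's activity (assumed pattern)."""
    return f"{BASE_URL}/user/{username}/.rss"


def search_feed(query: str) -> str:
    """Feed of search results; the query is URL-encoded (assumed pattern)."""
    return f"{BASE_URL}/search.rss?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(subreddit_comments_feed("wallstreetbets"))
    print(search_feed("game stop"))
```

Any of these strings can then be passed as the `url` argument of `RedditRSSReader`.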
Now you can run the following example -
import pprint
from datetime import datetime, timedelta

import pytz

from reddit_rss_reader.reader import RedditRSSReader

reader = RedditRSSReader(
    url="https://www.reddit.com/r/wallstreetbets/comments/.rss?sort=new"
)

# Consider only comments posted in the past 5 days (timezone-aware UTC cutoff)
since_time = datetime.now(pytz.utc) - timedelta(days=5)

# fetch_content fetches all content if no parameters are passed.
# If `after` is passed, it fetches content after this date.
# If `since_id` is passed, it fetches content after this id.
reviews = reader.fetch_content(
    after=since_time
)

pp = pprint.PrettyPrinter(indent=4)
for review in reviews:
    pp.pprint(review.__dict__)
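The `after` cutoff in the example is a timezone-aware UTC datetime. The sketch below uses only the standard library to show how such a cutoff can be computed and applied. It assumes that `after` keeps entries whose `updated` timestamp is later than the cutoff; the library's actual comparison may differ. The `utc_cutoff` and `newer_than` helpers and the dictionary-based entries are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def utc_cutoff(days: int, now: Optional[datetime] = None) -> datetime:
    """Return a timezone-aware UTC datetime `days` days before `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timedelta(days=days)


def newer_than(entries: List[Dict], cutoff: datetime) -> List[Dict]:
    """Keep entries whose aware `updated` timestamp is later than the cutoff."""
    return [entry for entry in entries if entry["updated"] > cutoff]


if __name__ == "__main__":
    # Fixed reference time so the result does not depend on the clock.
    reference = datetime(2021, 2, 10, 12, 0, tzinfo=timezone.utc)
    entries = [
        {"id": "a", "updated": datetime(2021, 2, 9, tzinfo=timezone.utc)},
        {"id": "b", "updated": datetime(2021, 2, 1, tzinfo=timezone.utc)},
    ]
    cutoff = utc_cutoff(5, now=reference)
    print([entry["id"] for entry in newer_than(entries, cutoff)])
```

Keeping both sides of the comparison timezone-aware avoids the `TypeError` Python raises when a naive datetime is compared with an aware one.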
The reader returns RedditContent objects, which hold the following information (extracted_text and image_alt_text are extracted from the Reddit content via BeautifulSoup):
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RedditContent:
    title: str
    link: str
    id: str
    content: str
    extracted_text: Optional[str]
    image_alt_text: Optional[str]
    updated: datetime
    author_uri: str
    author_name: str
    category: str
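To illustrate what fields such as `extracted_text` and `image_alt_text` could contain, here is a rough sketch that derives plain text and image alt text from an HTML snippet. It is not the library's implementation: the library uses BeautifulSoup, whereas this sketch substitutes the standard library's `html.parser` so that it runs without extra dependencies. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _TextAndAltParser(HTMLParser):
    """Collects visible text fragments and the alt text of <img> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.alt_texts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alt_texts.append(alt)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def extract_text_and_alt(html: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, image alt text); each is None when nothing was found."""
    parser = _TextAndAltParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts) or None
    alt = " ".join(parser.alt_texts) or None
    return text, alt


if __name__ == "__main__":
    sample = '<div><p>Hello <b>world</b></p><img src="x.png" alt="a chart"/></div>'
    print(extract_text_and_alt(sample))
```

Returning `None` for missing values mirrors the `Optional[str]` annotations on the two fields.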
The output is UTF-8 encoded. If you are scraping non-English subreddits, set the environment to use UTF-8:
export LANG=en_US.UTF-8
export PYTHONIOENCODING=utf-8