A tool to generate an RSS feed from some WordPress blogs and other sources that for some reason don't generate their own feeds. This tool uses BeautifulSoup to parse the HTML and feedgen to generate the feed. I created this tool to be to follow news from companies that have forgotten the usefulness of RSS.
Installation
Create a virtual environment and simply run python3 -m pip install rssfixer
, full example below.
python3 -m venv venv
source venv/bin/activate
python3 -m pip install rssfixer
Supported page types
I've expanded the tools to blogs that I like to follow. If you have suggestions to add/change functionality please open an issue or start a new discussion.
The basic formats of web pages supported are:
--list
- links are in simple ul-list--json
- links, titles and sometimes description is accessible in a JSON structure--html
- links and titles can be found by some unique HTML element--release
- similar to --html
except there are no links and you have to specify a target URL
During testing it is useful to use --stdout
option to see the generated feed. When I have time (and enough motivation) I might write a tool to try and find the right combination of options for a specified URL.
Simple list
An example to generate a feed for nccgroup that have the links in a simple ul-list by using the --list
option:
$ rssfixer --title nccgroup --list https://research.nccgroup.com/
RSS feed created: rss_feed.xml
You can specify a filename and silence output:
rssfixer --title nccgroup --output nccgroup.xml --quiet https://research.nccgroup.com/
The resulting file is available here as an example.
Most times you would run the script from crontab to have an updated feed. Here is an example with a venv in /home/user/src/rssfixer.
32 * * * * /home/user/src/rssfixer/bin/rssfixer --title nccgroup --output /var/www/html/feeds/nccgroup.xml --quiet --list https://research.nccgroup.com
JSON
Some blogs like truesec.com have all blog links in a JSON object. You can use the --json
option to parse the JSON object and generate a feed. The same is true for Apple's security blog page.
An example for Apple:
rssfixer --title "Apple Security" --output apple.xml --quiet --json --json-entries blogs --json-url slug --base-url https://security.apple.com/blog/ https://security.apple.com/blog
In this example --json-entries blogs
specifies that blog entries are located in a key called blogs and that URLs are available in a key called slug. Since the URL only includes the key (or slug) we specify the full URL to the blog with --base-url https://security.apple.com/blog/
.
An example for truesec.com:
rssfixer --title Truesec --json --json-description preamble --quiet --output truesec.xml https://www.truesec.com/hub/blog
Here we must specify --json-description preamble
to find the description or summary of the blog post.
General HTML
Pages with a more general HTML structure can be parsed with the --html
option. You can specify the HTML tag for the entries, the URL and title of the blog entry.
An example for tripwire.com:
rssfixer --title Tripwire --output tripwire.xml --quiet --html --base-url https://www.tripwire.com http://www.tripwire.com/state-of-security
Release
Check for one entity on release pages like SQLite (h3) and generate RSS feed with links to the download page (required argument --release-url
). Easy way to get notified when a new version is released.
rssfixer --release --output sqlite.xml --release-entries h3 --release-url https://sqlite.org/download.html https://sqlite.org/changes.html
Usage
Command-line options (updated on commit by markdown-code-runner):
usage: rssfixer [-h] (--html | --json | --list | --release) [--version]
[--atom] [--base-url BASE_URL] [--release-url RELEASE_URL]
[--release-entries RELEASE_ENTRIES]
[--html-entries HTML_ENTRIES]
[--html-entries-class HTML_ENTRIES_CLASS]
[--html-url HTML_URL] [--html-title HTML_TITLE]
[--html-title-class HTML_TITLE_CLASS]
[--title-filter TITLE_FILTER]
[--html-description HTML_DESCRIPTION]
[--html-description-class HTML_DESCRIPTION_CLASS]
[--json-entries JSON_ENTRIES] [--json-url JSON_URL]
[--json-title JSON_TITLE]
[--json-description JSON_DESCRIPTION] [--output OUTPUT]
[--title TITLE] [--user-agent USER_AGENT]
[--filter-type FILTER_TYPE] [--filter-name FILTER_NAME] [-q]
[-d] [--stdout]
url
Generate RSS feed for blog that don't publish a feed. Default is to find links
in a simple <ul>-list. Options are available to find links in other HTML
elements or JSON strings.
positional arguments:
url URL for the blog
options:
-h, --help show this help message and exit
--html Find entries in HTML
--json Find entries in JSON
--list Find entries in HTML <ul>-list (default)
--release Find releases in HTML
--version show program's version number and exit
--atom Generate Atom feed
--base-url BASE_URL Base URL for the blog
--release-url RELEASE_URL
Release URL for downloads
--release-entries RELEASE_ENTRIES
Release selector for entries
--html-entries HTML_ENTRIES
HTML selector for entries
--html-entries-class HTML_ENTRIES_CLASS
Class name for entries
--html-url HTML_URL HTML selector for URL
--html-title HTML_TITLE
HTML selector for title
--html-title-class HTML_TITLE_CLASS
Flag to specify title class (regex)
--title-filter TITLE_FILTER
Filter for title, ignore entries that don't match
--html-description HTML_DESCRIPTION
HTML selector for description
--html-description-class HTML_DESCRIPTION_CLASS
Flag to specify description class (regex)
--json-entries JSON_ENTRIES
JSON key for entries (default: 'entries')
--json-url JSON_URL JSON key for URL (default: 'url')
--json-title JSON_TITLE
JSON key for title
--json-description JSON_DESCRIPTION
JSON key for description
--output OUTPUT Name of the output file
--title TITLE Title of the RSS feed (default: "My RSS Feed")
--user-agent USER_AGENT
User agent to use for HTTP requests
--filter-type FILTER_TYPE
Filter web page
--filter-name FILTER_NAME
Filter web page
-q, --quiet Suppress output
-d, --debug Debug selection
--stdout Print to stdout
Command-line examples for blogs
rssfixer --title "Apple Security" --output apple.xml --quiet --json --json-entries blogs --json-url slug --base-url https://security.apple.com/blog/ https://security.apple.com/blog
rssfixer --title nccgroup --output nccgroup.xml --quiet --list https://research.nccgroup.com
rssfixer --title Tripwire --output tripwire.xml --quiet --html --base-url https://www.tripwire.com http://www.tripwire.com/state-of-security
rssfixer --title Truesec --output truesec.xml --quiet --json --json-description preamble https://www.truesec.com/hub/blog
rssfixer --title SQLite --release --release-entries h3 --release-url https://sqlite.org/download.html https://sqlite.org/changes.html
rssfixer --title "Nucleus CISA KEV" --output nucleus.xml --html --filter-type div --filter-name recent-post-widget --html-entries div --html-title div --html-title-class "post-desc" --title-filter KEV https://nucleussec.com/category/cisa-kev
rssfixer --html --filter-type div --filter-name 'page-container' --html-entries div --html-entries-class "news-text" --html-title h2 --html-title-class "" --html-description p --html-description-class "" --html-url a --base-url https://www.ncsc.se --stdout --atom --title "Feed for NCSC-SE" https://www.ncsc.se/publikationer/
If you have other example use case please add them in show usage examples in discussions.