post-feed-reader
A library to fetch news, blog or podcast posts from any site.
It works by auto-discovering a post source, which can be an RSS, Atom or JSON feed or the WordPress REST API, and then fetching and parsing the list of posts.
It was originally written for Node.js, but because it is built on isomorphic JavaScript, it also works in browsers, provided the target website allows cross-origin requests.
It was built for apps that need to list posts in their own UI but don't manage the blog themselves, and that need automatic fallbacks when the blog's underlying technology changes.
Features
Getting Started
Install it with NPM or Yarn:
npm install post-feed-reader
yarn add post-feed-reader
You first need to discover the post source, which will be a URL to the RSS/Atom/JSON Feed or the Wordpress REST API.
Then you can pass the discovered source to getPostList, which will fetch and parse it.
import { discoverPostSource, getPostList } from 'post-feed-reader';
const source = await discoverPostSource('https://www.nytimes.com');
const posts = await getPostList(source);
console.log(posts);
Simple enough, eh?
Output
See an example of the result based on the Mozilla blog.
Options
const source = await discoverPostSource('https://techcrunch.com', {
axios: axios.create(...),
preferFeeds: false,
canUseSource: (source: DiscoveredSource) => true,
tryToGuessPaths: false,
wpApiPaths: ['./wp-json', '?rest_route=/'],
feedPaths: ['./feed', './atom', './rss', './feed.json', './feed.xml', '?feed=atom'],
});
const posts = await getPostList(source, {
axios: axios.create(...),
fillTextContents: false,
wordpress: {
includeEmbedded: true,
limit: 10,
page: 1,
search: '',
authors: [...],
categories: [...],
tags: [...],
additionalParams: { ... },
},
});
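The values in wpApiPaths and feedPaths look like relative URL references. How the library resolves them is not documented here, so the following is only a sketch under the assumption that they are resolved against the site URL with standard URL semantics. The helper name resolveCandidates and the example.com URLs are hypothetical.

```typescript
// Hypothetical helper, not part of post-feed-reader: resolve relative
// candidate paths against a site URL using the standard WHATWG URL API.
function resolveCandidates(siteUrl: string, paths: string[]): string[] {
  return paths.map((path) => new URL(path, siteUrl).href);
}

const wpApiPaths = ['./wp-json', '?rest_route=/'];

// With a trailing slash, './wp-json' stays under /blog/.
console.log(resolveCandidates('https://example.com/blog/', wpApiPaths));

// Without it, './wp-json' replaces the last path segment ('blog').
console.log(resolveCandidates('https://example.com/blog', wpApiPaths));
```

If the library does resolve paths this way, a trailing slash on the site URL matters for blogs hosted in a subdirectory.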
Skip the auto-discovery
If you already have the URL of an Atom/RSS/JSON Feed or of the WordPress REST API at hand, you can fetch the posts directly:
const feedPosts = await getFeedPostList('https://news.google.com/atom');
const wpApiPosts = await getWordpressPostList('https://blog.mozilla.org/en/wp-json/');
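When you skip the auto-discovery, any fallback between sources is up to you. One possible way to get it, sketched below, is a small helper that tries several sources in order. The name firstSuccessful is hypothetical and not part of the library, and the commented usage assumes both functions are exported by 'post-feed-reader' and uses placeholder URLs.

```typescript
// Hypothetical helper, not part of post-feed-reader: run async attempts in
// order and return the first result that resolves.
async function firstSuccessful<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown = undefined;
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error; // remember the failure and try the next source
    }
  }
  throw new Error(`All ${attempts.length} attempts failed: ${String(lastError)}`);
}

// Possible usage (assumption: both functions come from 'post-feed-reader'):
// const posts = await firstSuccessful([
//   () => getWordpressPostList('https://example.com/wp-json/'),
//   () => getFeedPostList('https://example.com/feed'),
// ]);
```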
RSS is the most widely used feed format on the web, but it lacks data, and its specification leaves many properties open to interpretation, so the way information is formatted differs from feed to feed. For instance, the description can be the full post as HTML, just an excerpt, plain text, or even just an HTML link to the post page.
Atom's specification is far more rigid and robust, which makes its data more trustworthy. Among feed formats, it is definitely the way to go. But it still lacks some properties that can only be fetched through the WordPress REST API.
The Wordpress REST API also has the following benefits:
- Filtering by category, tag or author
- Searching
- Pagination
- Featured media
- Author profile
The JSON Feed format is just as good as the Atom format, but very few websites produce it.
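The WordPress REST API benefits listed above correspond to standard query parameters of its posts endpoint (per_page, page, search, author, categories, tags and _embed). The sketch below builds such a URL by hand to show what a filtered request looks like. The function buildPostsUrl and the PostQuery type are hypothetical. The field names mirror the wordpress options of getPostList, but how the library maps them internally is an assumption, as is the use of numeric IDs.

```typescript
// Hypothetical helper, not part of post-feed-reader: build a posts query
// for the WordPress REST API from a wp-json root URL.
interface PostQuery {
  limit?: number;
  page?: number;
  search?: string;
  authors?: number[];    // assumption: numeric WordPress user IDs
  categories?: number[]; // assumption: numeric category IDs
  tags?: number[];       // assumption: numeric tag IDs
  includeEmbedded?: boolean;
}

function buildPostsUrl(apiRoot: string, query: PostQuery = {}): string {
  // apiRoot is expected to end with a slash, e.g. '.../wp-json/'.
  const url = new URL('wp/v2/posts', apiRoot);
  if (query.limit !== undefined) url.searchParams.set('per_page', String(query.limit));
  if (query.page !== undefined) url.searchParams.set('page', String(query.page));
  if (query.search) url.searchParams.set('search', query.search);
  if (query.authors?.length) url.searchParams.set('author', query.authors.join(','));
  if (query.categories?.length) url.searchParams.set('categories', query.categories.join(','));
  if (query.tags?.length) url.searchParams.set('tags', query.tags.join(','));
  // _embed asks WordPress to inline linked resources, such as the author
  // and the featured media, in the response.
  if (query.includeEmbedded) url.searchParams.set('_embed', '1');
  return url.href;
}

console.log(
  buildPostsUrl('https://blog.mozilla.org/en/wp-json/', { limit: 10, page: 2, search: 'firefox' }),
);
```

None of these filters have an equivalent in RSS, Atom or JSON Feed, which is why the REST API is worth supporting alongside them.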
How does the auto-discovery work?
- Fetches the site's main page
- Looks for Wordpress Link headers
- Looks for RSS, Atom and JSON Feed <link> metatags
- If tryToGuessPaths is set to true, it will probe the candidate paths to try to find a feed or the WP API.
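The header and <link> steps above can be sketched as pure functions over an already-fetched page. This is an illustration, not the library's actual code. All names here (Candidate, findWordpressApi, findFeedLinks, discover) are hypothetical, the regex-based parsing is deliberately naive, and the meaning given to preferFeeds (pick a feed over the WP API when both exist) is an assumption. The rel value https://api.w.org/ is the one WordPress uses to advertise its REST API root in the Link header.

```typescript
// Hypothetical sketch of the discovery steps; not the library's code.
interface Candidate {
  type: 'wordpress' | 'feed';
  url: string;
}

// MIME types commonly used by feed <link rel="alternate"> tags.
const FEED_TYPES = ['application/rss+xml', 'application/atom+xml', 'application/feed+json'];

// WordPress advertises its REST API root in a Link header with this rel.
const WP_API_REL = 'https://api.w.org/';

function findWordpressApi(linkHeader: string | null, baseUrl: string): Candidate | null {
  if (!linkHeader) return null;
  // Naive split: assumes the URLs themselves contain no commas.
  for (const part of linkHeader.split(',')) {
    const match = part.match(/<([^>]+)>\s*;\s*rel="([^"]+)"/);
    if (match && match[2] === WP_API_REL) {
      return { type: 'wordpress', url: new URL(match[1], baseUrl).href };
    }
  }
  return null;
}

function findFeedLinks(html: string, baseUrl: string): Candidate[] {
  const candidates: Candidate[] = [];
  // Naive tag scan; a real implementation should use an HTML parser.
  for (const tag of html.match(/<link\b[^>]*>/gi) ?? []) {
    const attr = (name: string): string | undefined =>
      tag.match(new RegExp(`\\b${name}\\s*=\\s*["']([^"']*)["']`, 'i'))?.[1];
    const type = attr('type')?.toLowerCase();
    const href = attr('href');
    if (attr('rel')?.toLowerCase() === 'alternate' && type && href && FEED_TYPES.includes(type)) {
      candidates.push({ type: 'feed', url: new URL(href, baseUrl).href });
    }
  }
  return candidates;
}

function discover(
  html: string,
  linkHeader: string | null,
  baseUrl: string,
  preferFeeds = false,
): Candidate | null {
  const wp = findWordpressApi(linkHeader, baseUrl);
  const feed = findFeedLinks(html, baseUrl)[0] ?? null;
  // Assumption: preferFeeds only changes which source wins when both exist.
  return preferFeeds ? (feed ?? wp) : (wp ?? feed);
}
```

Fetching the page and guessing paths are left out so that the sketch runs without network access. In practice, html and linkHeader would come from the response for the site's main page.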