To read & normalize RSS/ATOM/JSON feed data.
(This library was previously published as feed-reader and has since been renamed.)
Demo
Install & Usage
Node.js
npm i @extractus/feed-extractor
// ES module
import { extract } from '@extractus/feed-extractor'
// CommonJS
const { extract } = require('@extractus/feed-extractor')
// or point directly at the CommonJS build
const { extract } = require('@extractus/feed-extractor/dist/cjs/feed-extractor.js')
const result = await extract('https://news.google.com/rss')
console.log(result)
Deno
import { extract } from 'https://esm.sh/@extractus/feed-extractor'
import { extract } from 'npm:@extractus/feed-extractor'
Browser
import { extract } from 'https://unpkg.com/@extractus/feed-extractor@latest/dist/feed-extractor.esm.js'
Please check the examples for reference.
APIs
Note:
- The old method read() has been marked as deprecated and will be removed in the next major release.
extract()
Load and extract feed data from a given RSS/ATOM/JSON source. Returns a Promise.
Syntax
extract(String url)
extract(String url, Object parserOptions)
extract(String url, Object parserOptions, Object fetchOptions)
Example:
import { extract } from '@extractus/feed-extractor'
const result = await extract('https://news.google.com/atom')
console.log(result)
Without any options, the result should have the following structure:
{
  title: String,
  link: String,
  description: String,
  generator: String,
  language: String,
  published: ISO Date String,
  entries: Array[
    {
      id: String,
      title: String,
      link: String,
      description: String,
      published: ISO Datetime String
    },
  ]
}
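As a rough illustration of working with this structure, the sketch below picks the most recently published entries from a result object. It assumes `published` holds ISO strings (the default); `latestTitles` and the sample data are hypothetical and not part of the library.

```javascript
// Hypothetical helper: not part of @extractus/feed-extractor.
// Returns the titles of the `limit` most recently published entries.
function latestTitles (feed, limit = 3) {
  const entries = Array.isArray(feed.entries) ? feed.entries : []
  return [...entries]
    // ISO 8601 strings can be compared after Date.parse
    .sort((a, b) => Date.parse(b.published) - Date.parse(a.published))
    .slice(0, limit)
    .map((entry) => entry.title)
}

// Made-up sample shaped like the structure above
const sampleFeed = {
  title: 'Example feed',
  link: 'https://example.com/',
  entries: [
    { id: '1', title: 'Older post', link: 'https://example.com/1', published: '2023-01-01T00:00:00.000Z' },
    { id: '2', title: 'Newer post', link: 'https://example.com/2', published: '2023-06-01T00:00:00.000Z' }
  ]
}

console.log(latestTitles(sampleFeed, 1))
```

The helper copies the array before sorting so the original result object is left untouched.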
Parameters
url
required
URL of a valid feed source
Feed content must be accessible and conform to one of the following standards: RSS, ATOM, or JSON Feed.
parserOptions
optional
Object with all or several of the following properties:
- normalization: Boolean, normalize feed data or keep original. Default true.
- useISODateFormat: Boolean, convert datetime to ISO format. Default true.
- descriptionMaxLen: Number, to truncate description. Default 210 (characters).
- xmlParserOptions: Object, used by the XML parser; see fast-xml-parser's docs.
- getExtraFeedFields: Function, to get more fields from feed data.
- getExtraEntryFields: Function, to get more fields from feed entry data.
- baseUrl: URL string, to absolutify the links within feed content.
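To give a feel for what two of these options mean in practice, here is a minimal sketch of "absolutifying" a link against a base URL and of converting a feed date to ISO format, using only standard JavaScript APIs. The helper names `absolutify` and `toISODate` are hypothetical, and the library's internal implementation may well differ.

```javascript
// Illustrative only: not the library's actual implementation.

// Resolve a possibly relative link against a base URL (the idea behind `baseUrl`)
function absolutify (link, baseUrl) {
  try {
    return new URL(link, baseUrl).href
  } catch {
    return link // leave unparseable values untouched
  }
}

// Convert a feed date such as an RFC 822 string to ISO format
// (the idea behind `useISODateFormat`); returns '' for invalid input,
// which is a choice made for this sketch only
function toISODate (value) {
  const date = new Date(value)
  return Number.isNaN(date.getTime()) ? '' : date.toISOString()
}

console.log(absolutify('/posts/1', 'https://example.com/blog/'))
console.log(toISODate('Mon, 06 Sep 2021 10:00:00 GMT'))
```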
For example:
import { extract } from '@extractus/feed-extractor'
// Keep the original date strings instead of converting them to ISO format
await extract('https://news.google.com/atom', {
  useISODateFormat: false
})
// Collect extra fields from the feed and from each entry
await extract('https://news.google.com/rss', {
  useISODateFormat: false,
  getExtraFeedFields: (feedData) => {
    return {
      subtitle: feedData.subtitle || ''
    }
  },
  getExtraEntryFields: (feedEntry) => {
    const {
      enclosure,
      category
    } = feedEntry
    return {
      enclosure: {
        url: enclosure['@_url'],
        type: enclosure['@_type'],
        length: enclosure['@_length']
      },
      // category may be a plain string or an object with attributes
      category: typeof category === 'string' ? category : {
        text: category['@_text'],
        domain: category['@_domain']
      }
    }
  }
})
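Real feeds often omit `enclosure` or `category` on some entries, so a callback that reads them unconditionally can throw. The following is a minimal sketch of a more defensive `getExtraEntryFields` callback that can be exercised without any network access. The sample entry is made up, and the `@_` attribute keys simply follow the example above.

```javascript
// Hypothetical callback; the '@_' keys mirror the example above.
function getExtraEntryFields (feedEntry) {
  const { enclosure, category } = feedEntry
  const extra = {}
  // Only read attributes when the entry actually has an enclosure
  if (enclosure && typeof enclosure === 'object') {
    extra.enclosure = {
      url: enclosure['@_url'],
      type: enclosure['@_type'],
      length: enclosure['@_length']
    }
  }
  // category may be a plain string, an object with attributes, or absent
  if (typeof category === 'string') {
    extra.category = category
  } else if (category && typeof category === 'object') {
    extra.category = { text: category['@_text'], domain: category['@_domain'] }
  }
  return extra
}

// Made-up entry, roughly as an XML parser might expose it
const sampleEntry = {
  title: 'Episode 1',
  enclosure: { '@_url': 'https://example.com/ep1.mp3', '@_type': 'audio/mpeg', '@_length': '12345' },
  category: 'podcast'
}

console.log(getExtraEntryFields(sampleEntry))
```

A function like this could then be passed as the `getExtraEntryFields` property of `parserOptions`.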
fetchOptions
optional
Use this parameter to set the request headers sent when fetching the feed.
For example:
import { extract } from '@extractus/feed-extractor'
const url = 'https://news.google.com/rss'
await extract(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  }
})
You can also specify a proxy endpoint to load remote content, instead of fetching directly.
For example:
import { extract } from '@extractus/feed-extractor'
const url = 'https://news.google.com/rss'
await extract(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  },
  proxy: {
    target: 'https://your-secret-proxy.io/loadXml?url=',
    headers: {
      'Proxy-Authorization': 'Bearer YWxhZGRpbjpvcGVuc2VzYW1l...'
    }
  }
})
Routing requests through a proxy is useful when running @extractus/feed-extractor in the browser.
See examples/browser-feed-reader for a reference example.
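To make the proxy idea more concrete: since the `target` value in the example ends in `?url=`, a plausible reading is that the feed URL is appended to it in URL-encoded form. The sketch below shows that composition only. `buildProxyUrl` is a hypothetical helper, and the exact behaviour is an assumption that should be checked against the library source.

```javascript
// Hypothetical helper: assumes the proxy expects the encoded feed URL
// appended directly to `target`.
function buildProxyUrl (target, feedUrl) {
  return target + encodeURIComponent(feedUrl)
}

console.log(buildProxyUrl(
  'https://your-secret-proxy.io/loadXml?url=',
  'https://news.google.com/rss'
))
```

Encoding the feed URL keeps its own `?` and `&` characters from being mistaken for parameters of the proxy endpoint.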
extractFromJson()
Extract feed data from a JSON string.
Returns an object containing the feed data.
Syntax
extractFromJson(String json)
extractFromJson(String json, Object parserOptions)
Example:
import { extractFromJson } from '@extractus/feed-extractor'
const url = 'https://www.jsonfeed.org/feed.json'
const res = await fetch(url)
const json = await res.json()
const feed = extractFromJson(json)
console.log(feed)
Parameters
json
required
JSON string loaded from a JSON feed resource.
parserOptions
optional
See parserOptions above.
extractFromXml()
Extract feed data from an XML string.
Returns an object containing the feed data.
Syntax
extractFromXml(String xml)
extractFromXml(String xml, Object parserOptions)
Example:
import { extractFromXml } from '@extractus/feed-extractor'
const url = 'https://news.google.com/atom'
const res = await fetch(url)
const xml = await res.text()
const feed = extractFromXml(xml)
console.log(feed)
Parameters
xml
required
XML string loaded from an RSS/ATOM feed resource.
parserOptions
optional
See parserOptions above.
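When you fetch feed content yourself, you need to decide whether to call extractFromJson() or extractFromXml(). The sketch below is one possible way to guess the kind of content from the response's Content-Type header, falling back to the first character of the body. `detectFeedKind` is a hypothetical helper, not something the library exports.

```javascript
// Hypothetical helper: guess whether a response body is a JSON feed or XML
function detectFeedKind (contentType = '', body = '') {
  const type = contentType.toLowerCase()
  // e.g. application/feed+json, application/json
  if (type.includes('json')) return 'json'
  // e.g. application/rss+xml, application/atom+xml, text/xml
  if (type.includes('xml')) return 'xml'
  // Fall back to sniffing the first non-whitespace character
  const first = body.trimStart().charAt(0)
  if (first === '{') return 'json'
  if (first === '<') return 'xml'
  return 'unknown'
}

console.log(detectFeedKind('application/rss+xml', ''))
console.log(detectFeedKind('', '{"version": "https://jsonfeed.org/version/1.1"}'))
```

A caller could branch on the returned value and hand the content to the matching function.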
Test
git clone https://github.com/extractus/feed-extractor.git
cd feed-extractor
npm i
npm test
Quick evaluation
git clone https://github.com/extractus/feed-extractor.git
cd feed-extractor
npm install
npm run eval https://news.google.com/rss
License
The MIT License (MIT)