
@extractus/feed-extractor

To read and normalize RSS/ATOM/JSON feed data


feed-reader

To read & normalize RSS/ATOM/JSON feed data.

Intro

feed-reader is part of a tool set for content builders:

  • feed-reader: extract & normalize RSS/ATOM/JSON feed
  • article-parser: extract main article from given URL
  • oembed-parser: extract oEmbed data from supported providers

You can use one or a combination of these tools to build news sites, create automated content systems for marketing campaigns, or gather datasets for NLP projects.

                                    ┌────────────────┐
                            ┌───────► article-parser ├──────────┐
                            │       └────────────────┘          │
┌─────────────┐   ┌─────────┴────┐                     ┌────────▼─────────┐   ┌─────────────┐
│ feed-reader ├───► feed entries │                     │ content database ├───► public APIs │
└─────────────┘   └─────────┬────┘                     └────────▲─────────┘   └─────────────┘
                            │       ┌────────────────┐          │
                            └───────► oembed-parser  ├──────────┘
                                    └────────────────┘

Install & Usage

Node.js

npm i feed-reader

# pnpm
pnpm i feed-reader

# yarn
yarn add feed-reader

// es6 module
import { read } from 'feed-reader'

// CommonJS
const { read } = require('feed-reader')

// or specify the exact path to the CommonJS variant
const { read } = require('feed-reader/dist/cjs/feed-reader.js')

Deno

import { read } from 'https://esm.sh/feed-reader'

Browser

import { read } from 'https://unpkg.com/feed-reader@latest/dist/feed-reader.esm.js'

Please check the examples for reference.

Deta cloud

For Deta devs, please refer to the source code and guidelines linked from the project's README, or use its one-click Deploy button.

APIs

read()

Loads and extracts feed data from a given RSS/ATOM/JSON source. Returns a Promise.
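
Because read() returns a Promise, a failed request or an unparsable feed may surface as a rejection. The following is a minimal sketch of one way to handle that. It assumes a hypothetical helper named safeRead, which is not part of feed-reader and receives the reader function as an argument, so the same pattern works with any Promise-returning function:

```javascript
// Hypothetical helper (not part of feed-reader): wraps any Promise-returning
// reader function and converts a rejection into a plain result object.
const safeRead = async (readFn, url) => {
  try {
    const feed = await readFn(url)
    return { ok: true, feed }
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err)
    return { ok: false, error: message }
  }
}

// Usage with feed-reader (requires the package and network access):
// import { read } from 'feed-reader'
// const result = await safeRead(read, 'https://news.google.com/atom')
// if (!result.ok) console.error(result.error)
```

Returning a result object instead of rethrowing keeps the calling code free of try/catch blocks, which can be convenient when polling many feeds in a loop.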

Syntax

read(String url)
read(String url, Object options)
read(String url, Object options, Object fetchOptions)

Parameters

url required

URL of a valid feed source

Feed content must be accessible and conform to one of the following standards:

  • RSS Feed
  • ATOM Feed
  • JSON Feed

For example:

import { read } from 'feed-reader'

read('https://news.google.com/atom').then(result => console.log(result))

Without any options, the result should have the following structure:

{
  title: String,
  link: String,
  description: String,
  generator: String,
  language: String,
  published: ISO Date String,
  entries: Array[
    {
      title: String,
      link: String,
      description: String,
      published: ISO Datetime String
    },
    // ...
  ]
}
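
As an illustration of working with this structure, here is a small sketch that turns a normalized result into printable lines. The helper name listEntries and the sample data are hypothetical; only the field names (title, link, published, entries) come from the structure above:

```javascript
// Hypothetical helper: formats each entry of a normalized feed result
// as "published | title | link". Returns an empty array if there are no entries.
const listEntries = (feed) => {
  const entries = Array.isArray(feed.entries) ? feed.entries : []
  return entries.map((entry) => `${entry.published} | ${entry.title} | ${entry.link}`)
}

// Sample data shaped like a normalized result (values are made up).
const sample = {
  title: 'Example feed',
  link: 'https://example.com/',
  entries: [
    {
      title: 'First post',
      link: 'https://example.com/1',
      description: '',
      published: '2022-11-29T00:00:00.000Z'
    }
  ]
}

console.log(listEntries(sample).join('\n'))
```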

options optional

Object with all or several of the following properties:

  • normalization: Boolean, normalize feed data or keep original. Default true.
  • useISODateFormat: Boolean, convert datetime to ISO format. Default true.
  • descriptionMaxLen: Number, to truncate description. Default 210 (characters).
  • xmlParserOptions: Object, used by xml parser, view fast-xml-parser's docs
  • getExtraFeedFields: Function, to get more fields from feed data
  • getExtraEntryFields: Function, to get more fields from feed entry data

For example:

import { read } from 'feed-reader'

read('https://news.google.com/atom', {
  useISODateFormat: false
})

read('https://news.google.com/rss', {
  useISODateFormat: false,
  getExtraFeedFields: (feedData) => {
    return {
      subtitle: feedData.subtitle || ''
    }
  },
  getExtraEntryFields: (feedEntry) => {
    const {
      enclosure,
      category
    } = feedEntry
    return {
      enclosure: {
        url: enclosure['@_url'],
        type: enclosure['@_type'],
        length: enclosure['@_length']
      },
      category: typeof category === 'string' ? category : {
        text: category['@_text'],
        domain: category['@_domain']
      }
    }
  }
})
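
The example above reads enclosure['@_url'] directly, which would throw for entries that have no enclosure element. A more defensive sketch of the same callback follows. It assumes the same '@_' attribute prefix as the example and is only one possible way to guard against missing fields:

```javascript
// Hypothetical defensive variant of getExtraEntryFields.
// It only adds a field when the corresponding element is present.
const getExtraEntryFields = (feedEntry) => {
  const { enclosure, category } = feedEntry
  const extra = {}

  if (enclosure && typeof enclosure === 'object') {
    extra.enclosure = {
      url: enclosure['@_url'],
      type: enclosure['@_type'],
      length: enclosure['@_length']
    }
  }

  if (typeof category === 'string') {
    // Plain-text category without attributes
    extra.category = category
  } else if (category && typeof category === 'object') {
    extra.category = {
      text: category['@_text'],
      domain: category['@_domain']
    }
  }

  return extra
}
```

The function can then be passed as the getExtraEntryFields option in place of the inline callback.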

fetchOptions optional

Use this parameter to set the request headers passed to fetch.

For example:

import { read } from 'feed-reader'

const url = 'https://news.google.com/rss'
read(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  }
})

You can also specify a proxy endpoint to load remote content, instead of fetching directly.

For example:

import { read } from 'feed-reader'

const url = 'https://news.google.com/rss'

read(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  },
  proxy: {
    target: 'https://your-secret-proxy.io/loadXml?url=',
    headers: {
      'Proxy-Authorization': 'Bearer YWxhZGRpbjpvcGVuc2VzYW1l...'
    }
  }
})

Passing requests through a proxy is useful when running feed-reader in the browser. See examples/browser-feed-reader for a reference example.
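
To make the shape of the proxy target more concrete, here is a small sketch of how a target ending in '?url=' could be combined with a feed URL. It assumes the feed URL is URL-encoded and appended to the target. That is an assumption about the behavior, not something confirmed by the text above, and buildProxyUrl is a hypothetical name:

```javascript
// Hypothetical helper: appends the URL-encoded feed URL to the proxy target.
// Assumption: the proxy expects the original URL as an encoded query value.
const buildProxyUrl = (target, feedUrl) => target + encodeURIComponent(feedUrl)

const proxied = buildProxyUrl(
  'https://your-secret-proxy.io/loadXml?url=',
  'https://news.google.com/rss'
)

console.log(proxied)
```

Under this assumption, a proxy endpoint would decode the url query parameter, fetch the feed on the server side, and return the XML or JSON body unchanged.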

Quick evaluation

git clone https://github.com/ndaidong/feed-reader.git
cd feed-reader
npm install

node eval.js --url=https://news.google.com/rss --normalization=y --useISODateFormat=y --includeEntryContent=n --includeOptionalElements=n

License

The MIT License (MIT)



Last updated on 29 Nov 2022
