discovery-web-crawler

License: ISC

Crawls a website and populates a Watson Discovery Collection.

Install

npm install discovery-web-crawler

Usage

The following snippet crawls the Watson stories section of the IBM website and indexes the pages it finds in a Watson Discovery collection.

const DiscoveryWebCrawler = require('discovery-web-crawler')

const crawler = new DiscoveryWebCrawler({
    serviceUrl: 'YOUR_SERVICE_URL',
    apikey: 'YOUR_APIKEY',
    environmentId: 'YOUR_ENVIRONMENT_ID',
    collectionId: 'YOUR_COLLECTION_ID',

    url: 'https://www.ibm.com/watson/stories/',                                 // Starting point URL
    maxDepth: 3,                                                                // Max crawler depth
    fetchCondition: queueItem => queueItem.path.startsWith('/watson/'),         // Condition to crawl this URL
    urlCondition: url => !url.match('/list'),                                   // Condition to index this URL
    parse: async $ => ({ text: $('main').text().replace(/\s+/g, ' ').trim() }), // Cheerio API to extract JSON from HTML content
})
crawler.start()
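
To see what the two conditions and the text clean-up in the snippet above do before pointing the crawler at a live site, they can be tried on a few sample URLs. This is a minimal sketch, not part of the package: `toQueueItem`, `normalizeText`, and the sample URLs are hypothetical, and the queue item is assumed to carry only the `path` field that the snippet reads. It does not show the order in which the crawler applies the conditions.

```javascript
// Predicates copied from the usage snippet above.
const fetchCondition = queueItem => queueItem.path.startsWith('/watson/')
const urlCondition = url => !url.match('/list')

// The whitespace clean-up used inside `parse`, pulled out so it runs without Cheerio.
const normalizeText = text => text.replace(/\s+/g, ' ').trim()

// ASSUMPTION: hypothetical helper that builds a minimal queue item with only
// the `path` field; the crawler's real queue items may carry more fields.
const toQueueItem = url => ({ url, path: new URL(url).pathname })

// Hypothetical sample URLs, for illustration only.
const samples = [
  'https://www.ibm.com/watson/stories/',
  'https://www.ibm.com/watson/stories/list',
  'https://www.ibm.com/cloud/',
]

// Evaluate each predicate independently for every sample URL.
const report = samples.map(url => ({
  url,
  passesFetchCondition: fetchCondition(toQueueItem(url)),
  passesUrlCondition: urlCondition(url),
}))

console.log(report)
console.log(normalizeText('  Watson \n\n stories\t page '))
```

With these samples, the first URL passes both conditions, the second passes `fetchCondition` but fails `urlCondition` because it contains `/list`, and the third fails `fetchCondition` because its path does not start with `/watson/`.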


Run tests

npm run test

Author

👤 Marco Cardoso

Show your support

Give a ⭐️ if this project helped you!

Last updated on 15 Apr 2021
