discovery-web-crawler

Crawls a website and populates a Watson Discovery Collection.

Version 1.2.1 (latest)

License: ISC

Install

npm install discovery-web-crawler

Usage

The following snippet crawls the Watson stories pages on the IBM website and indexes them in a Watson Discovery collection.

const DiscoveryWebCrawler = require('discovery-web-crawler')

const crawler = new DiscoveryWebCrawler({
    // Watson Discovery credentials and target collection
    serviceUrl: 'YOUR_SERVICE_URL',
    apikey: 'YOUR_APIKEY',
    environmentId: 'YOUR_ENVIRONMENT_ID',
    collectionId: 'YOUR_COLLECTION_ID',

    // Starting point URL
    url: 'https://www.ibm.com/watson/stories/',
    // Max crawler depth
    maxDepth: 3,
    // Condition to crawl this URL
    fetchCondition: queueItem => queueItem.path.startsWith('/watson/'),
    // Condition to index this URL
    urlCondition: url => !url.match('/list'),
    // Cheerio API to extract JSON from HTML content
    parse: async $ => ({ text: $('main').text().replace(/\s+/g, ' ').trim() }),
})
crawler.start()
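
To see what the two conditions and the text cleanup in the snippet above do, it can help to run them on their own, without a crawler or Watson credentials. The following is a minimal sketch, not part of the package: the sample objects are hypothetical, and it assumes that `queueItem` carries a string `path` and that `url` is a plain string, as the snippet above suggests. `normalizeText` is a made-up helper name for the whitespace cleanup used inside `parse`.

```javascript
// Predicates copied from the usage snippet so they can be exercised in isolation.
const fetchCondition = queueItem => queueItem.path.startsWith('/watson/')
const urlCondition = url => !url.match('/list')

// Hypothetical helper: the same whitespace cleanup that `parse` applies,
// here on a plain string instead of Cheerio output.
const normalizeText = text => text.replace(/\s+/g, ' ').trim()

// Hypothetical sample inputs (assumption: not taken from a real crawl).
const samples = [
    { path: '/watson/stories/', url: 'https://www.ibm.com/watson/stories/' },
    { path: '/watson/stories/list', url: 'https://www.ibm.com/watson/stories/list' },
    { path: '/cloud/', url: 'https://www.ibm.com/cloud/' },
]

// Evaluate each predicate independently for every sample.
const decisions = samples.map(item => ({
    url: item.url,
    crawl: fetchCondition(item),
    index: urlCondition(item.url),
}))

console.log(decisions)
console.log(normalizeText('  Watson \n\t stories  ')) // prints "Watson stories"
```

With these samples, the first URL passes both conditions, the second passes `fetchCondition` but fails `urlCondition` because it contains `/list`, and the third fails `fetchCondition` because its path is outside `/watson/`. How the crawler combines the two results internally is not shown here.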


Run tests

npm run test

Author

👤 Marco Cardoso

Show your support

Give a ⭐️ if this project helped you!


Package last updated on 15 Apr 2021
