discovery-web-crawler
Crawls a website and populates a Watson Discovery Collection.
Install
npm install discovery-web-crawler
Usage
The following snippet crawls the Watson stories pages on the IBM website and indexes them in a Watson Discovery collection.
const DiscoveryWebCrawler = require('discovery-web-crawler')

let crawler = new DiscoveryWebCrawler({
  serviceUrl: 'YOUR_SERVICE_URL',
  apikey: 'YOUR_APIKEY',
  environmentId: 'YOUR_ENVIRONMENT_ID',
  collectionId: 'YOUR_COLLECTION_ID',
  url: 'https://www.ibm.com/watson/stories/',
  maxDepth: 3,
  fetchCondition: queueItem => queueItem.path.startsWith('/watson/'),
  urlCondition: url => !url.match('/list'),
  parse: async $ => ({ text: $('main').text().replace(/\s+/g, ' ').trim() }),
})

crawler.start()
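The two conditions and the whitespace cleanup inside parse are plain functions, so they can be tried on sample values before starting a crawl. The sketch below copies them out of the snippet and runs them on hypothetical inputs. It assumes, as the snippet implies, that queueItem carries a path string and that urlCondition receives the URL as a string; the sample values and the normalize helper name are illustrative only and not part of the package.

```javascript
// Predicates copied from the usage snippet so they can be exercised without crawling.
const fetchCondition = queueItem => queueItem.path.startsWith('/watson/')
const urlCondition = url => !url.match('/list')

// Hypothetical helper: the same whitespace cleanup that parse applies to $('main').text().
const normalize = text => text.replace(/\s+/g, ' ').trim()

// Hypothetical sample inputs, not taken from a real crawl.
console.log(fetchCondition({ path: '/watson/stories/' }))             // true
console.log(fetchCondition({ path: '/cloud/' }))                      // false
console.log(urlCondition('https://www.ibm.com/watson/stories/'))      // true
console.log(urlCondition('https://www.ibm.com/watson/stories/list'))  // false
console.log(normalize('  Watson \n\n stories\t'))                     // Watson stories
```

Note that String.prototype.match converts the string '/list' into a regular expression, so urlCondition here returns false for any URL that contains "/list" anywhere, not only at the end.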
Run tests
npm run test
Author
👤 Marco Cardoso
Show your support
Give a ⭐️ if this project helped you!