crawler-find-word

crawler service

Latest version: 0.1.3

Simple but powerful crawler: find a phrase deep in the web.

Description

Crawls deeply from a base URL to find a word in the body of web pages.
A simple but powerful crawling and scraping package for Node.js, built for production use.
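The core idea, crawling outward from a base URL and checking each page for the word, can be sketched as a level-limited breadth-first search. This is a minimal sketch under stated assumptions, not the package's implementation: the function crawlForWord, the injected fetchPage(url) helper (assumed to resolve to { text, links }), and the rule that the root page is level 0 are all hypothetical.

```javascript
'use strict';

// Hypothetical sketch (not the crawler-find-word source): breadth-first crawl
// from rootUrl, visiting pages up to maxLevel link hops away (root = level 0)
// and recording every page whose text contains `word`.
// fetchPage(url) is an assumed async helper resolving to { text, links }.
async function crawlForWord(rootUrl, word, maxLevel, fetchPage) {
  const needle = word.toLowerCase();
  const visited = new Set([rootUrl]); // URLs already queued, to avoid revisits
  let frontier = [rootUrl];           // URLs at the current level
  const matches = [];
  let fetched = 0;

  for (let level = 0; level <= maxLevel && frontier.length > 0; level++) {
    const next = [];
    for (const url of frontier) {
      const { text, links } = await fetchPage(url);
      fetched++;
      // Case-insensitive substring match on the page text.
      if (text.toLowerCase().includes(needle)) {
        matches.push({ url, level });
      }
      // Do not queue links beyond the last allowed level.
      if (level === maxLevel) continue;
      for (const link of links) {
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return { pagesVisited: fetched, matches };
}
```

In a real crawler, fetchPage would download each URL and extract its text and links, for example with Cheerio's $('body').text() and the href attributes of $('a[href]'). Keeping that step injectable lets the traversal be checked against an in-memory site.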

We are strict about code quality, so we use 'travis-ci' and 'npm audit'.

Report bugs you find or features you want in our Slack workspace; join via this Slack invitation.

Features:

Configurable maximum level of pages to visit.
Configurable root URL and word to search for.
Event-driven API: raises a 'done' event when the crawl ends.
Returns useful statistical data.
Supports Docker hosting.
Uses Cheerio to find the word or phrase in the DOM.
Tested with Mocha and Chai.
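The event-driven design described above matches Node's EventEmitter pattern. A minimal sketch, assuming the crawler is built on EventEmitter (the demo's srv.eventHandler.on(...) call suggests this, but the README does not confirm it); the class WordCrawler, its record() and finish() methods, and the statistics fields are hypothetical.

```javascript
'use strict';
const { EventEmitter } = require('events');

// Hypothetical event-driven crawler shell: results accumulate in `pages`,
// and a 'done' event carries summary statistics when the run ends.
// Names and statistics fields are illustrative, not the package's API.
class WordCrawler extends EventEmitter {
  constructor() {
    super();
    this.pages = [];
  }

  // Record one fetched page and whether it contained the search word.
  record(url, found) {
    this.pages.push({ url, found });
  }

  // Compute simple statistics and notify 'done' listeners.
  finish() {
    const stats = {
      pagesVisited: this.pages.length,
      pagesWithWord: this.pages.filter((p) => p.found).length,
    };
    this.emit('done', stats);
    return stats;
  }
}
```

Because emit() calls listeners synchronously, a 'done' handler has already received the final statistics by the time finish() returns.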

Future features

Add 'Error' event handling.
Priority queue of requests.
Rate-limit control.
Charset detection and conversion.
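The planned priority queue of requests could take the form of a binary min-heap. This is a minimal sketch of that general technique, not a description of how the package will implement it; the class RequestQueue, the convention that a lower priority number is served first, and the FIFO tie-breaking are all assumptions.

```javascript
'use strict';

// Hypothetical request priority queue: a binary min-heap where a lower
// `priority` number is dequeued first and equal priorities keep FIFO order.
class RequestQueue {
  constructor() {
    this.heap = [];
    this.seq = 0; // insertion counter used to break priority ties
  }

  get size() {
    return this.heap.length;
  }

  // True if entry a should be dequeued before entry b.
  before(a, b) {
    return a.priority < b.priority || (a.priority === b.priority && a.seq < b.seq);
  }

  push(url, priority) {
    const h = this.heap;
    h.push({ url, priority, seq: this.seq++ });
    // Sift the new entry up while it should come before its parent.
    let i = h.length - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (!this.before(h[i], h[parent])) break;
      [h[i], h[parent]] = [h[parent], h[i]];
      i = parent;
    }
  }

  // Remove and return the URL of the highest-priority request, or undefined.
  pop() {
    const h = this.heap;
    if (h.length === 0) return undefined;
    const top = h[0];
    const last = h.pop();
    if (h.length > 0) {
      h[0] = last;
      // Sift the moved entry down to restore the heap property.
      let i = 0;
      for (;;) {
        const l = 2 * i + 1;
        const r = l + 1;
        let best = i;
        if (l < h.length && this.before(h[l], h[best])) best = l;
        if (r < h.length && this.before(h[r], h[best])) best = r;
        if (best === i) break;
        [h[i], h[best]] = [h[best], h[i]];
        i = best;
      }
    }
    return top.url;
  }
}
```

A heap keeps push() and pop() at O(log n), which matters once the crawl frontier grows to thousands of URLs.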

Demo


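    'use strict';
    const srv = require('crawler-find-word');

    // Print every collected page record once the crawl has finished.
    // pop() empties srv.pages, so records come out last-in, first-out.
    const print = function () {
        while (srv.pages.length > 0) {
            const page = srv.pages.pop();
            console.log(JSON.stringify(page));
        }
    };

    // 'done' is raised when the crawl ends.
    srv.eventHandler.on('done', print);
    // Arguments: root URL, word to search for, maximum level of pages to visit.
    srv.crawl('https://cnn.com/', 'trump', 2);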
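If you prefer promises to callbacks, the demo can be wrapped with Node's events.once(). This is a sketch under assumptions: crawlAsync is a hypothetical helper, not part of crawler-find-word, and it relies on srv.eventHandler being a Node EventEmitter, srv.pages being an array, and srv.crawl(url, word, level) starting the crawl, as the demo implies.

```javascript
'use strict';
const { once } = require('events');

// Hypothetical helper (not part of crawler-find-word): start a crawl and
// resolve with the collected pages once the 'done' event fires.
async function crawlAsync(srv, url, word, level) {
  // Subscribe before starting, so an early 'done' is not missed.
  const done = once(srv.eventHandler, 'done');
  srv.crawl(url, word, level);
  await done;
  // Drain srv.pages in the same last-in, first-out order as the demo's print().
  const results = [];
  while (srv.pages.length > 0) results.push(srv.pages.pop());
  return results;
}
```

Inside an async function, the demo then reduces to const pages = await crawlAsync(srv, 'https://cnn.com/', 'trump', 2);. Note that events.once() rejects if the emitter raises 'error', which would suit the planned 'error' event.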

Run

Running nodemon ./crawler-find-word.js localhost 3000 runs the unit tests in debug mode.

Test

Run npm test to execute the Mocha and Chai unit tests.

Deploy to Docker

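To build the Docker image, run this command from the project directory:

    docker build -t node-crawler-dev-env .

To create and start a container from that image, run:

    docker container run -p 9999:80 --name node-crawler-app --rm -v local-drive:/usr/src/app node-crawler-dev-env:latest

This publishes container port 80 on host port 9999, mounts the named volume local-drive at /usr/src/app, and removes the container when it stops.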

Package last updated on 22 May 2018
