crawler-find-word
Comparing version 0.1.0 to 0.1.1
{
"name": "crawler-find-word",
"version": "0.1.0",
"version": "0.1.1",
"description": "crawler service",
@@ -28,3 +28,4 @@ "main": "crawler-find-word.js",
"keywords": [
"crawler"
"crawler","crawling","scraper","spider","search","find",
"word","phrase","javascript","node","nodejs","url"
],
@@ -31,0 +32,0 @@ "bugs": {
@@ -1,2 +0,2 @@
## crawler-find-phrase
## Simple but powerful crawler - find phrase deep in the web
@@ -8,19 +8,40 @@ [![Build Status](https://travis-ci.org/idangvili/crawler-find-word.svg?branch=master)](https://travis-ci.org/idangvili/crawler-find-word)
Deep crawl to find word in the body of web pages by base url
Simple but powerful, popular and production crawling/scraping package for Node.
## Features:
Configurable maximum number of pages to visit
Configurable root URL and word to search for
Uses an event-driven API; raises a 'done' event when the process ends.
Returns useful statistical data.
Uses Cheerio to find a word or phrase in the DOM.
Tested with Mocha and Chai
## Future features
Add 'Error' event handling.
Priority queue of requests.
Rate limit control.
Charset detection and conversion.
## Demo
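The features above (a page limit, a search word, and a 'done' event) can be illustrated with a small self-contained sketch. This is not the package's implementation: `fakeWeb`, `countOccurrences`, and `crawlSketch` are hypothetical names, the "web" is an in-memory object instead of real HTTP requests, and plain string matching stands in for the Cheerio-based DOM search.

```javascript
'use strict';
const { EventEmitter } = require('events');

// Hypothetical in-memory "web": URL -> { body, links }. Stands in for real HTTP fetches.
const fakeWeb = {
  'https://example.com/': {
    body: 'Welcome to the crawler demo',
    links: ['https://example.com/a', 'https://example.com/b'],
  },
  'https://example.com/a': {
    body: 'Nothing to see here',
    links: ['https://example.com/b'],
  },
  'https://example.com/b': {
    body: 'A crawler visits pages; this crawler is bounded',
    links: ['https://example.com/'],
  },
};

// Count non-overlapping, case-insensitive occurrences of `phrase` in `text`.
function countOccurrences(text, phrase) {
  const haystack = text.toLowerCase();
  const needle = phrase.toLowerCase();
  if (needle.length === 0) return 0;
  let count = 0;
  let pos = haystack.indexOf(needle);
  while (pos !== -1) {
    count++;
    pos = haystack.indexOf(needle, pos + needle.length);
  }
  return count;
}

// Breadth-first crawl from `rootUrl` that stops after `maxPages` visited pages,
// then emits 'done' on `emitter` with one { url, count } record per page.
function crawlSketch(web, rootUrl, phrase, maxPages, emitter) {
  const queue = [rootUrl];
  const seen = new Set(queue);
  const pages = [];
  while (queue.length > 0 && pages.length < maxPages) {
    const url = queue.shift();
    const page = web[url];
    if (!page) continue; // URL not present in the fake web
    pages.push({ url, count: countOccurrences(page.body, phrase) });
    for (const link of page.links) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  emitter.emit('done', pages);
}

// Usage: register the listener first, then crawl at most 2 pages.
const events = new EventEmitter();
const results = [];
events.on('done', (pages) => results.push(...pages));
crawlSketch(fakeWeb, 'https://example.com/', 'crawler', 2, events);
console.log(JSON.stringify(results));
```

Registering the listener before starting the crawl matters in this sketch because the event is emitted synchronously; a real crawler would emit it after its asynchronous requests finish.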
### 'use strict';
### var srv = require('crawler-find-word');
### var print = function(){
### var count = srv.pages.length;
### for(var i=0; i < count; ){
### var u = srv.pages.pop();
### console.log(JSON.stringify(u));
### i++;
### };
### }
###
### srv.eventHandler.on('done', print);
### srv.crawl('https://cnn.com/', 'trump', 2);
```node
'use strict';
var srv = require('crawler-find-word');
var print = function(){
    var count = srv.pages.length;
    for(var i=0; i < count; ){
        var u = srv.pages.pop();
        console.log(JSON.stringify(u));
        i++;
    };
}
srv.eventHandler.on('done', print);
srv.crawl('https://cnn.com/', 'trump', 2);
```
## Run
@@ -27,0 +48,0 @@
License Policy Violation
License: This package is not allowed per your license policy. Review the package's license to ensure compliance.
Found 1 instance in 1 package