krawler - npm Package Compare versions

Comparing version 0.1.0 to 0.2.0

lib/index.js

Sorry, the diff of this file is not supported yet
package.json

```diff
 {
   "name": "krawler",
-  "version": "0.1.0",
+  "version": "0.2.0",
   "description": "Fast and lightweight web crawler with built-in cheerio, xml and json parser.",

@@ -8,2 +8,3 @@ "keywords": [
   "javascript",
   "crawler",
   "crawling",

@@ -16,3 +17,5 @@ "spider",
   "xml",
-  "json"
+  "json",
+  "promise",
+  "event"
 ],

@@ -51,3 +54,3 @@ "maintainers": [
   "scripts": {
-    "test": "mocha test/test.js"
+    "test": "mocha test/index"
   },

@@ -60,3 +63,3 @@ "engines": [
   },
-  "main": "./lib/krawler"
+  "main": "./lib/index"
 }
```
README.md

````diff
 # node Krawler [![Build Status](https://travis-ci.org/ondrs/node-krawler.png?branch=master)](https://travis-ci.org/ondrs/node-krawler)
-Fast and lightweight web crawler with built-in cheerio, xml and json parser.
+Fast and lightweight promise/event based web krawler with built-in cheerio, xml and json parser.
 And of course ... the best :)

@@ -14,15 +14,19 @@
 ```javascript
-var crawler = new Krawler;
-crawler
-  .queue('http://ondraplsek.cz')
+var Krawler = require('krawler')
+
+var urls = [
+  'http://ondraplsek.cz'
+];
+
+var krawler = new Krawler;
+krawler
+  .queue(urls)
   .on('data', function($, url, response) {
     // $ - cheerio instance
     // url of the current webpage
     // response object from mikeal/request
   })
-  .on('err', function(err, url) {
-    // there has been an 'err' on 'url'
+  .on('error', function(err, url) {
+    // there has been an 'error' on 'url'
   })

@@ -34,9 +38,71 @@ .on('end', function() {
-Krawler provides three types of built-in parsers:
-- cheerio (default)
-- xml
-- json
+## Options
+
+Krawler provides the following API:
+
+```javascript
+var krawler = new Krawler({
+  maxConnections: 10, // number of max simultaneously opened connections, default 10
+  parser: 'cheerio',  // web page parser, default 'cheerio'
+                      // other options are xml, json or false (no parser will be used, raw data will be returned)
+  forceUTF8: false,   // if Krawler should convert the source string to utf8, default false
+});
+```
+
+mikeal/request is used for fetching web pages, so any desired option from that package can be passed to Krawler's constructor.
+
+## Advanced Example
+
+```javascript
+var urls = [
+  'https://graph.facebook.com/nodejs',
+  'https://graph.facebook.com/facebook',
+  'https://graph.facebook.com/cocacola',
+  'https://graph.facebook.com/google',
+  'https://graph.facebook.com/microsoft',
+];
+
+var krawler = new Krawler({
+  maxConnections: 5,
+  parser: 'json',
+  forceUTF8: true
+});
+
+krawler
+  .on('data', function(json, url, response) {
+    // do something with json...
+  })
+  .on('error', function(err, url) {
+    // there has been an 'error' on 'url'
+  })
+  .on('end', function() {
+    // all URLs have been fetched
+  });
+```
+
+## Promises
+
+If your program flow is based on promises, you can easily attach Krawler to your promise chain.
+The fetchUrl() method returns a Q promise. When the promise is fulfilled, the callback function is called with a result object.
+The object has two properties:
+
+* data - parsed/raw content of the web page, based on the parser setting
+* response - response object from mikeal/request
+
+```javascript
+var krawler = new Krawler;
+
+findUrl()
+  .then(function(url) {
+    return krawler.fetchUrl(url);
+  })
+  .then(function(result) {
+    // in this case result.data is a cheerio instance
+    return processData(result.data);
+  })
+  // and so on ...
+```
````
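The `parser` option in the new README decides what the `data` event hands you: a cheerio instance, parsed xml/json, or the raw body when the option is `false`. A minimal sketch of that dispatch idea (hypothetical `parseBody` helper, not krawler's actual internals; the cheerio and xml branches are stubbed out because they need external packages):

```javascript
// Hypothetical sketch of how a `parser` option could dispatch on body type.
// Not krawler's real implementation.
function parseBody(body, parser) {
  switch (parser) {
    case 'json':
      return JSON.parse(body);   // 'json' -> plain object
    case false:
      return body;               // no parser: raw data is returned
    case 'cheerio':
    case 'xml':
      // the real package would hand the body to cheerio / an xml parser here
      throw new Error(parser + ' parser requires an external package');
    default:
      throw new Error('unknown parser: ' + parser);
  }
}

var raw = '{"id":"nodejs","likes":42}';
console.log(parseBody(raw, 'json').id); // nodejs
console.log(parseBody(raw, false));     // the raw string, unchanged
```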
Sorry, the diff of this file is not supported yet
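`maxConnections` caps how many requests are in flight at once. A self-contained sketch of that limiting idea (hypothetical `crawlAll` with a stubbed fetch; an illustration of the concept, not krawler's code):

```javascript
// Sketch of a concurrency limiter like the one `maxConnections` implies.
// `fetchStub` stands in for a real HTTP request.
function crawlAll(urls, maxConnections, fetchStub, done) {
  var active = 0, next = 0, results = [];
  function launch() {
    // start fetches until the cap is reached or the queue is empty
    while (active < maxConnections && next < urls.length) {
      active++;
      fetchStub(urls[next++], function (err, data) {
        active--;
        results.push(data);
        if (results.length === urls.length) return done(results);
        launch(); // a slot freed up: start the next URL
      });
    }
  }
  launch();
}

crawlAll(['a', 'b', 'c'], 2, function (url, cb) {
  setImmediate(function () { cb(null, url.toUpperCase()); });
}, function (results) {
  console.log(results.sort()); // [ 'A', 'B', 'C' ]
});
```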

Sorry, the diff of this file is not supported yet
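The fetchUrl() promise flow from the README can be sketched with native promises standing in for Q. Here `fetchUrlStub` is a hypothetical stand-in that resolves with the `{ data, response }` shape the README describes, without any network call:

```javascript
// Sketch of the fetchUrl() contract: a promise resolving with { data, response }.
// Native Promise replaces Q, and the HTTP request is stubbed out.
function fetchUrlStub(url) {
  return Promise.resolve({
    data: '<h1>hello from ' + url + '</h1>', // parsed/raw page content
    response: { statusCode: 200 }            // stand-in for mikeal/request's response
  });
}

fetchUrlStub('http://ondraplsek.cz')
  .then(function (result) {
    console.log(result.response.statusCode); // 200
    return result.data;
  })
  .then(function (data) {
    console.log(data); // <h1>hello from http://ondraplsek.cz</h1>
  });
```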
