{
		"name": "krawler",
		"version": "0.1.0",
		"version": "0.2.0",
		"description": "Fast and lightweight web crawler with built-in cheerio, xml and json parser.",
		@@ -8,2 +8,3 @@ "keywords": [
		"javascript",
		"crawler",
		"crawling",
		@@ -16,3 +17,5 @@ "spider",
		"xml",
		"json"
		"json",
		"promise",
		"event"
		],
		@@ -51,3 +54,3 @@ "maintainers": [
		"scripts": {
		"test": "mocha test/test.js"
		"test": "mocha test/index"
		},
		@@ -60,3 +63,3 @@ "engines": [
		},
		"main": "./lib/krawler"
		"main": "./lib/index"
		}

README.md

		# node Krawler [![Build Status](https://travis-ci.org/ondrs/node-krawler.png?branch=master)](https://travis-ci.org/ondrs/node-krawler)

		Fast and lightweight web crawler with built-in cheerio, xml and json parser.
		Fast and lightweight promise/event based web krawler with built-in cheerio, xml and json parser.
		And of course ... the best :)
		@@ -14,15 +14,19 @@
		```javascript
		var crawler = new Krawler;
		var Krawler = require('krawler');

		crawler
		.queue('http://ondraplsek.cz')
		var urls = [
		'http://ondraplsek.cz'
		];

		var krawler = new Krawler;

		krawler
		.queue(urls)
		.on('data', function($, url, response) {

		// $ - cheerio instance
		// url of the current webpage
		// response object from mikeal/request

		})
		.on('err', function(err, url) {
		// there has ben an 'err' on 'url'
		.on('error', function(err, url) {
		// there has been an 'error' on 'url'
		})
		@@ -34,9 +38,71 @@ .on('end', function() {

		Krawler provides three types of built-in parsers
		- cheerio (default)
		- xml
		- json

		## Options

		Krawler provides the following API:

		```javascript
		var krawler = new Krawler({
		maxConnections: 10, // maximum number of simultaneously open connections, default 10
		parser: 'cheerio', // web page parser, default 'cheerio'
		// other options are 'xml', 'json' or false (no parser is used and raw data is returned)
		forceUTF8: false, // whether Krawler should convert the source string to UTF-8, default false
		});
		```

		mikeal/request is used for fetching web pages, so any option from that package can be passed to Krawler's constructor.
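
As a rough sketch of the sentence above, Krawler's own options and request options can be kept apart and merged into the single object handed to the constructor. The specific values are illustrative assumptions; `timeout`, `headers` and `followRedirect` are options of mikeal/request, and the names `krawlerOptions`, `requestOptions` and `options` are hypothetical.

```javascript
// Options handled by Krawler itself (the ones documented in this README).
var krawlerOptions = {
  maxConnections: 5,
  parser: 'json',
  forceUTF8: true
};

// Options understood by mikeal/request; the values here are only examples.
var requestOptions = {
  timeout: 10000, // milliseconds to wait before giving up on a request
  headers: { 'User-Agent': 'my-crawler/1.0' },
  followRedirect: true
};

// Both sets are merged into the one object passed to the constructor.
var options = Object.assign({}, krawlerOptions, requestOptions);

// With the package installed, the merged object would be used like this:
// var Krawler = require('krawler');
// var krawler = new Krawler(options);
```

Keeping the two groups separate makes it easier to see which settings affect crawling and which affect the underlying HTTP requests.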

		## Advanced Example

		```javascript
		var urls = [
		'https://graph.facebook.com/nodejs',
		'https://graph.facebook.com/facebook',
		'https://graph.facebook.com/cocacola',
		'https://graph.facebook.com/google',
		'https://graph.facebook.com/microsoft',
		];

		var krawler = new Krawler({
		maxConnections: 5,
		parser: 'json',
		forceUTF8: true
		});

		krawler
		.queue(urls)
		.on('data', function(json, url, response) {
		// do something with json...
		})
		.on('error', function(err, url) {
		// there has been an 'error' on 'url'
		})
		.on('end', function() {
		// all URLs have been fetched
		});
		```


		## Promises

		If your program flow is based on promises, you can easily attach Krawler to your promise chain.
		The fetchUrl() method returns a Q promise. When the promise is fulfilled, the callback function is called with a result object.

		The result object has two properties:

		* data - parsed/raw content of the web page, based on the parser setting
		* response - response object from mikeal/request
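
The result shape described above can be tried out without network access. The following is a minimal sketch, not Krawler's own code: `fakeFetchUrl` is a hypothetical stand-in for `krawler.fetchUrl()` that resolves with a `{ data, response }` object, `fetchAll` is a hypothetical helper, and native `Promise` is used here instead of Q.

```javascript
// Hypothetical stand-in for krawler.fetchUrl(): resolves with the documented
// result shape ({ data, response }) without touching the network.
function fakeFetchUrl(url) {
  return Promise.resolve({
    data: { id: url.split('/').pop() }, // roughly what a 'json' parser might yield
    response: { statusCode: 200 }
  });
}

// Fetch the URLs one after another and collect only the parsed data.
function fetchAll(urls, fetchUrl) {
  return urls.reduce(function(chain, url) {
    return chain.then(function(results) {
      return fetchUrl(url).then(function(result) {
        results.push(result.data);
        return results;
      });
    });
  }, Promise.resolve([]));
}

var done = fetchAll([
  'https://graph.facebook.com/nodejs',
  'https://graph.facebook.com/google'
], fakeFetchUrl);
```

With the real package, the second argument would presumably be something like `function(url) { return krawler.fetchUrl(url); }`, since a Q promise can be returned from inside a native promise chain.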


		```javascript
		var krawler = new Krawler;

		findUrl()
		.then(function(url) {
		return krawler.fetchUrl(url);
		})
		.then(function(result) {
		// in this case result.data is a cheerio instance
		return processData(result.data);
		})
		// and so on ...

lib/krawler.js

test/test.js

.idea/workspace.xml

Sorry, the diff of this file is not supported yet

.npmignore

Sorry, the diff of this file is not supported yet
