crawlho

Simple web crawler

Version 0.0.3 (latest), published on npm. Maintainers: 1.

Installation

As simple as npm install crawlho.

Usage

crawlho(options, callback);

Example

var crawlho = require('crawlho');

crawlho({
    url: 'http://example.com', //mandatory
    extract: function($) { //mandatory
        var results = [];
        
        $('.someClass').each(function() {
            results.push($(this).text());
        });
        
        //Return the data you want to grab as an array!
        return results; 
    }
}, function(err) {
    if(err) {
        throw err; //Something went wrong!
    }
});
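
The extract function above pushes the raw text of every match. As a minimal sketch, assuming `$` behaves like a jQuery/cheerio-style selector (the README does not name the library), an extract function could trim the text and drop empty or duplicate strings before returning the array. The names `cleanTexts` and `extractHeadings` and the `h2` selector are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: trim each string, then drop empties and duplicates.
function cleanTexts(texts) {
    var seen = {};
    var cleaned = [];

    texts.forEach(function (text) {
        var value = String(text).trim();
        var isNew = !Object.prototype.hasOwnProperty.call(seen, value);

        if (value !== '' && isNew) {
            seen[value] = true;
            cleaned.push(value);
        }
    });

    return cleaned;
}

// Hypothetical extract function. Assumes `$` is jQuery/cheerio-like:
// $(selector).each(fn) iterates matches and $(this).text() returns their text.
function extractHeadings($) {
    var texts = [];

    $('h2').each(function () {
        texts.push($(this).text());
    });

    // Still returns an array, as the extract option requires.
    return cleanTexts(texts);
}
```

Passing `extract: extractHeadings` in the options would then hand only non-empty, unique strings to the result handler.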

The defaults for the options hash are as follows:

var os = require('os'); //os.EOL is used by the default onResult below

var options = {
    sameDomain: true,
    //Follow only internal links - default: true

    debug: false,
    //Prints the currently requested url and level depth - default: false

    maxlevel: 2,
    //Maximum level depth - default: 2

    delay: 1000,
    //Time delay between requests - default: 1000ms

    onResult: function(results) {
        //What to do whenever your extract function finds something.
        //This is the default implementation (writes to stdout).
        //results is the array sent by .extract when it has .length > 0

        results.forEach(function(result) {
            process.stdout.write(result + os.EOL);
        });
    },

    shouldResetLevel: function(url) {
        //Optional function; return true to reset the depth level to 1.
        //It is useful when dealing with pagination, so that
        //following `url?page=2` doesn't count as a new level.
        return false;
    },

    shouldFollow: function(url) {
        //Every url is passed to this function so you can decide
        //whether crawlho should follow this link or not.
        //Useful to prevent crawling files (.zip, .rar, .mp3).
        return true;
    }
};
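
The hooks above can be overridden to match the use cases their comments describe. The following is a minimal sketch, not the library's own implementation: it assumes each hook receives the url as a string, and the extension list, the `page` query parameter, the `collected` array, and the `startCrawl` helper are all hypothetical choices made for illustration.

```javascript
// Illustrative list of file extensions that should not be crawled.
var SKIPPED_EXTENSIONS = ['.zip', '.rar', '.mp3'];

// Return false for urls whose path ends in one of the skipped extensions.
function shouldFollow(url) {
    // Ignore the query string and fragment, and compare case-insensitively.
    var path = url.split(/[?#]/)[0].toLowerCase();

    return !SKIPPED_EXTENSIONS.some(function (ext) {
        return path.slice(-ext.length) === ext;
    });
}

// Treat urls carrying a numeric `page` parameter as pagination,
// so that following them resets the depth level.
function shouldResetLevel(url) {
    return /[?&]page=\d+/.test(url);
}

// Collect results in memory instead of writing them to stdout.
var collected = [];

function onResult(results) {
    results.forEach(function (result) {
        collected.push(result);
    });
}

var options = {
    url: 'http://example.com',
    extract: function ($) {
        return []; // placeholder; supply your own extraction logic
    },
    shouldFollow: shouldFollow,
    shouldResetLevel: shouldResetLevel,
    onResult: onResult
};

// Hypothetical wrapper that takes the crawler as an argument,
// so the hooks can be exercised without the package installed.
function startCrawl(crawlho) {
    crawlho(options, function (err) {
        if (err) {
            console.error(err);
        }
    });
}
```

With the package installed, the crawl would be started with `startCrawl(require('crawlho'))`.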

Keywords

web

Package last updated on 18 Aug 2016
