crawler-mod
node-crawler aims to be the best crawling/scraping package for Node.
It features:
The argument for creating this package was made at ParisJS #2 in 2010 (lightning talk slides).
Help & Forks welcomed!
$ npm install crawler
    var Crawler = require("crawler").Crawler;

    var c = new Crawler({
        "maxConnections": 10,

        // This will be called for each crawled page
        "callback": function(error, result, $) {

            // $ is a jQuery instance scoped to the server-side DOM of the page
            $("#content a").each(function(index, a) {
                c.queue(a.href);
            });
        }
    });

    // Queue just one URL, with default callback
    c.queue("http://joshfire.com");

    // Queue a list of URLs
    c.queue(["http://jamendo.com/", "http://tedxparis.com"]);

    // Queue URLs with custom callbacks & parameters
    c.queue([{
        "uri": "http://parishackers.org/",
        "jQuery": false,

        // The global callback won't be called
        "callback": function(error, result) {
            console.log("Grabbed", result.body.length, "bytes");
        }
    }]);

    // Queue some HTML code directly without grabbing (mostly for tests)
    c.queue([{
        "html": "<p>This is a <strong>test</strong></p>"
    }]);
You can pass these options to the Crawler() constructor to make them global, or as items in queue() calls to make them specific to that item (overriding the global options).
This list of options is a strict superset of mikeal's request options, and the options are passed directly to the request() method.
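The precedence described above (per-item options override global ones) can be illustrated with a small standalone sketch. This is not node-crawler's actual implementation: resolveOptions is a hypothetical helper written only for this illustration, and treating a bare string as shorthand for an object with a uri key is an assumption based on the usage example.

```javascript
// Hypothetical helper, NOT part of node-crawler's API: it only illustrates
// how options given in a queue() item can take precedence over the options
// given to the Crawler() constructor.
function resolveOptions(globalOptions, item) {
  // Assumption: a bare string is shorthand for { uri: "<string>" }.
  var itemOptions = typeof item === "string" ? { uri: item } : item;
  var resolved = {};

  // Copy global options first, then item options, so the item wins on conflict.
  [globalOptions, itemOptions].forEach(function (source) {
    Object.keys(source || {}).forEach(function (key) {
      resolved[key] = source[key];
    });
  });
  return resolved;
}

var globalOptions = { maxConnections: 10, jQuery: true };

// A plain URL inherits every global option.
var plain = resolveOptions(globalOptions, "http://joshfire.com");

// An item that sets jQuery overrides the global value for that item only.
var custom = resolveOptions(globalOptions, {
  uri: "http://parishackers.org/",
  jQuery: false
});

console.log(plain.uri, plain.jQuery);
console.log(custom.uri, custom.jQuery);
```

The global object is never mutated here, so an override on one item does not leak into later items.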
Basic request options:
Callbacks:
Pool options:
Retry options:
Server-side DOM options:
Charset encoding:
Cache:
Other:
When using timeouts, use Node > 0.8.14 to avoid triggering Node #3076.
There is now a complete memory leak test for node-crawler :)
$ npm install && npm test
Feel free to add more tests!
0.2.5
options.encoding = null, thanks @trantorLiu
0.2.4
0.2.3
0.2.2
0.2.1
0.2.0
0.1.0
FAQs
based on node-crawler
The npm package crawler-mod receives a total of 2 weekly downloads. As such, crawler-mod popularity was classified as not popular.
We found that crawler-mod demonstrated an unhealthy version release cadence and level of project activity, because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.