
# Arachnod

A powerful, easy-to-use web crawler for Node.js. Arachnod is designed for heavy, long-running tasks, with performance and effective resource usage in mind. To that end it uses Redis as its backend, covering the heavy, time-consuming work of tracking URLs and distributing tasks among Arachnod's child processes (Spiderlings). Arachnod also avoids server-side DOM techniques such as jQuery on top of jsdom in order to use resources properly: frankly, I tested jsdom for a long time with no luck, always memory leaks and high memory usage, and libxml-based XPath solutions never worked out in practice. Instead, Arachnod uses Cheerio to access DOM elements, and SuperAgent as its HTTP client.
## Install

```sh
$ npm install arachnod
```

Or via Git:

```sh
$ git clone git@github.com:risyasin/arachnod.git
```

Then install the required Node.js modules with npm:

```sh
$ npm install
```

Please make sure you have a running redis-server before starting a crawl.
## Usage

```js
var bot = require('arachnod');

bot.on('hit', function (doc, $) {
    // Do whatever you want with the parsed HTML content.
    var desc = $('article.entry-content').text();
    console.log(doc.path, desc);
    // Pause if you don't need to follow all links.
    bot.pause();
});

bot.on('error', function (err, task) {
    console.log('Bot error:', err, err.stack, task);
});

bot.on('end', function (err, status) {
    console.log('Bot finished:', err, status);
});

bot.crawl({
    'redis': '127.0.0.1',
    'parallel': 4,
    'start': 'https://github.com/risyasin/arachnod',
    'resume': false
});
```
## Parameters

| Parameter Name | Description |
|---|---|
| start | Start URL for crawling. (Mandatory) |
| parallel | Number of child processes that handle network tasks. Do not set this higher than 20. (Default: 8) |
| redis | Host name or IP address of the Redis server. (Default: 127.0.0.1) |
| redisPort | Port number for Redis. (Default: 6379) |
| verbose | Verbosity level, from 1 (silent) to 10 (everything). (Default: 1) |
| resume | Resume support; simply put, does not reset the queues if any exist. (Default: false) |
| ignorePaths | Ignores paths starting with any of the given prefixes. Must be an array, such as ['/blog', '/gallery']. |
| ignoreParams | Ignores the given query-string parameters. Must be an array, such as ['color', 'type']. |
| sameDomain | Stays on the same hostname. (Will be implemented in v1) |
| useCookies | Uses cookies. (Will be implemented in v0.4) |
| obeyRobotsTxt | As the name says, honors robots.txt. (Will be implemented in v0.5) |
## Events

| Event Name | Description |
|---|---|
| hit | Emitted when a URL has been downloaded and processed. Sends two parameters, in order: doc (parsed URL info) and $ (a Cheerio object). |
| error | Emitted when an error occurs at any level, including in child processes. Single parameter: the Error or exception. |
| end | Emitted when the end of the task queue is reached. Returns statistics. |
| stats | Emitted whenever a child changes its state (such as downloading or querying queues). Use wisely. |
## Methods

| Method Name | Description |
|---|---|
| crawl(parameters) | Starts a new crawling session with the given parameters. |
| pause() | Stops the bot without deleting any task queue. |
| resume() | Resumes a paused session. Useful for controlling resource usage on low-spec systems (single core, etc.). |
| queue(url) | Adds the given URL to the task queue. |
| getStats() | Returns various statistics, such as downloaded, checked, and finished URL counts, memory usage, etc. |
## Contributing

If you enjoy using Arachnod, help me improve it. Feel free to open a pull request for anything useful.
## License

Copyright 2015 Yasin Inat
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.