crawler-lib

crawler framework for nodejs



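A crawler framework for Node.js

A fast, simple, and easy-to-use crawler framework for Node.js.

WARNING: This project is still under development. Any existing feature may be changed or removed.

Features

  • Request rate limiting
  • Pause and resume crawling
  • Resume crawling from the last recorded state
  • Built-in jQuery selectors
  • Built-in resource download tool
  • IP proxy settings
  • Statistics
  • Custom HTTP methods and request bodies
  • Error retry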
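
To make two of the listed features more concrete, here is a minimal sketch of how request rate limiting and error retry could work in principle. It is not taken from this package: `IntervalLimiter` and `withRetry` are hypothetical helpers written for illustration, and the sketch assumes that a retry count of N means one initial attempt plus up to N further attempts.

```typescript
// Hypothetical helper, not part of @axetroy/crawler.
// Spaces request start times at least `minIntervalMs` apart.
class IntervalLimiter {
  private minIntervalMs: number;
  private nextFree = 0;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns the earliest timestamp (ms) at which a request arriving at
  // `now` may start, and reserves that slot for it.
  reserve(now: number): number {
    const start = Math.max(now, this.nextFree);
    this.nextFree = start + this.minIntervalMs;
    return start;
  }
}

// Hypothetical helper, not part of @axetroy/crawler.
// Runs `task` once, then up to `retries` more times if it throws.
// The last error is rethrown when every attempt fails.
function withRetry<T>(task: (attempt: number) => T, retries: number): T {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return task(attempt);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Three requests arriving almost together are spread one second apart.
const limiter = new IntervalLimiter(1000);
const starts = [0, 10, 20].map((t) => limiter.reserve(t));
console.log(starts); // start times 0, 1000 and 2000
```

The limiter takes the current time as an argument instead of reading a clock, which keeps the scheduling logic deterministic and easy to check. A real crawler would apply the same ideas to asynchronous HTTP requests.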

Quick start

npm install @axetroy/crawler

// `Options` is assumed to be exported alongside the other types.
import { Crawler, Provider, Response, Options } from "@axetroy/crawler";

class MyProvider implements Provider {
  name = "scrapinghub";
  urls = ["https://blog.scrapinghub.com"];
  async parse($: Response) {
    // Look for the link to the next page of posts.
    const $nextPage = $("a.next-posts-link").eq(0);

    // A selection object is always truthy, so check its length instead.
    if ($nextPage.length) {
      $.follow($nextPage.prop("href"));
    }

    // Return the title of every post on the current page.
    return $(".post-header>h2")
      .map((_, el) => $(el).text())
      .get();
  }
}

const config: Options = {
  timeout: 1000 * 5, // 5 seconds
  retry: 3
};

new Crawler(MyProvider, config)
  .on("data", (articles: string[]) => {
    for (const article of articles) {
      process.stdout.write(article + "\n");
    }
  })
  .on("error", (err, task) => {
    console.log(`request fail on ${task.url}: ${err.message}`);
  })
  .start();
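
The parse-and-follow pattern used in the quick start can be sketched without any network access. The example below is an illustration only, not this package's implementation. `Page`, `site`, and `crawl` are hypothetical names, and an in-memory object stands in for real HTTP responses.

```typescript
// Hypothetical in-memory "site"; a real crawler would fetch pages over HTTP.
interface Page {
  titles: string[];
  next?: string;
}

const site: Record<string, Page> = {
  "/page/1": { titles: ["A", "B"], next: "/page/2" },
  "/page/2": { titles: ["C"], next: "/page/3" },
  "/page/3": { titles: ["D"] },
};

type Parse = (page: Page, follow: (url: string) => void) => string[];

// Processes URLs in first-in, first-out order: parses each page, emits
// its data, and enqueues every followed URL at most once.
function crawl(
  startUrls: string[],
  parse: Parse,
  onData: (items: string[]) => void
): void {
  const queue = [...startUrls];
  const seen = new Set(startUrls);
  while (queue.length > 0) {
    const url = queue.shift() as string;
    const page = site[url];
    if (!page) continue; // unknown URL: a real crawler would report an error
    const items = parse(page, (nextUrl) => {
      if (!seen.has(nextUrl)) {
        seen.add(nextUrl);
        queue.push(nextUrl);
      }
    });
    onData(items);
  }
}

const collected: string[] = [];
crawl(
  ["/page/1"],
  (page, follow) => {
    if (page.next) follow(page.next); // same role as $.follow(...) above
    return page.titles;
  },
  (items) => {
    collected.push(...items);
  }
);
console.log(collected); // titles A, B, C, D in page order
```

The `seen` set prevents a page from being queued twice, which matters once pages link back to one another.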

API Reference


Example

How do I run the demo?

> npx ts-node examples/basic.ts

Related examples can be found in the examples directory.

License

The Anti 996 License


Last updated on 12 Jun 2022
