x-crawl

x-crawl - npm Package Versions

5.0.1

v5.0.1 (2023-04-08)

🚀 Features

  • Adjusted the documentation.

coderhxl
published 5.0.0 •

v5.0.0 (2023-04-06)

🚨 Breaking Changes

  • Configuration: each crawling API's configuration has changed substantially. The same API now accepts several configuration forms, each suited to a different use case.
  • Results: the result of each request is now wrapped in an object that describes the request, including its id, result, success status, maximum retries, retry count, and collected error information. Whether the return value is wrapped in an array is determined automatically by the configuration form you choose, and the TypeScript types match accordingly.
  • Callbacks: when results are obtained through a callback function, the callback no longer runs as each individual request completes, as it did in v4.x. Instead, callbacks run sequentially after crawling has completed, so they do not block subsequent crawling.
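
To make the wrapped result easier to picture, here is a minimal sketch of how a per-request result object and the "single value or array" return shape could be typed. It does not use x-crawl itself: the `CrawlResult` interface, its field names, and the `crawl` function are hypothetical stand-ins chosen to mirror the fields listed above, and no network request is made.

```typescript
// Hypothetical shape of a wrapped result. Field names are illustrative
// and are NOT copied from the x-crawl type definitions.
interface CrawlResult {
  id: number // position of the request, starting at 1
  isSuccess: boolean
  maxRetry: number
  retryCount: number
  errorQueue: Error[] // errors collected while handling this request
  data: string | null // simulated payload; a real crawler would put the response here
}

// Wraps one simulated request outcome in a result object.
function wrap(url: string, index: number): CrawlResult {
  return {
    id: index + 1,
    isSuccess: true,
    maxRetry: 0,
    retryCount: 0,
    errorQueue: [],
    data: `simulated response for ${url}`
  }
}

// Overloads: an array of targets yields an array of results,
// a single target yields a single result.
function crawl(target: string[]): CrawlResult[]
function crawl(target: string): CrawlResult
function crawl(target: string | string[]): CrawlResult | CrawlResult[] {
  return Array.isArray(target) ? target.map(wrap) : wrap(target, 0)
}

const one = crawl('https://example.com/a') // typed as CrawlResult
const many = crawl(['https://example.com/a', 'https://example.com/b']) // typed as CrawlResult[]
console.log(one.id, many.length)
```

The overloads are one way to let the compiler pick the return type from the argument form, which is the behavior the changelog describes; the library's actual typing may be implemented differently.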

🚀 Features

  • Added a retry mechanism. Retries can be configured for all crawling requests, for a single crawl, or for an individual request.
  • Added a priority queue, so requests are crawled in order of each request's priority.
  • Configuration that is likely to be reused, such as timeout, proxy, and intervalTime, can be set in the request configuration, in the API crawling configuration, or in the baseConfig passed when creating the crawler instance. Precedence is: requestConfig > APIConfig > baseConfig.
  • For the crawlFile API, the file path, name, extension, and other information can be set individually for each file. Added the beforeSave lifecycle function, which runs before a file is saved. It receives the file data as a Buffer, so the callback can perform operations such as compression; the new Buffer it returns replaces the original data and is written to the file.
  • Updated the console output for crawling. Errors raised during crawling are collected into an error queue, which is available through the return value after crawling completes.
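
The precedence rule and the priority queue above can be sketched in a few lines. This is an illustration only and does not call x-crawl: `CrawlConfig`, `resolveConfig`, and `orderByPriority` are hypothetical names, and it assumes that a larger priority number means "crawl earlier" and that a missing priority counts as 0.

```typescript
// Hypothetical subset of the reusable options named in the changelog.
interface CrawlConfig {
  timeout?: number
  proxy?: string
  maxRetry?: number
}

// Resolves each option with the precedence requestConfig > APIConfig > baseConfig.
// A level only wins if it actually defines the option.
function resolveConfig(
  baseConfig: CrawlConfig,
  apiConfig: CrawlConfig,
  requestConfig: CrawlConfig
): CrawlConfig {
  return {
    timeout: requestConfig.timeout ?? apiConfig.timeout ?? baseConfig.timeout,
    proxy: requestConfig.proxy ?? apiConfig.proxy ?? baseConfig.proxy,
    maxRetry: requestConfig.maxRetry ?? apiConfig.maxRetry ?? baseConfig.maxRetry
  }
}

// Returns a new array ordered from highest to lowest priority (assumed convention).
// Array.prototype.sort is stable, so equal priorities keep their original order.
function orderByPriority<T extends { priority?: number }>(requests: T[]): T[] {
  return [...requests].sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0))
}

const resolved = resolveConfig(
  { timeout: 10000, maxRetry: 1 }, // instance-wide defaults
  { maxRetry: 3 }, // one API call
  { timeout: 5000 } // one request
)
console.log(resolved) // timeout 5000 from the request, maxRetry 3 from the API call

const queue = orderByPriority([
  { url: 'https://example.com/a', priority: 1 },
  { url: 'https://example.com/b', priority: 9 },
  { url: 'https://example.com/c' }
])
console.log(queue.map((r) => r.url)) // b first, then a, then c
```

Because the retry count is resolved through the same three levels, setting `maxRetry` in the sketch shows how a retry limit could be defined once for everything and overridden for a single crawl or a single request.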

coderhxl
published 4.0.1 •

v4.0.1 (2023-03-30)

🐞 Bug Fixes

  • Fixed the page not being closed when an error occurs in the crawlPage API.
coderhxl
published 4.0.0 •

v4.0.0 (2023-03-27)

🚨 Breaking Changes

  • The crawlPage API now supports batch requests.
  • Removed JSDOM from the crawlPage API.

🚀 Features

  • Updated the documentation.
coderhxl
published 3.3.0 •

v3.3.0 (2023-03-24)

🚀 Features

  • The crawlPage API can now carry cookies when crawling a page (for login and similar operations).
coderhxl
published 3.2.12 •

v3.2.12 (2023-03-23)

🐞 Bug Fixes

  • Fixed jump links in the documentation.
coderhxl
published 3.2.11 •

v3.2.11 (2023-03-22)

🚀 Features

  • Updated the tests; all APIs are now covered by unit tests.

🐞 Bug Fixes

  • Fixed an internal error in the crawlPage API.
coderhxl
published 3.2.10 •

v3.2.10 (2023-03-21)

🚀 Features

  • Updated the documentation.
coderhxl
published 3.2.9 •

v3.2.9 (2023-03-20)

🚀 Features

  • Update dependency
coderhxl
published 3.2.8 •

v3.2.8 (2023-02-19)

🐞 Bug Fixes

  • Internal type adjustment.
  • Catch crawlPage API errors.