x-crawl - npm Package Compare versions

Comparing version 6.0.0 to 6.0.1


package.json
{
"name": "x-crawl",
"version": "6.0.0",
"version": "6.0.1",
"author": "coderHXL",

@@ -14,4 +14,4 @@ "description": "x-crawl is a flexible Node.js multifunctional crawler library.",

"spider",
"fingerprint",
"flexible",
"fingerprint",
"multifunction"

@@ -36,5 +36,5 @@ ],

"https-proxy-agent": "^5.0.1",
"puppeteer": "19.8.0"
"puppeteer": "19.10.0"
},
"devDependencies": {}
}

@@ -1,6 +0,6 @@

# x-crawl [![npm](https://img.shields.io/npm/v/x-crawl.svg)](https://www.npmjs.com/package/x-crawl) [![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/coder-hxl/x-crawl/blob/main/LICENSE)
# x-crawl · [![npm](https://img.shields.io/npm/v/x-crawl.svg)](https://www.npmjs.com/package/x-crawl) [![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/coder-hxl/x-crawl/blob/main/LICENSE)
English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.md)
x-crawl is a flexible Node.js multifunctional crawler library. Used to crawl pages, crawl interfaces, crawl files, and poll crawls.
x-crawl is a flexible Node.js multipurpose crawler library. Its usage is flexible, and it has many built-in functions for crawling pages, crawling interfaces, crawling files, etc.

@@ -11,15 +11,15 @@ > If you also like x-crawl, you can give [x-crawl repository](https://github.com/coder-hxl/x-crawl) a star to support it, thank you for your support!

- **🔥 Async/Sync** - Just change the mode property to toggle async/sync crawling mode.
- **⚙️Multiple functions** - Can crawl pages, crawl interfaces, crawl files and poll crawls. And it supports crawling single or multiple.
- **🖋️ Flexible writing method** - A function adapts to multiple crawling configurations and obtains crawling results. The writing method is very flexible.
- **👀 Device Fingerprinting** - Zero configuration/custom configuration to avoid fingerprinting to identify and track us from different locations.
- **⏱️ Interval crawling** - no interval/fixed interval/random interval, can effectively use/avoid high concurrent crawling.
- **🔄 Retry on failure** - It can be set for all crawling requests, for a single crawling request, and for a single request to set a failed retry.
- **🚀 Priority Queue** - Use priority crawling based on the priority of individual requests.
- **☁️ Crawl SPA** - Batch crawl SPA (Single Page Application) to generate pre-rendered content (ie "SSR" (Server Side Rendering)).
- **⚒️ Controlling Pages** - Headless browsers can submit forms, keystrokes, event actions, generate screenshots of pages, etc.
- **🧾 Capture Record** - Capture and record the crawled results, and highlight them on the console.
- **🔥 Asynchronous/Synchronous** - Just change the mode property to toggle asynchronous or synchronous crawling mode.
- **⚙️ Multiple purposes** - It can crawl pages, crawl interfaces, crawl files and poll crawls to meet the needs of various scenarios.
- **🖋️ Flexible writing style** - The same crawling API can be adapted to multiple configurations, and each configuration method has its own strengths.
- **👀 Device Fingerprinting** - Zero configuration or custom configuration, to avoid being identified and tracked through fingerprinting from different locations.
- **⏱️ Interval Crawling** - No interval, fixed interval or random interval, which can generate or avoid high-concurrency crawling.
- **🔄 Failed Retry** - Avoid crawling failures caused by transient problems; unlimited retries are supported.
- **🚀 Priority Queue** - According to the priority of a single crawling target, it can be crawled ahead of other targets.
- **☁️ Crawl SPA** - Crawl SPAs (Single Page Applications) to generate pre-rendered content (aka "SSR" (Server-Side Rendering)).
- **⚒️ Control Page** - You can submit forms, perform keyboard input and event operations, generate screenshots of the page, etc.
- **🧾 Capture Record** - Capture and record the crawled information, and highlight it on the console.
- **🦾 TypeScript** - Own types, with complete typings implemented through generics.
## Relationship with puppeteer
## Relationship with Puppeteer

@@ -32,27 +32,27 @@ The crawlPage API has [puppeteer](https://github.com/puppeteer/puppeteer) built in; you only need to pass in some configuration options to complete some operations, and the result will expose Browser instances and Page instances.

- [Example](#Example)
- [Core concepts](#Core-concepts)
- [Create application](#Create-application)
- [An example of a crawler application](#An-example-of-a-crawler-application)
- [Crawl mode](#Crawl-mode)
- [Device fingerprint](#Device-fingerprint)
- [Multiple crawler application instances](#Multiple-crawler-application-instances)
- [Crawl page](#Crawl-page)
- [browser instance](#browser-instance)
- [page instance](#page-instance)
- [life cycle](#life-cycle)
- [Core Concepts](#Core-Concepts)
- [Create Application](#Create-Application)
- [An Example of a Crawler Application](#An-Example-of-a-Crawler-Application)
- [Crawl Mode](#Crawl-Mode)
- [Default Device Fingerprint](#Default-Device-Fingerprint)
- [Multiple Crawler Application Instances](#Multiple-Crawler-Application-Instances)
- [Crawl Page](#Crawl-Page)
- [Browser Instance](#Browser-Instance)
- [Page Instance](#Page-Instance)
- [Life Cycle](#life-Cycle)
- [onCrawlItemComplete](#onCrawlItemComplete)
- [Crawl interface](#Crawl-interface)
- [life cycle](#life-cycle-1)
- [Crawl Interface](#Crawl-Interface)
- [Life Cycle](#life-Cycle-1)
- [onCrawlItemComplete](#onCrawlItemComplete-1)
- [Crawl files](#Crawl-files)
- [life cycle](#life-cycle)
- [Crawl Files](#Crawl-Files)
- [Life Cycle](#life-Cycle)
- [onCrawlItemComplete](#onCrawlItemComplete-2)
- [onBeforeSaveItemFile](#onBeforeSaveItemFile)
- [Start polling](#Start-polling)
- [Config priority](#Config-Priority)
- [Device fingerprint](#Device-fingerprint-1)
- [Interval time](#Interval-time)
- [Fail retry](#Fail-retry)
- [Priority queue](#Priority-queue)
- [About results](#About-results)
- [Start Polling](#Start-Polling)
- [Config Priority](#Config-Priority)
- [Custom Device Fingerprint](#Custom-Device-Fingerprint)
- [Interval Time](#Interval-Time)
- [Fail Retry](#Fail-Retry)
- [Priority Queue](#Priority-Queue)
- [About Results](#About-Results)
- [TypeScript](#TypeScript)

@@ -67,2 +67,6 @@ - [API](#API)

- [Config](#Config)
- [Simple target config - string](#Simple-target-config---string)
- [Detailed target config - CrawlPageDetailTargetConfig](#Detailed-target-config---CrawlPageDetailTargetConfig)
- [Mixed target array config - (string | CrawlPageDetailTargetConfig)[]](#Mixed-target-array-config---string--CrawlPageDetailTargetConfig)
- [Advanced config - CrawlPageAdvancedConfig](#Advanced-config---CrawlPageAdvancedConfig)
- [crawlData](#crawlData)

@@ -72,2 +76,6 @@ - [Type](#Type-2)

- [Config](#Config-1)
- [Simple target config - string](#Simple-target-config---string-1)
- [Detailed target config - CrawlDataDetailTargetConfig](#Detailed-target-config---CrawlDataDetailTargetConfig)
- [Mixed target array config - (string | CrawlDataDetailTargetConfig)[]](#Mixed-target-array-config---string--CrawlDataDetailTargetConfig)
- [Advanced config - CrawlDataAdvancedConfig](#Advanced-config---CrawlDataAdvancedConfig)
- [crawlFile](#crawlFile)

@@ -77,2 +85,5 @@ - [Type](#Type-3)

- [Config](#Config-2)
- [Detailed target config - CrawlFileDetailTargetConfig](#Detailed-target-config---CrawlFileDetailTargetConfig)
- [Detailed target array config - CrawlFileDetailTargetConfig[]](#Detailed-target-array-config---CrawlFileDetailTargetConfig)
- [Advanced config - CrawlFileAdvancedConfig](#Advanced-config-CrawlFileAdvancedConfig)
- [crawlPolling](#crawlPolling)

@@ -177,7 +188,7 @@ - [Type](#Type-4)

## Core concepts
## Core Concepts
### Create application
### Create Application
#### An example of a crawler application
#### An Example of a Crawler Application

@@ -196,3 +207,3 @@ Create a new **application instance** via [xCrawl()](#xCrawl):

#### Crawl mode
#### Crawl Mode

@@ -211,9 +222,11 @@ A crawler application instance has two crawling modes: asynchronous/synchronous, and each crawler instance can only choose one of them.

- async: asynchronous request, in batch requests, the next request is made without waiting for the current request to complete
- sync: synchronous request, in batch requests, you need to wait for this request to complete before making the next request
- async: asynchronous crawling targets; there is no need to wait for the current crawling target to complete before proceeding to the next one
- sync: synchronous crawling targets; you need to wait for the current crawling target to complete before proceeding to the next one
If there is an interval time set, it is necessary to wait for the interval time to end before sending the request.
If there is an interval time set, it is necessary to wait for the end of the interval time before crawling the next target.
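As a rough illustration of the two modes described above, here is a minimal sketch (the mode and intervalTime properties and the xCrawl() factory are named in this document; the values are illustrative):

```js
import xCrawl from 'x-crawl'

// mode: 'async' (default) - targets in a batch are crawled without waiting for each other
const asyncCrawler = xCrawl({ mode: 'async', intervalTime: { max: 3000, min: 1000 } })

// mode: 'sync' - each target waits for the previous one to finish (plus the interval, if set)
const syncCrawler = xCrawl({ mode: 'sync', intervalTime: 2000 })
```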
#### Device fingerprint
**Note:** The crawling process of the crawling API is performed separately, and this mode is only valid for batch crawling targets.
#### Default Device Fingerprint
A property can be used to control whether to use the default random fingerprint, or you can configure a custom fingerprint through subsequent crawling.

@@ -233,6 +246,6 @@

- true: Enable random device fingerprinting. The fingerprint configuration of the target can be specified through the advanced version configuration or the detailed target version configuration.
- false: Turn off random device fingerprinting, without affecting the fingerprint configuration specified for the target by the advanced configuration or the detailed configuration.
- true: Enable random device fingerprinting. The fingerprint configuration of the target can be specified through advanced configuration or detailed target configuration.
- false: Turns off random device fingerprinting, does not affect the fingerprint configuration specified for the target by advanced configuration or detailed target configuration.
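A minimal sketch of toggling the default random fingerprint via the enableRandomFingerprint property mentioned above (its interaction with per-target fingerprints is as the bullets describe; everything else is illustrative):

```js
import xCrawl from 'x-crawl'

// true: every crawl target gets a random device fingerprint unless one is
// specified via advanced or detailed target configuration
const randomFp = xCrawl({ enableRandomFingerprint: true })

// false: no random fingerprint; only explicitly configured fingerprints are used
const fixedFp = xCrawl({ enableRandomFingerprint: false })
```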
#### Multiple crawler application instances
#### Multiple Crawler Application Instances

@@ -251,3 +264,3 @@ ```js

### Crawl page
### Crawl Page

@@ -269,14 +282,12 @@ Crawl a page via [crawlPage()](#crawlPage) .

#### browser instance
#### Browser Instance
It is an instance object of [Browser](https://pptr.dev/api/puppeteer.browser). For specific usage, please refer to [Browser](https://pptr.dev/api/puppeteer.browser).
When you call the crawlPage API to crawl pages in the same crawler instance, the browser instance used is the same, because the browser instance is shared within the crawlPage API of the same crawler instance. For specific usage, please refer to [Browser](https://pptr.dev/api/puppeteer.browser).
The browser instance is a headless browser without a UI shell. What it does is bring **all modern web platform features** provided by the browser rendering engine to the code.
**Note:** The browser will keep running and the file will not be terminated. If you want to stop, you can execute browser.close() to close it. Do not call [crawlPage](#crawlPage) or [page](#page) if you need to use it later. Because the crawlPage API of the browser instance in the same crawler instance is shared.
**Note:** The browser will keep running, so the process will not terminate on its own. If you want to stop it, you can execute browser.close() to close it. Do not close it if you still need to call [crawlPage](#crawlPage) or use [page](#page) later. Modifying the properties of the browser instance will affect the browser instance inside the crawlPage API of this crawler instance and the page instances returned in results, because the browser instance is shared within the crawlPage API of the same crawler instance.
#### Page Instance
#### page instance
When you call crawlPage API to crawl pages in the same crawler instance, a new page instance will be generated from the browser instance. It can be used for interactive operations. For specific usage, please refer to [Page](https://pptr.dev/api/puppeteer.page).
It is an instance object of [Page](https://pptr.dev/api/puppeteer.page). The instance can also perform interactive operations such as events. For specific usage, please refer to [page](https://pptr.dev/api/puppeteer.page).
The browser instance will retain a reference to the page instance. If it is no longer used in the future, the page instance needs to be closed by itself, otherwise it will cause a memory leak.
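A hedged sketch of working with the shared browser instance and the per-call page instance (assuming, as described above, that they are exposed on the crawl result's data property; the exact result shape may differ):

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl()

myXCrawl.crawlPage('https://www.example.com').then(async (res) => {
  // Assumption: browser and page are exposed on the result's data property
  const { browser, page } = res.data

  // page is a puppeteer Page - interact with it, then close it to avoid leaks
  const title = await page.title()
  console.log(title)
  await page.close()

  // browser is shared across crawlPage calls of this crawler instance;
  // close it only when no further crawlPage calls are needed
  await browser.close()
})
```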

@@ -303,15 +314,15 @@

#### life cycle
#### Life Cycle
Lifecycle functions owned by crawlPageAPI:
Lifecycle functions owned by the crawlPage API:
- onCrawlItemComplete: executed when each crawl item is finished and processed
- onCrawlItemComplete: Called when each crawl item is completed and processed
##### onCrawlItemComplete
In the onCrawlItemComplete function you can get the result of each crawl object.
In the onCrawlItemComplete function, you can get the result of each crawl target in advance.
**Note:** If you need to crawl many pages at one time, you need to use this life cycle function to process the result of each target and close the page instance after each page is crawled. If you do not close the page instance, the program will crash because too many pages are open.
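A sketch of the onCrawlItemComplete lifecycle hook in the advanced configuration, closing each page as the note above recommends (the shape of the hook's argument is an assumption):

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl()

myXCrawl.crawlPage({
  targets: ['https://www.example.com/page-1', 'https://www.example.com/page-2'],
  onCrawlItemComplete(crawlPageSingleRes) {
    // Assumption: each item result exposes its page on the data property
    const { page } = crawlPageSingleRes.data
    // process the result here, then close the page to keep memory bounded
    page.close()
  }
})
```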
### Crawl interface
### Crawl Interface

@@ -340,16 +351,14 @@ Crawl interface data through [crawlData()](#crawlData) .

#### life cycle
#### Life Cycle
Lifecycle functions owned by crawlPageAPI:
Life cycle functions owned by crawlData API:
- onCrawlItemComplete: executed when each crawl item is finished and processed
- onCrawlItemComplete: Called when each crawl item is completed and processed
##### onCrawlItemComplete
In the onCrawlItemComplete function you can get the result of each crawl object.
In the onCrawlItemComplete function, you can get the result of each crawl target in advance.
**Note:** If you need to crawl many pages at one time, you need to use this life cycle function to process the result of each target and close the page instance after each page is crawled. If you do not close the page instance, the program will crash because too many pages are open.
### Crawl Files
### Crawl files
Crawl file data via [crawlFile()](#crawlFile) .

@@ -377,19 +386,19 @@

#### life cycle
#### Life Cycle
Life cycle functions owned by crawlFile API:
- onCrawlItemComplete: executed when each crawl item is finished and processed
- onCrawlItemComplete: Called when each crawl item is completed and processed
- onBeforeSaveItemFile: executed before saving the file
- onBeforeSaveItemFile: Callback before saving the file
##### onCrawlItemComplete
In the onCrawlItemComplete function you can get the result of each crawl object.
In the onCrawlItemComplete function, you can get the result of each crawl target in advance.
##### onBeforeSaveItemFile
In the onBeforeSaveItemFile function, you can get the Buffer type file, you can process the Buffer, and then you need to return a Promise, and the resolve is Buffer.
In the onBeforeSaveItemFile function, you can get the file as a Buffer, process it, and then return a Promise that resolves to a Buffer; that Buffer will replace the obtained Buffer and be stored in the file.
**Resize picture**
**Resize Picture**

@@ -423,3 +432,3 @@ Use the sharp library to resize the images to be crawled:
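The README's actual sharp example is not included in this diff; the following is a hedged sketch of what such an onBeforeSaveItemFile handler could look like, assuming the hook receives the downloaded file as a Buffer on its argument (argument shape assumed) and must resolve to a Buffer as described above:

```js
import xCrawl from 'x-crawl'
import sharp from 'sharp'

const myXCrawl = xCrawl()

myXCrawl.crawlFile({
  targets: ['https://www.example.com/image-1.jpg'],
  storeDir: './upload',
  onBeforeSaveItemFile(info) {
    // Assumption: info.data is the downloaded file as a Buffer.
    // Return a Promise resolving to a Buffer; it replaces the original data.
    return sharp(info.data).resize(200).toBuffer()
  }
})
```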

### Start polling
### Start Polling

@@ -444,3 +453,3 @@ Start a polling crawl with [startPolling()](#startPolling) .

**Using crawlPage in polling Note:** The purpose of calling page.close() is to prevent the browser instance from retaining references to the page instance. If the current page is no longer used in the future, it needs to be closed by itself, otherwise it will cause a memory leak.
**Using crawlPage in polling Note:** The browser instance will retain a reference to the page instance. If it is no longer used in the future, you need to close the page instance yourself, otherwise it will cause a memory leak.
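A sketch of a polling crawl combined with crawlPage, closing the page as the note above advises (the d/h/m fields come from the StartPollingConfig defaults later in this document; the callback parameters are an assumption):

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl()

// Run every 2 hours and 30 minutes (d/h/m per StartPollingConfig)
myXCrawl.startPolling({ h: 2, m: 30 }, async (count, stopPolling) => {
  // count and stopPolling are assumed callback parameters
  const res = await myXCrawl.crawlPage('https://www.example.com')
  const { page } = res.data

  // ...use the page, then close it so the browser does not retain it
  await page.close()

  if (count >= 10) stopPolling()
})
```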

@@ -452,3 +461,3 @@ Callback function parameters:

### Config priority
### Config Priority

@@ -488,5 +497,5 @@ Some common configurations can be set in these three places:

### Device fingerprint
### Custom Device Fingerprint
Customize the configuration to avoid fingerprinting and tracking us from different locations.
Customize the configuration of device fingerprints to avoid identifying and tracking us from different locations through fingerprint recognition.

@@ -511,5 +520,5 @@ Multiple information can be passed in the fingerprint through advanced usage, and internally it will help you randomly assign each target to targets. It is also possible to set a specific fingerprint for a target directly with the detailed target configuration.

maxWidth: 1980,
minWidth: 1980,
minWidth: 1200,
maxHeight: 1080,
minHeight: 1080,
minHeight: 800,
platform: 'Android'

@@ -536,9 +545,9 @@ }

In the above example, the interval time is set in both **Application Instance Configuration** and **Advanced Configuration**, then the interval time of **Advanced Configuration** will prevail. If the viewport is set in **Advanced Configuration** and **Detailed Target Configuration**, then the second target is to set the viewport, which will be based on the viewport of **Detailed Target Configuration**.
In the above example, the interval time is set in both **Application Instance Configuration** and **Advanced Configuration**, then the interval time of **Advanced Configuration** will prevail. If the viewport is set in **Advanced Configuration** and **Detailed Target Configuration**, then the second target will be based on the viewport of its **Detailed Target Configuration**.
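A hedged sketch of a custom device fingerprint set on a detailed target configuration (field names are taken from the viewport snippet above and the DetailTargetFingerprintCommon list later in this document; the exact nesting is an assumption):

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl()

myXCrawl.crawlPage({
  url: 'https://www.example.com',
  // Assumption: a per-target fingerprint takes precedence over the random one
  fingerprint: {
    platform: 'Windows',
    acceptLanguage: 'en-US,en;q=0.9',
    maxWidth: 1980,
    minWidth: 1200,
    maxHeight: 1080,
    minHeight: 800
  }
})
```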
### Interval time
### Interval Time
The interval time can prevent too much concurrency and avoid too much pressure on the server.
The crawling interval is controlled internally by the instance method, not the entire crawling interval is controlled by the instance.
The crawling interval is controlled by the crawling API itself, not by the crawler instance.

@@ -560,10 +569,10 @@ ```js

- number: The time that must wait before each request is fixed
- Object: Randomly select a value from max and min, which is more anthropomorphic
- number: a fixed amount of time that must be waited before each crawl target
- IntervalTime: take a random value between max and min
**Note:** The first request will not trigger the interval.
**Note:** The first crawl target will not trigger the interval.
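A sketch of the two interval forms described above (fixed number vs. a { max, min } random range); the values are illustrative:

```js
import xCrawl from 'x-crawl'

// Fixed interval: wait 2000 ms before each crawl target (except the first)
const fixedInterval = xCrawl({ intervalTime: 2000 })

// Random interval: wait a random time between min and max before each target
const randomInterval = xCrawl({ intervalTime: { max: 3000, min: 1000 } })
```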
### Fail retry
### Fail Retry
Failed retry In the event of an error such as a timeout, the request will wait for the round to end and then retry.
It can avoid crawling failure due to temporary problems, and will wait for the end of this round of crawling targets to crawl again.

@@ -576,3 +585,3 @@ ```js

myXCrawl
.crawlData({ url: 'https://www.example.com/api', maxRetry: 1 })
.crawlData({ url: 'https://www.example.com/api', maxRetry: 9 })
.then((res) => {})

@@ -583,5 +592,5 @@ ```

### Priority queue
### Priority Queue
A priority queue allows a request to be sent first.
A priority queue allows a crawl target to be sent first.
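A sketch of per-target priority using the priority field listed in the detailed target configs below (the numeric scale is an assumption):

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl()

myXCrawl.crawlData([
  { url: 'https://www.example.com/api/low', priority: 1 },
  // Assumption: a larger priority value is crawled ahead of other targets
  { url: 'https://www.example.com/api/high', priority: 10 }
])
```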

@@ -604,8 +613,15 @@ ```js

### About results
### About Results
For the result, the result of each request is uniformly wrapped with an object that provides information about the result of the request, such as id, result, success or not, maximum retry, number of retries, error information collected, and so on. Automatically determine whether the return value is wrapped in an array depending on the configuration you choose, and the type fits perfectly in TS.
Each crawl target will generate a detail object, which will contain the following properties:
The id of each object is determined according to the order of requests in your configuration, and if there is a priority used, it will be sorted by priority.
- id: Generated according to the order of crawling targets, if there is a priority, it will be generated according to the priority
- isSuccess: Whether to crawl successfully
- maxRetry: The maximum number of retries for this crawling target
- retryCount: The number of times the crawling target has been retried
- crawlErrorQueue: Error collection of the crawl target
- data: the crawling data of the crawling target
Depending on the configuration method you choose, it is determined automatically whether the detail objects are stored in an array, and either the array or the single detail object is returned. The return value already fits the types perfectly in TypeScript.
Details about configuration methods and results are as follows: [crawlPage config](#config), [crawlData config](#config-1), [crawlFile config](#config-2).
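A sketch of consuming the detail objects described above (the property names id, isSuccess, retryCount, crawlErrorQueue, and data come from the list; the array-vs-single-object behavior follows the configuration method):

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl()

// An array/advanced config returns an array of detail objects
myXCrawl
  .crawlData(['https://www.example.com/api/1', 'https://www.example.com/api/2'])
  .then((results) => {
    for (const item of results) {
      const { id, isSuccess, retryCount, crawlErrorQueue, data } = item
      if (isSuccess) {
        console.log(`#${id} ok after ${retryCount} retries`, data)
      } else {
        console.error(`#${id} failed`, crawlErrorQueue)
      }
    }
  })
```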

@@ -623,3 +639,3 @@

Create a crawler instance via call xCrawl. The request queue is maintained by the instance method itself, not by the instance itself.
Create a crawler instance by calling xCrawl. The crawl target queue is maintained by the instance method itself, not by the instance itself.
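A sketch of creating an application instance with the options whose defaults are listed further below (mode, enableRandomFingerprint, baseUrl, intervalTime); the values are illustrative:

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl({
  mode: 'async',                  // default 'async'
  enableRandomFingerprint: true,  // default true
  baseUrl: 'https://www.example.com',
  intervalTime: { max: 3000, min: 1000 }
})
```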

@@ -718,8 +734,8 @@ #### Type

- string
- CrawlPageDetailTargetConfig
- (string | CrawlPageDetailTargetConfig)[]
- CrawlPageAdvancedConfig
- Simple target config - string
- Detailed target config - CrawlPageDetailTargetConfig
- Mixed target array config - (string | CrawlPageDetailTargetConfig)[]
- Advanced config - CrawlPageAdvancedConfig
**1.string**
##### Simple target config - string

@@ -738,3 +754,3 @@ This is a simple target configuration. if you just want to simply crawl this page, you can try this way of writing:

**2. CrawlPageDetailTargetConfig**
##### Detailed target config - CrawlPageDetailTargetConfig

@@ -761,3 +777,3 @@ This is the detailed target configuration. if you want to crawl this page and need to retry on failure, you can try this way of writing:

**3.(string | CrawlPageDetailTargetConfig)[]**
##### Mixed target array config - (string | CrawlPageDetailTargetConfig)[]

@@ -783,5 +799,5 @@ This is a mixed target array configuration. if you want to crawl multiple pages, and some pages need to fail and retry, you can try this way of writing:

**4. CrawlPageAdvancedConfig**
##### Advanced config - CrawlPageAdvancedConfig
This is an advanced configuration, targets is a mixed target array configuration. if you want to crawl multiple pages, and the request configuration (proxy, cookies, retry, etc.) does not want to be written repeatedly, if you need an interval, you can try this way of writing:
This is the advanced configuration; targets is a mixed target array configuration. If you want to crawl multiple pages without writing the crawl target configuration (proxy, cookies, retries, etc.) repeatedly, and you also need interval time, device fingerprint, lifecycle hooks, etc., you can try this way of writing:
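A hedged sketch of such a CrawlPageAdvancedConfig call, using only properties whose defaults are listed later in this document (targets, intervalTime, viewport, onCrawlItemComplete); the viewport shape is an assumption:

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl()

myXCrawl.crawlPage({
  targets: [
    'https://www.example.com/page-1',
    { url: 'https://www.example.com/page-2', maxRetry: 2 }
  ],
  intervalTime: { max: 3000, min: 1000 },
  // Assumption: viewport is a width/height object shared by the targets
  viewport: { width: 1920, height: 1080 },
  onCrawlItemComplete(crawlPageSingleRes) {
    // shared per-item handling; close each page as recommended earlier
    crawlPageSingleRes.data.page.close()
  }
})
```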

@@ -879,8 +895,8 @@ ```js

- string
- CrawlDataDetailTargetConfig
- (string | CrawlDataDetailTargetConfig)[]
- CrawlDataAdvancedConfig<T>
- Simple target config - string
- Detailed target config - CrawlDataDetailTargetConfig
- Mixed target array config - (string | CrawlDataDetailTargetConfig)[]
- Advanced config - CrawlDataAdvancedConfig
**1.string**
##### Simple target config - string

@@ -899,3 +915,3 @@ This is a simple target configuration. if you just want to simply crawl the data, and the interface is GET, you can try this way of writing:

**2. CrawlDataDetailTargetConfig**
##### Detailed target config - CrawlDataDetailTargetConfig

@@ -922,3 +938,3 @@ This is the detailed target configuration. if you want to crawl this data and need to retry on failure, you can try this way of writing:

**3.(string | CrawlDataDetailTargetConfig)[]**
##### Mixed target array config - (string | CrawlDataDetailTargetConfig)[]

@@ -944,5 +960,5 @@ This is a mixed target array configuration. if you want to crawl multiple data, and some data needs to fail and retry, you can try this way of writing:

**4.CrawlDataAdvancedConfig**
##### Advanced config - CrawlDataAdvancedConfig
This is an advanced configuration, targets is a mixed target array configuration. if you want to crawl multiple data, and the request configuration (proxy, cookies, retry, etc.) does not want to be written repeatedly, if you need an interval, you can try this writing method:
This is the advanced configuration; targets is a mixed target array configuration. If you want to crawl multiple pieces of data without writing the crawl target configuration (proxy, cookies, retries, etc.) repeatedly, and you also need interval time, device fingerprint, lifecycle hooks, etc., you can try this way of writing:
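A similar hedged sketch for CrawlDataAdvancedConfig (targets, intervalTime and headers appear in the defaults list below; the per-target method and params come from CrawlDataDetailTargetConfig):

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl()

myXCrawl.crawlData({
  targets: [
    'https://www.example.com/api/list',
    { url: 'https://www.example.com/api/detail', method: 'GET', params: { id: 1 } }
  ],
  intervalTime: 1000,
  headers: { accept: 'application/json' },
  onCrawlItemComplete(res) {
    // per-item handling of the crawled data
    console.log(res.isSuccess, res.data)
  }
})
```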

@@ -1037,9 +1053,8 @@ ```js

- CrawlFileDetailTargetConfig
- Detailed target config - CrawlFileDetailTargetConfig
- Detailed target array config - CrawlFileDetailTargetConfig[]
- Advanced config CrawlFileAdvancedConfig
- CrawlFileDetailTargetConfig[]
- CrawlFileAdvancedConfig
##### Detailed target config - CrawlFileDetailTargetConfig
**1. CrawlFileDetailTargetConfig**
This is the detailed target configuration. If you want to crawl this file and need to retry on failure, you can try this way of writing:

@@ -1067,3 +1082,3 @@

**2. CrawlFileDetailTargetConfig[]**
##### Detailed target array config - CrawlFileDetailTargetConfig[]

@@ -1089,5 +1104,5 @@ This is the detailed target array configuration. if you want to crawl multiple files, and some data needs to be retried after failure, you can try this way of writing:

**3. CrawlFileAdvancedConfig**
##### Advanced config CrawlFileAdvancedConfig
This is an advanced configuration, targets is a mixed target array configuration. if you want to crawl multiple data, and the request configuration (storeDir, proxy, retry, etc.) does not want to be written repeatedly, and you need interval time, etc., you can try this way of writing:
This is the advanced configuration; targets is a mixed target array configuration. If you want to crawl multiple files without writing the crawl target configuration (proxy, storeDir, retries, etc.) repeatedly, and you also need interval time, device fingerprint, life cycle hooks, etc., you can try this way of writing:
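And a hedged sketch for CrawlFileAdvancedConfig (targets, intervalTime, storeDir and extension appear in the defaults list below; the values are illustrative):

```js
import xCrawl from 'x-crawl'

const myXCrawl = xCrawl()

myXCrawl.crawlFile({
  targets: [
    'https://www.example.com/file-1.jpg',
    { url: 'https://www.example.com/file-2', fileName: 'custom-name' }
  ],
  intervalTime: { max: 3000, min: 1000 },
  storeDir: './upload',   // default is __dirname per the defaults list
  extension: '.jpg'
})
```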

@@ -1168,2 +1183,10 @@ ```js

**Default Value**
- mode: 'async'
- enableRandomFingerprint: true
- baseUrl: undefined
- intervalTime: undefined
- crawlPage: undefined
#### Detail target config

@@ -1191,2 +1214,11 @@

**Default Value**
- url: undefined
- headers: undefined
- cookies: undefined
- priority: undefined
- viewport: undefined
- fingerprint: undefined
##### CrawlDataDetailTargetConfig

@@ -1206,2 +1238,12 @@

**Default Value**
- url: undefined
- method: 'GET'
- headers: undefined
- params: undefined
- data: undefined
- priority: undefined
- fingerprint: undefined
##### CrawlFileDetailTargetConfig

@@ -1221,2 +1263,12 @@

**Default Value**
- url: undefined
- headers: undefined
- priority: undefined
- storeDir: \_\_dirname
- fileName: string
- extension: string
- fingerprint: undefined
#### Advanced config

@@ -1245,2 +1297,13 @@

**Default Value**
- targets: undefined
- intervalTime: undefined
- fingerprint: undefined
- headers: undefined
- cookies: undefined
- viewport: undefined
- onCrawlItemComplete: undefined
##### CrawlDataAdvancedConfig

@@ -1260,2 +1323,10 @@

**Default Value**
- targets: undefined
- intervalTime: undefined
- fingerprint: undefined
- headers: undefined
- onCrawlItemComplete: undefined
##### CrawlFileAdvancedConfig

@@ -1283,2 +1354,13 @@

**Default Value**
- targets: undefined
- intervalTime: undefined
- fingerprint: undefined
- headers: undefined
- storeDir: \_\_dirname
- extension: string
- onCrawlItemComplete: undefined
- onBeforeSaveItemFile: undefined
#### StartPollingConfig

@@ -1294,2 +1376,8 @@

**Default Value**
- d: undefined
- h: undefined
- m: undefined
#### Crawl other config

@@ -1307,2 +1395,8 @@

**Default Value**
- timeout: 10000
- proxy: undefined
- maxRetry: 0
##### DetailTargetFingerprintCommon

@@ -1321,2 +1415,11 @@

**Default Value**
- userAgent: undefined
- ua: undefined
- platform: undefined
- platformVersion: undefined
- mobile: undefined
- acceptLanguage: undefined
##### AdvancedFingerprintCommon

@@ -1335,2 +1438,11 @@

**Default Value**
- userAgents: undefined
- uas: undefined
- platforms: undefined
- platformVersions: undefined
- mobiles: undefined
- acceptLanguages: undefined
##### Mobile

@@ -1483,2 +1595,8 @@

- id: Generated according to the order of crawling targets, if there is a priority, it will be generated according to the priority
- isSuccess: Whether to crawl successfully
- maxRetry: The maximum number of retries for this crawling target
- retryCount: The number of times the crawling target has been retried
- crawlErrorQueue: Error collection of the crawl target
#### CrawlPageSingleRes

@@ -1485,0 +1603,0 @@
