Comparing version 2.2.0 to 2.2.1
{
"name": "x-crawl",
"version": "2.2.0",
"version": "2.2.1",
"author": "coderHXL",
@@ -5,0 +5,0 @@ "description": "XCrawl is a Nodejs multifunctional crawler library.",
@@ -7,14 +7,18 @@ # x-crawl
## Feature
## Features
- Crawl HTML, JSON, file resources, etc. with simple configuration.
- Built-in puppeteer crawls HTML and uses JSDOM library to parse HTML.
- Crawl pages, JSON, file resources, etc. with simple configuration.
- The built-in puppeteer crawls the page, and uses the jsdom library to parse the page.
- Support asynchronous/synchronous way to crawl data.
- Support Promise/Callback way to get the result.
- Polling function.
- Support Promise/Callback method to get the result.
- Polling function, fixed-point crawling.
- Anthropomorphic request interval.
- Written in TypeScript, provides generics.
- Written in TypeScript, providing generics.
## Benefits provided by using puppeter
## Relationship with puppeter
The fetchHTML API internally uses the [puppeter](https://github.com/puppeteer/puppeteer) library to crawl pages.
The following can be done:
- Generate screenshots and PDFs of pages.
@@ -37,2 +41,3 @@ - Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
+ [Example](#Example-2)
+ [About page](#About-page)
* [fetchData](#fetchData)
@@ -178,3 +183,3 @@ + [Type](#Type-3)
fetchHTML is the method of the above [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, usually used to crawl HTML.
fetchHTML is the method of the above [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, usually used to crawl page.
@@ -184,3 +189,3 @@ #### Type
- Look at the [FetchHTMLConfig](#FetchHTMLConfig) type
- Look at the [FetchHTML](#FetchHTML) type
- Look at the [FetchHTML](#FetchHTML-2) type
@@ -203,2 +208,6 @@ ```ts
#### About page
Get the page instance from res.data.page, which can do interactive operations such as events. For specific usage, refer to [page](https://pptr.dev/api/puppeteer.page).
### fetchData
@@ -232,3 +241,3 @@
requestConfig, // Request configuration, can be RequestConfig | RequestConfig[]
intervalTime: { max: 5000, min: 1000 } // The intervalTime passed in when not using myXCrawl
intervalTime: { max: 5000, min: 1000 } // The intervalTime passed in when creating myXCrawl is not used
}).then(res => {
@@ -389,3 +398,3 @@ console.log(res)
storeDir: string // Store folder
extension?: string // filename extension
extension?: string // Filename extension
}
@@ -419,3 +428,3 @@ }
```ts
type FetchCommonArr<T> = FetchCommon<T>[]
type FetchResCommonArrV1<T> = FetchResCommonV1<T>[]
```
@@ -422,0 +431,0 @@
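The `intervalTime: { max: 5000, min: 1000 }` option in the fetchData hunk, together with the "Anthropomorphic request interval" feature, suggests a randomized pause between requests. The following is a minimal TypeScript sketch of that idea only. `IntervalRange`, `pickDelay`, `sleep`, and `runWithIntervals` are hypothetical names written for illustration and are not x-crawl APIs, and the uniform random choice between `min` and `max` is an assumption that the diff does not confirm.

```ts
// Hypothetical shape mirroring the `intervalTime: { max, min }` object in the diff.
interface IntervalRange {
  max: number // upper bound of the pause, in milliseconds
  min: number // lower bound of the pause, in milliseconds
}

// Pick a pause in [min, max). The random source is injectable so the
// result can be made deterministic in tests. (Assumed behaviour, not x-crawl's.)
function pickDelay(range: IntervalRange, random: () => number = Math.random): number {
  if (range.max < range.min) {
    throw new RangeError('max must be greater than or equal to min')
  }
  return Math.floor(range.min + random() * (range.max - range.min))
}

// Resolve after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

// Run tasks one at a time, pausing for a freshly picked delay between them.
async function runWithIntervals<T>(
  tasks: Array<() => Promise<T>>,
  range: IntervalRange
): Promise<T[]> {
  const results: T[] = []
  let first = true
  for (const task of tasks) {
    if (!first) {
      await sleep(pickDelay(range))
    }
    first = false
    results.push(await task())
  }
  return results
}
```

Passing a fixed random source, for example `pickDelay({ max: 5000, min: 1000 }, () => 0.5)`, makes the chosen delay reproducible, which is convenient when testing code that depends on the pause.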
License Policy Violation
License: This package is not allowed per your license policy. Review the package's license to ensure compliance.
Found 1 instance in 1 package