x-crawl npm package: compare versions

Comparing version 2.2.0 to 2.2.1


package.json
{
"name": "x-crawl",
"version": "2.2.0",
"version": "2.2.1",
"author": "coderHXL",

@@ -5,0 +5,0 @@ "description": "XCrawl is a Nodejs multifunctional crawler library.",

@@ -7,14 +7,18 @@ # x-crawl

## Feature
## Features
- Crawl HTML, JSON, file resources, etc. with simple configuration.
- Built-in puppeteer crawls HTML and uses JSDOM library to parse HTML.
- Crawl pages, JSON, file resources, etc. with simple configuration.
- The built-in puppeteer crawls the page, and uses the jsdom library to parse the page.
- Support asynchronous/synchronous way to crawl data.
- Support Promise/Callback way to get the result.
- Polling function.
- Support Promise/Callback method to get the result.
- Polling function, fixed-point crawling.
- Anthropomorphic request interval.
- Written in TypeScript, provides generics.
- Written in TypeScript, providing generics.
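The feature list says data can be crawled in an asynchronous or synchronous way. One plausible reading, sketched below under that assumption, is that asynchronous mode starts every request at once, while synchronous mode waits for each request to finish before starting the next. The names `CrawlMode`, `crawlAll` and `fakeFetch` are hypothetical and are not x-crawl's API. `fakeFetch` stands in for a real HTTP request so the sketch runs without a network.

```ts
// Hypothetical sketch: `CrawlMode`, `crawlAll` and `fakeFetch` are illustrative
// names, not part of x-crawl's API.
type CrawlMode = 'async' | 'sync'

// Fetch every URL either concurrently ('async') or one after another ('sync').
// Results come back in the same order as `urls` in both modes.
async function crawlAll<T>(
  urls: string[],
  fetchOne: (url: string) => Promise<T>,
  mode: CrawlMode
): Promise<Awaited<T>[]> {
  if (mode === 'async') {
    // All requests are started before any of them is awaited.
    return Promise.all(urls.map((url) => fetchOne(url)))
  }
  // Each request is awaited before the next one is started.
  const results: Awaited<T>[] = []
  for (const url of urls) {
    results.push(await fetchOne(url))
  }
  return results
}

// Stand-in for a real HTTP request: resolves with the URL's length after 10 ms.
function fakeFetch(url: string): Promise<number> {
  return new Promise((resolve) => setTimeout(() => resolve(url.length), 10))
}

crawlAll(['https://a.example', 'https://bb.example'], fakeFetch, 'sync').then((lengths) => {
  console.log(lengths)
})
```

Both branches return results in input order; only the timing of the requests differs, which is what trades throughput against load on the target site.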
## Benefits provided by using puppeteer
## Relationship with puppeteer
The fetchHTML API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to crawl pages.
The following can be done:
- Generate screenshots and PDFs of pages.

@@ -37,2 +41,3 @@ - Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).

+ [Example](#Example-2)
+ [About page](#About-page)
* [fetchData](#fetchData)

@@ -178,3 +183,3 @@ + [Type](#Type-3)

fetchHTML is a method of the [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance above, usually used to crawl HTML.
fetchHTML is a method of the [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance above, usually used to crawl pages.

@@ -184,3 +189,3 @@ #### Type

- Look at the [FetchHTMLConfig](#FetchHTMLConfig) type
- Look at the [FetchHTML](#FetchHTML) type
- Look at the [FetchHTML](#FetchHTML-2) type

@@ -203,2 +208,6 @@ ```ts

#### About page
Get the page instance from res.data.page. It can be used for interactive operations such as handling events. For specific usage, refer to [page](https://pptr.dev/api/puppeteer.page).
### fetchData

@@ -232,3 +241,3 @@

requestConfig, // Request configuration, can be RequestConfig | RequestConfig[]
intervalTime: { max: 5000, min: 1000 } // The intervalTime passed in when not using myXCrawl
intervalTime: { max: 5000, min: 1000 } // The intervalTime passed in when creating myXCrawl is not used
}).then(res => {
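The `requestConfig` and `intervalTime` options in this hunk can be illustrated with two small helpers. This is a hedged sketch, not x-crawl's implementation. It assumes that a single config is treated like a one-element array, and that `{ max, min }` means a random whole number of milliseconds from `min` up to, but not including, `max`. `RequestConfig` here is a hypothetical one-field stand-in, and `toConfigArray` and `pickInterval` are illustrative names.

```ts
// Hypothetical stand-in for x-crawl's RequestConfig; the real type has more fields.
interface RequestConfig {
  url: string
}

// Assumed shape of the interval option shown above, in milliseconds.
interface IntervalTime {
  max: number
  min: number
}

// Accept one config or an array of configs and always return an array.
function toConfigArray(requestConfig: RequestConfig | RequestConfig[]): RequestConfig[] {
  return Array.isArray(requestConfig) ? requestConfig : [requestConfig]
}

// Pick a random whole number of milliseconds in [min, max).
// `random` is injectable so the choice can be made deterministic in tests.
function pickInterval({ max, min }: IntervalTime, random: () => number = Math.random): number {
  return Math.floor(min + random() * (max - min))
}

const configs = toConfigArray({ url: 'https://example.com/api/data' })
const delay = pickInterval({ max: 5000, min: 1000 })
console.log(configs.length, delay)
```

Drawing a fresh random delay before each request is what makes the interval look less mechanical than a fixed pause.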

@@ -389,3 +398,3 @@ console.log(res)

storeDir: string // Store folder
extension?: string // filename extension
extension?: string // Filename extension
}
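The `storeDir` and `extension` fields above decide where a downloaded file ends up. The sketch below is a hypothetical illustration, not x-crawl's naming scheme: `FileConfig` and `buildFilePath` are made-up names, and it assumes the file is named after the last URL path segment, that `extension` (with or without a leading dot) replaces the URL's own extension when given, and that a URL without a file name falls back to `index`.

```ts
import * as path from 'node:path'

// Hypothetical config mirroring the two fields in the diff above.
interface FileConfig {
  storeDir: string // Store folder
  extension?: string // Filename extension
}

// Hypothetical helper: decide where a file downloaded from `url` would be stored.
function buildFilePath(url: string, { storeDir, extension }: FileConfig): string {
  // URL paths always use "/", so parse them with the POSIX variant of `path`.
  const pathname = new URL(url).pathname
  const urlExt = path.posix.extname(pathname) // ".png", or "" if there is none
  const baseName = path.posix.basename(pathname, urlExt) || 'index'

  // Assumption: an explicit extension overrides the URL's; the dot is optional.
  let ext = extension ?? urlExt
  if (ext !== '' && !ext.startsWith('.')) ext = `.${ext}`

  // Join with the platform separator, since this is a local file system path.
  return path.join(storeDir, baseName + ext)
}

console.log(buildFilePath('https://example.com/img/logo.png', { storeDir: './upload' }))
```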

@@ -419,3 +428,3 @@ }

```ts
type FetchCommonArr<T> = FetchCommon<T>[]
type FetchResCommonArrV1<T> = FetchResCommonV1<T>[]
```
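The renamed alias only wraps a per-request result type in an array, which is where the library's generics come in. As a hedged illustration of consuming such an alias, the sketch assumes a hypothetical `FetchResCommonV1<T>` with `id`, `statusCode` and `data` fields (x-crawl's actual definition may differ) and adds an illustrative helper, `successfulData`, that keeps the `data` of 2xx responses while preserving its type `T`.

```ts
// Hypothetical shape: x-crawl's real FetchResCommonV1<T> may have different fields.
interface FetchResCommonV1<T> {
  id: number
  statusCode: number | undefined
  data: T
}

// The renamed alias from the diff: one result per request.
type FetchResCommonArrV1<T> = FetchResCommonV1<T>[]

// Illustrative helper: return the `data` of every 2xx response, keeping its type T.
function successfulData<T>(results: FetchResCommonArrV1<T>): T[] {
  return results
    .filter((res) => res.statusCode !== undefined && res.statusCode >= 200 && res.statusCode < 300)
    .map((res) => res.data)
}

const results: FetchResCommonArrV1<{ title: string }> = [
  { id: 1, statusCode: 200, data: { title: 'Home' } },
  { id: 2, statusCode: 404, data: { title: 'Missing' } },
]
console.log(successfulData(results))
```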

