scrape-it

scrape-it - npm Package Compare versions

Comparing version 5.0.4 to 5.0.5


package.json

@@ -14,3 +14,3 @@ {

"license": "MIT",
- "version": "5.0.4",
+ "version": "5.0.5",
"main": "lib/index.js",

@@ -34,3 +34,37 @@ "types": "lib/index.d.ts",

"blah": {
- "h_img": "https://i.imgur.com/j3Z0rbN.png"
+ "h_img": "https://i.imgur.com/j3Z0rbN.png",
"cli": "scrape-it-cli",
"installation": [
{
"h2": "FAQ"
},
{
"p": "Here are some frequent questions and their answers."
},
{
"h3": "1. How to parse scraped pages?"
},
{
"p": "`scrape-it` has only a simple request module for making requests. That means you cannot directly parse ajax pages with it, but in general you will have those scenarios:"
},
{
"ol": [
"**The ajax response is in JSON format.** In this case, you can make the request directly, without needing a scraping library.",
"**The ajax response gives you HTML back.** Instead of calling the main website (e.g. example.com), pass to `scrape-it` the ajax url (e.g. `example.com/api/that-endpoint`) and you will be able to parse the response.",
"**The ajax request is so complicated that you don't want to reverse-engineer it.** In this case, use a headless browser (e.g. Google Chrome, Electron, PhantomJS) to load the content and then use the `.scrapeHTML` method from `scrape-it` once you get the HTML loaded on the page."
]
},
{
"h3": "2. Crawling"
},
{
"p": "There is no fancy way to crawl pages with `scrape-it`. For simple scenarios, you can parse the list of urls from the initial page and then, using Promises, parse each page. Also, you can use a different crawler to download the website and then use the `.scrapeHTML` method to scrape the local files."
},
{
"h3": "3. Local files"
},
{
"p": "Use the `.scrapeHTML` method to parse the HTML read from the local files using `fs.readFile`."
}
]
},

@@ -37,0 +71,0 @@ "dependencies": {

@@ -23,2 +23,30 @@ <!-- Please do not edit this file. Edit the `blah` field in the `package.json` instead. If in doubt, open an issue. -->

:bulb: **ProTip**: You can install the [cli version of this module](http://github.com/IonicaBizau/scrape-it-cli) by running `npm install --global scrape-it-cli` (or `yarn global add scrape-it-cli`).
## FAQ
Here are some frequent questions and their answers.
### 1. How to parse scraped pages?
`scrape-it` has only a simple request module for making requests. That means you cannot directly parse ajax pages with it, but in general you will have those scenarios:
1. **The ajax response is in JSON format.** In this case, you can make the request directly, without needing a scraping library.
2. **The ajax response gives you HTML back.** Instead of calling the main website (e.g. example.com), pass to `scrape-it` the ajax url (e.g. `example.com/api/that-endpoint`) and you will be able to parse the response.
3. **The ajax request is so complicated that you don't want to reverse-engineer it.** In this case, use a headless browser (e.g. Google Chrome, Electron, PhantomJS) to load the content and then use the `.scrapeHTML` method from `scrape-it` once you get the HTML loaded on the page.
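For scenario 2 above, a minimal sketch might look like this — the endpoint URL and selectors are hypothetical, and the `{ data }` result shape follows the scrape-it README:

```javascript
// Hedged sketch: pointing scrape-it at an AJAX endpoint that returns HTML,
// instead of the main page. URL and selectors below are made up for illustration.
const scrapeIt = require("scrape-it");

scrapeIt("https://example.com/api/that-endpoint", {
    title: "h1",
    items: {
        listItem: ".item"
    }
}).then(({ data }) => {
    console.log(data);
}).catch(err => {
    console.error(err);
});
```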
### 2. Crawling
There is no built-in way to crawl pages with `scrape-it`. For simple scenarios, you can parse the list of urls from the initial page and then, using Promises, parse each page. Alternatively, you can use a different crawler to download the website and then use the `.scrapeHTML` method to scrape the local files.
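The Promise-based approach above might be sketched like this — the index URL, link selector, and per-page fields are all hypothetical:

```javascript
// Hedged sketch: collect links from an index page, then scrape each linked
// page in parallel with Promise.all. Selectors/URLs are illustrative only.
const scrapeIt = require("scrape-it");

scrapeIt("https://example.com/articles", {
    links: {
        listItem: "a.article-link",
        data: {
            url: { attr: "href" }
        }
    }
}).then(({ data }) =>
    Promise.all(
        data.links.map(({ url }) =>
            scrapeIt(url, { title: "h1" }).then(({ data }) => data)
        )
    )
).then(pages => {
    console.log(pages);
}).catch(err => {
    console.error(err);
});
```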
### 3. Local files
Use the `.scrapeHTML` method to parse the HTML read from the local files using `fs.readFile`.
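Combining `fs.readFile` with `.scrapeHTML` could look roughly like this — the file path and selector are placeholders:

```javascript
// Hedged sketch: scrape a local HTML file instead of a live URL.
// "./page.html" and the "h1" selector are hypothetical.
const fs = require("fs");
const scrapeIt = require("scrape-it");

fs.readFile("./page.html", "utf8", (err, html) => {
    if (err) { throw err; }
    const data = scrapeIt.scrapeHTML(html, {
        title: "h1"
    });
    console.log(data);
});
```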
## :clipboard: Example

@@ -285,2 +313,3 @@

- [`sahibindenServer`](https://npmjs.com/package/sahibindenServer) (by Cagatay Cali)—Simple sahibinden.com bot server side
- [`scrape-it-cli`](https://github.com/IonicaBizau/scrape-it-cli#readme)—CLI for scrape-it. A Node.js scraper for humans. :rocket:
- [`scrape-vinmonopolet`](https://npmjs.com/package/scrape-vinmonopolet)—

@@ -287,0 +316,0 @@ - [`selfrefactor`](https://github.com/selfrefactor/selfrefactor#readme) (by selfrefactor)—common functions used by I Learn Smarter project
