Comparing version 5.0.4 to 5.0.5
@@ -14,3 +14,3 @@ {
"license": "MIT",
"version": "5.0.4",
"version": "5.0.5",
"main": "lib/index.js",
@@ -34,3 +34,37 @@ "types": "lib/index.d.ts", | ||
"blah": { | ||
"h_img": "https://i.imgur.com/j3Z0rbN.png" | ||
"h_img": "https://i.imgur.com/j3Z0rbN.png", | ||
"cli": "scrape-it-cli", | ||
"installation": [ | ||
{ | ||
"h2": "FAQ" | ||
}, | ||
{ | ||
"p": "Here are some frequent questions and their answers." | ||
}, | ||
{ | ||
"h3": "1. How to parse scrape pages?" | ||
},
{
"p": "`scrape-it` has only a simple request module for making requests. That means you cannot directly parse ajax pages with it, but in general you will have those scenarios:" | ||
},
{
"ol": [
"**The ajax response is in JSON format.** In this case, you can make the request directly, without needing a scraping library.",
"**The ajax response gives you HTML back.** Instead of calling the main website (e.g. example.com), pass to `scrape-it` the ajax url (e.g. `example.com/api/that-endpoint`) and you will you will be able to parse the response", | ||
"**The ajax request is so complicated that you don't want to reverse-engineer it.** In this case, use a headless browser (e.g. Google Chrome, Electron, PhantomJS) to load the content and then use the `.scrapeHTML` method from scrape it once you get the HTML loaded on the page." | ||
]
},
{
"h3": "2. Crawling"
},
{
"p": "There is no fancy way to crawl pages with `scrape-it`. For simple scenarios, you can parse the list of urls from the initial page and then, using Promises, parse each page. Also, you can use a different crawler to download the website and then use the `.scrapeHTML` method to scrape the local files." | ||
},
{
"h3": "3. Local files"
},
{
"p": "Use the `.scrapeHTML` to parse the HTML read from the local files using `fs.readFile`." | ||
}
]
},
@@ -37,0 +71,0 @@ "dependencies": {
@@ -23,2 +23,30 @@ <!-- Please do not edit this file. Edit the `blah` field in the `package.json` instead. If in doubt, open an issue. -->
:bulb: **ProTip**: You can install the [cli version of this module](http://github.com/IonicaBizau/scrape-it-cli) by running `npm install --global scrape-it-cli` (or `yarn global add scrape-it-cli`).
## FAQ
Here are some frequent questions and their answers.
### 1. How to parse ajax pages?
`scrape-it` has only a simple request module for making requests. That means you cannot directly parse ajax pages with it, but in general you will run into one of these scenarios:
1. **The ajax response is in JSON format.** In this case, you can make the request directly, without needing a scraping library.
2. **The ajax response gives you HTML back.** Instead of calling the main website (e.g. example.com), pass the ajax URL (e.g. `example.com/api/that-endpoint`) to `scrape-it` and you will be able to parse the response (see the sketch after this list).
3. **The ajax request is so complicated that you don't want to reverse-engineer it.** In this case, use a headless browser (e.g. Google Chrome, Electron, PhantomJS) to load the content and then use the `.scrapeHTML` method from `scrape-it` once you get the HTML loaded on the page.
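For the second scenario, here is a minimal sketch; the endpoint URL and the `.item`/`h2` selectors are placeholders, not part of any real site:

```js
// Sketch: point scrape-it at an ajax endpoint that returns an HTML fragment.
// The endpoint and the selectors below are placeholder assumptions.
const scrapeIt = require("scrape-it");

scrapeIt("https://example.com/api/that-endpoint", {
    // Collect every `.item` element and read the text of its `h2`
    items: {
        listItem: ".item",
        data: {
            title: "h2"
        }
    }
}).then(({ data }) => {
    console.log(data.items);
});
```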
### 2. Crawling
There is no fancy way to crawl pages with `scrape-it`. For simple scenarios, you can parse the list of URLs from the initial page and then, using Promises, parse each page. Alternatively, you can use a different crawler to download the website and then use the `.scrapeHTML` method to scrape the local files.
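A rough sketch of that Promise-based approach follows; the index URL, base URL, and selectors are assumptions, and a real site will need its own selectors:

```js
// Sketch: a simple two-step crawl. Scrape the links from an index page,
// then scrape each linked page in parallel with Promise.all.
// The URLs and selectors are placeholders.
const scrapeIt = require("scrape-it");

const BASE = "https://example.com";

scrapeIt(`${BASE}/articles`, {
    links: {
        listItem: ".article-list li",
        data: {
            url: { selector: "a", attr: "href" }
        }
    }
}).then(({ data }) =>
    Promise.all(
        data.links.map(({ url }) =>
            // Resolve relative hrefs against the base URL before scraping
            scrapeIt(new URL(url, BASE).href, { title: "h1" })
                .then(({ data }) => data)
        )
    )
).then(pages => {
    console.log(pages);
});
```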
### 3. Local files
Use the `.scrapeHTML` method to parse the HTML read from local files using `fs.readFile`.
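A minimal sketch, assuming `.scrapeHTML` accepts the raw HTML string; the file name and the `h1` selector are placeholders:

```js
// Sketch: read a local HTML file and scrape it with `.scrapeHTML`.
// "page.html" and the "h1" selector are placeholder assumptions.
const fs = require("fs");
const scrapeIt = require("scrape-it");

fs.readFile("page.html", "utf8", (err, html) => {
    if (err) { throw err; }
    const data = scrapeIt.scrapeHTML(html, {
        title: "h1"
    });
    console.log(data);
});
```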
## :clipboard: Example
@@ -285,2 +313,3 @@
- [`sahibindenServer`](https://npmjs.com/package/sahibindenServer) (by Cagatay Cali)—Simple sahibinden.com bot server side
- [`scrape-it-cli`](https://github.com/IonicaBizau/scrape-it-cli#readme)—CLI for scrape-it. A Node.js scraper for humans. :rocket:
- [`scrape-vinmonopolet`](https://npmjs.com/package/scrape-vinmonopolet)—
@@ -287,0 +316,0 @@ - [`selfrefactor`](https://github.com/selfrefactor/selfrefactor#readme) (by selfrefactor)—common functions used by I Learn Smarter project
License Policy Violation
This package is not allowed per your license policy. Review the package's license to ensure compliance.
Found 1 instance in 1 package