Comparing version 5.0.4 to 5.0.5
@@ -14,3 +14,3 @@ {
"license": "MIT",
"version": "5.0.4",
"version": "5.0.5",
"main": "lib/index.js",
@@ -34,3 +34,37 @@ "types": "lib/index.d.ts", | ||
"blah": { | ||
"h_img": "https://i.imgur.com/j3Z0rbN.png" | ||
"h_img": "https://i.imgur.com/j3Z0rbN.png", | ||
"cli": "scrape-it-cli", | ||
"installation": [ | ||
{ | ||
"h2": "FAQ" | ||
}, | ||
{ | ||
"p": "Here are some frequent questions and their answers." | ||
}, | ||
{ | ||
"h3": "1. How to parse scrape pages?" | ||
},
{
"p": "`scrape-it` has only a simple request module for making requests. That means you cannot directly parse ajax pages with it, but in general you will have those scenarios:" | ||
},
{
"ol": [
"**The ajax response is in JSON format.** In this case, you can make the request directly, without needing a scraping library.",
"**The ajax response gives you HTML back.** Instead of calling the main website (e.g. example.com), pass to `scrape-it` the ajax url (e.g. `example.com/api/that-endpoint`) and you will you will be able to parse the response", | ||
"**The ajax request is so complicated that you don't want to reverse-engineer it.** In this case, use a headless browser (e.g. Google Chrome, Electron, PhantomJS) to load the content and then use the `.scrapeHTML` method from scrape it once you get the HTML loaded on the page." | ||
]
},
{
"h3": "2. Crawling"
},
{
"p": "There is no fancy way to crawl pages with `scrape-it`. For simple scenarios, you can parse the list of urls from the initial page and then, using Promises, parse each page. Also, you can use a different crawler to download the website and then use the `.scrapeHTML` method to scrape the local files." | ||
},
{
"h3": "3. Local files"
},
{
"p": "Use the `.scrapeHTML` to parse the HTML read from the local files using `fs.readFile`." | ||
}
]
},
@@ -37,0 +71,0 @@ "dependencies": {
@@ -23,2 +23,30 @@ <!-- Please do not edit this file. Edit the `blah` field in the `package.json` instead. If in doubt, open an issue. -->
:bulb: **ProTip**: You can install the [cli version of this module](http://github.com/IonicaBizau/scrape-it-cli) by running `npm install --global scrape-it-cli` (or `yarn global add scrape-it-cli`).
## FAQ
Here are some frequent questions and their answers.
### 1. How to parse ajax pages?
`scrape-it` has only a simple request module for making requests. That means you cannot directly parse ajax pages with it, but in general you will run into one of these scenarios:
1. **The ajax response is in JSON format.** In this case, you can make the request directly, without needing a scraping library.
2. **The ajax response gives you HTML back.** Instead of calling the main website (e.g. example.com), pass the ajax URL (e.g. `example.com/api/that-endpoint`) to `scrape-it` and you will be able to parse the response (see the sketch after this list).
3. **The ajax request is so complicated that you don't want to reverse-engineer it.** In this case, use a headless browser (e.g. Google Chrome, Electron, PhantomJS) to load the content and then use the `.scrapeHTML` method from `scrape-it` once you get the HTML loaded on the page.
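For the second scenario, here is a minimal sketch; the endpoint URL and the `.item`/`h2` selectors are placeholders, not part of any real site:

```js
// Sketch: point scrape-it at an ajax endpoint that returns an HTML fragment.
// The endpoint and the selectors below are placeholder assumptions.
const scrapeIt = require("scrape-it");

scrapeIt("https://example.com/api/that-endpoint", {
    // Collect every `.item` element and read the text of its `h2`
    items: {
        listItem: ".item",
        data: {
            title: "h2"
        }
    }
}).then(({ data }) => {
    console.log(data.items);
});
```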
### 2. Crawling
There is no fancy way to crawl pages with `scrape-it`. For simple scenarios, you can parse the list of URLs from the initial page and then, using Promises, parse each page. Alternatively, you can use a different crawler to download the website and then use the `.scrapeHTML` method to scrape the local files.
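A rough sketch of that Promise-based approach follows; the index URL, base URL, and selectors are assumptions, and a real site will need its own selectors:

```js
// Sketch: a simple two-step crawl. Scrape the links from an index page,
// then scrape each linked page in parallel with Promise.all.
// The URLs and selectors are placeholders.
const scrapeIt = require("scrape-it");

const BASE = "https://example.com";

scrapeIt(`${BASE}/articles`, {
    links: {
        listItem: ".article-list li",
        data: {
            url: { selector: "a", attr: "href" }
        }
    }
}).then(({ data }) =>
    Promise.all(
        data.links.map(({ url }) =>
            // Resolve relative hrefs against the base URL before scraping
            scrapeIt(new URL(url, BASE).href, { title: "h1" })
                .then(({ data }) => data)
        )
    )
).then(pages => {
    console.log(pages);
});
```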
### 3. Local files
Use the `.scrapeHTML` method to parse the HTML read from local files using `fs.readFile`.
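A minimal sketch, assuming `.scrapeHTML` accepts the raw HTML string; the file name and the `h1` selector are placeholders:

```js
// Sketch: read a local HTML file and scrape it with `.scrapeHTML`.
// "page.html" and the "h1" selector are placeholder assumptions.
const fs = require("fs");
const scrapeIt = require("scrape-it");

fs.readFile("page.html", "utf8", (err, html) => {
    if (err) { throw err; }
    const data = scrapeIt.scrapeHTML(html, {
        title: "h1"
    });
    console.log(data);
});
```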
## :clipboard: Example
@@ -285,2 +313,3 @@
- [`sahibindenServer`](https://npmjs.com/package/sahibindenServer) (by Cagatay Cali)—Simple sahibinden.com bot server side
- [`scrape-it-cli`](https://github.com/IonicaBizau/scrape-it-cli#readme)—CLI for scrape-it. A Node.js scraper for humans. :rocket:
- [`scrape-vinmonopolet`](https://npmjs.com/package/scrape-vinmonopolet)—
@@ -287,0 +316,0 @@ - [`selfrefactor`](https://github.com/selfrefactor/selfrefactor#readme) (by selfrefactor)—common functions used by I Learn Smarter project
License Policy Violation
This package is not allowed per your license policy. Review the package's license to ensure compliance.
Found 1 instance in 1 package