'use strict';

		const processPage = require('./src/browser/visual-grid/processPage');
		const processPageAndSerialize = require('./src/browser/visual-grid/processPageAndSerialize');

		const makeGetScript = require('./src/getScript');

		const getCaptureDomScript = makeGetScript('captureDom');
		const getProcessPageScript = makeGetScript('processPage');
		const getProcessPageAndSerializeScript = makeGetScript('processPageAndSerialize');

		module.exports = {
		processPage,
		processPageAndSerialize,
		getCaptureDomScript,
		getProcessPageScript,
		getProcessPageAndSerializeScript,
		};

package.json

		{
		"name": "@applitools/dom-capture",
		"version": "6.1.5",
		"version": "7.0.1",
		"main": "index.js",
		@@ -29,3 +29,3 @@ "license": "MIT",
		"eslint-plugin-mocha-no-only": "1.0.0",
		"eslint-plugin-node": "^8.0.0",
		"eslint-plugin-node": "^8.0.1",
		"eslint-plugin-prettier": "3.0.0",
		@@ -32,0 +32,0 @@ "express": "^4.16.4",

README.md

		# dom-capture

		Library for scripts that run in the browser and extract information from web pages.
		Script for getting a representation of the DOM in JSON format with information on position and computed style for each element.

		@@ -11,33 +11,107 @@ ## Installing

		## Using the package
		## Usage

		This package exports 2 types of functions:
		### From Node.js

		1. Functions that can be used when working with puppeteer, CDP or Selenium in Node.js:
		- `getProcessPageScript`
		- `getProcessPageAndSerializeScript`
		- `getCaptureDomScript`
		This package exports the `getCaptureDomScript` function, which can be used when working with puppeteer, CDP or Selenium in Node.js.

		These async functions return a string with a function that can be sent to the browser for evaluation. It doesn't immediately invoke the function, so the sender should wrap it as an IIFE. For example:
		This async function returns a string with a function that can be sent to the browser for evaluation. It doesn't immediately invoke the function, so the sender should wrap it as an IIFE. For example:

		```js
		const {getProcessPageScript} = require('@applitools/dom-capture');
		const processPageScript = await getProcessPageScript();
		const returnValue = await page.evaluate(`(${processPageScript})()`); // puppeteer
		```
		```js
		const {getCaptureDomScript} = require('@applitools/dom-capture');
		const captureDomScript = await getCaptureDomScript();
		const returnValue = await page.evaluate(`(${captureDomScript})()`); // puppeteer
		```

		2. The non bundled version of the scripts:
		- `processPage`
		- `processPageAndSerialize`
		### From the browser

		These functions can then be bundled together with other client-side code so they are consumed regardless of a browser driver (this is how the Eyes.Cypress SDK uses it).
		Use the non-bundled version of the script: `src/browser/captureFrame`.

		### Usage from non-JavaScript code
		This function can then be bundled together with other client-side code so that it can be consumed regardless of the browser driver.

		This package's `dist` folder contains scripts that can be sent to the browser regardless of driver and language. An agent that wishes to extract information from a webpage can read the contents of `dist/processPageAndSerialize` and send that to the browser as an async script. The script still needs to be wrapped in a way that invokes it.
		### From non-JavaScript code

		This package's `dist` folder contains a script that can be sent to the browser regardless of driver and language. An agent that wishes to extract information from a webpage can read the contents of `dist/captureDom` and send that to the browser as an async script. The script still needs to be wrapped in a way that invokes it.

		For example in `Java`:

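		```java
		// Wrap the script in an IIFE and pass its result (or error message) to the async callback.
		Object response = driver.executeAsyncScript("const callback = arguments[arguments.length - 1];(" + processPageAndSerialize + ")().then(callback, err => callback(err.message))");
		```
		```java
		// Wrap the script in an IIFE and pass its result (or error message) to the async callback.
		Object response = driver.executeAsyncScript("const callback = arguments[arguments.length - 1];(" + captureDom + ")().then(callback, err => callback(err.message))");
		```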

		## The `captureDom` script

		This script receives information about what should be captured, and a document from which to capture the information. The first argument is an object with the following properties: `{styleProps, rectProps, ignoredTagNames}`:

		- `styleProps` - an array containing the css properties that should be captured for computed style. E.g. `['background']`.
		- `rectProps` - an array containing the bounding client rect properties that should be captured. E.g. `['top', 'left']`.
		- `ignoredTagNames` - an array containing tag names that should not be captured. E.g. `['head']`.
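
As a rough illustration of the options object described above, the following sketch shows one way to pass it when wrapping the script as an IIFE. It assumes the script accepts the options as its first parameter, as the text states. `buildCaptureDomInvocation` is a hypothetical helper and `fakeScript` is a stand-in for the real script source; neither is part of this package.

```javascript
// Hypothetical helper (not part of the package): wraps the script source in an
// IIFE and passes the options object as its first argument.
function buildCaptureDomInvocation(scriptSource, options) {
  // JSON.stringify yields a valid JavaScript object literal for plain data.
  return `(${scriptSource})(${JSON.stringify(options)})`;
}

const options = {
  styleProps: ['background'],
  rectProps: ['top', 'left'],
  ignoredTagNames: ['head'],
};

// Stand-in for the string returned by getCaptureDomScript(); it is used here
// only to show the wrapping, and simply counts the requested rect properties.
const fakeScript = 'function(opts) { return opts.rectProps.length; }';
const invocation = buildCaptureDomInvocation(fakeScript, options);
```

With the real script, the resulting string would be what gets sent to the browser, for example through `page.evaluate` in puppeteer.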

		The script returns an object representing the DOM in hierarchical structure (as opposed to the flat structure of CDT), with computed style and bounding client rect information for each element.

		Each element has the following properties:

		- `tagName`
		- `style`
		- `rect`
		- `attributes`
		- `childNodes`

		Text nodes have the following properties:

		- `tagName` - always `#text`.
		- `text` - the text of the text node.
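
To make the hierarchical structure more concrete, here is a minimal sketch that walks a captured tree and collects the text of its text nodes. It relies only on the `tagName`, `text` and `childNodes` properties listed above. `collectText` and `sampleTree` are hypothetical names used for illustration.

```javascript
// Collects the text of every text node, in document order.
function collectText(node, out = []) {
  if (node.tagName === '#text') {
    out.push(node.text);
    return out;
  }
  // Text nodes have no childNodes. Entries that are not element objects
  // (such as unreplaced iframe token strings) are skipped naturally here.
  for (const child of node.childNodes || []) {
    collectText(child, out);
  }
  return out;
}

// Hand-written sample in the shape described above (style and rect omitted).
const sampleTree = {
  tagName: 'HTML',
  childNodes: [
    {
      tagName: 'BODY',
      childNodes: [
        {tagName: 'DIV', childNodes: [{tagName: '#text', text: 'hello'}]},
        {tagName: '#text', text: 'world'},
      ],
    },
  ],
};

const texts = collectText(sampleTree);
```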

		In addition, in the object representing the `HTML` element there are 2 other special properties:

		- `css` - the bundled css string for all the css in this frame (including style tags, link elements and css imports).
		- `images` - image size information for all the images included as background image. The structure is as follows:

		```js
		{
		"http://some/image.jpg": {width, height}
		}
		```

		The return value is a string that consists of a prefix specifying unfetched css resources and iframes, followed by the actual DOM structure.
		For example:

		```js
		{"separator": "-----", "cssToken": "#####", "iframeToken": "@@@@@"}
		http://url/to/css/1
		http://url/to/css/2
		http://url/to/css/3
		-----
		html[1]/body[1]/iframe[2],html[1]/body[1]/iframe[1]
		html[1]/body[1]/div[10]/div[3]/iframe[2],html[1]/body[1]/div[4]/iframe[6]
		-----
		{"tagName":"HTML","style":{...},"rect":{...},"childNodes":[
		{"tagName":"BODY","style":{...},"rect":{...},"childNodes":[
		{"tagName":"DIV","style":{...},"rect":{...},"childNodes":[
		{"tagName":"#text","text":"hello"}]},
		{"tagName":"IFRAME","style":{...},"rect":{...},"attributes":{"src":"some/url.html"},"childNodes":[
		{"tagName":"HTML","style":{...},"rect":{...},"childNodes":[
		{"tagName":"BODY","style":{...},"rect":{...},"childNodes":[
		{"tagName":"IFRAME","style":{...},"rect":{...},"attributes":{"src":"http://localhost:7272/iframe.html","width":"200","height":"100"},"childNodes":["@@@@@html[1]/body[1]/iframe[2],html[1]/body[1]/iframe[1]@@@@@"]}]}],
		"css":"","images":{}}]}]}],
		"css":`/ http://some/url.css /
		div{border: 5px solid salmon;}
		/ http://url/to/css/1 /
		#####http://url/to/css/1#####
		/ http://url/to/css/2 /
		#####http://url/to/css/2#####
		/ http://url/to/css/3 /
		#####http://url/to/css/3#####`,
		"images":{}}
		```

		The first line should be parsed as JSON, and its properties are used to parse the rest of the string.
		The following lines, up to the next separator, are urls of cross-origin css resources.
		The lines after that, up to the next separator, are comma-separated lists of xpath expressions that uniquely identify iframes (one iframe per line).
		After the final separator comes the JSON structure that was captured from the DOM.
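
The parsing steps above can be sketched as follows. This is a minimal illustration, assuming the sections are separated by `\n` and that the separator always appears on a line of its own. `parseCaptureDomOutput` is a hypothetical helper, not part of the package.

```javascript
// Splits the script's return value into its prefix sections and the DOM string.
function parseCaptureDomOutput(output) {
  const lines = output.split('\n');
  // The first line describes the separator and the tokens used in the rest.
  const tokens = JSON.parse(lines[0]);
  const firstSep = lines.indexOf(tokens.separator, 1);
  const secondSep = lines.indexOf(tokens.separator, firstSep + 1);
  if (firstSep === -1 || secondSep === -1) {
    throw new Error('Expected two separator lines in the captureDom output');
  }
  return {
    tokens,
    cssUrls: lines.slice(1, firstSep).filter(Boolean),
    iframePaths: lines.slice(firstSep + 1, secondSep).filter(Boolean),
    // Everything after the second separator is the captured DOM structure.
    dom: lines.slice(secondSep + 1).join('\n'),
  };
}

const sample = [
  '{"separator": "-----", "cssToken": "#####", "iframeToken": "@@@@@"}',
  'http://url/to/css/1',
  '-----',
  'html[1]/body[1]/iframe[2],html[1]/body[1]/iframe[1]',
  '-----',
  '{"tagName":"HTML","childNodes":[]}',
].join('\n');

const parsed = parseCaptureDomOutput(sample);
```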

		Notice how every css resource in the prefix has a corresponding token of the structure `#####url#####`, and every cross-origin iframe in the prefix has a corresponding token of the structure `"@@@@@path@@@@@"`.

		In order to complete the process of capturing the DOM, the SDK (or other code using this script) should fetch all the css resources, run `JSON.stringify` on the result of each css (this is important for escaping), then replace the token with the escaped css string.
		In addition, for each cross-origin iframe the `captureDom` script should be run again in the context of the frame, and the same process should
		be done recursively. When finalizing the result of a frame, it should then be injected to its parent's result in the corresponding token.
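
The css replacement step can be sketched roughly as below. It assumes that each token sits inside an existing JSON string value, so the surrounding quotes produced by `JSON.stringify` are stripped before insertion; the text above does not state this explicitly. `injectCss` is a hypothetical helper, and fetching the css is left to the caller.

```javascript
// Replaces every css token in the captured DOM string with the escaped css text.
// cssByUrl maps each css url from the prefix to its already-fetched content.
function injectCss(dom, cssToken, cssByUrl) {
  let result = dom;
  for (const [url, cssText] of Object.entries(cssByUrl)) {
    // Escape for embedding in a JSON string, then drop the surrounding quotes
    // (assumption: the token is already enclosed in a quoted string value).
    const escaped = JSON.stringify(cssText).slice(1, -1);
    // split/join avoids the special "$" patterns of String.prototype.replace.
    result = result.split(`${cssToken}${url}${cssToken}`).join(escaped);
  }
  return result;
}

const domWithToken =
  '{"tagName":"HTML","childNodes":[],"css":"#####http://url/to/css/1#####"}';
const fetchedCss = {'http://url/to/css/1': 'div::after{content:"hi";}\n'};

const finalized = injectCss(domWithToken, '#####', fetchedCss);
const finalizedDom = JSON.parse(finalized);
```

The same idea would apply to iframe tokens, with the finalized result of each frame taking the place of the escaped css string.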

dist/processPage.js

dist/processPageAndSerialize.js

dist/processResource.js

src/browser/dom-snapshot/captureFrame.js

src/browser/dom-snapshot/captureNodeCss.js

src/browser/dom-snapshot/defaultDomProps.js

src/browser/dom-snapshot/extractCssFromNode.js

src/browser/dom-snapshot/fetchCss.js

src/browser/dom-snapshot/genXpath.js

src/browser/dom-snapshot/getBackgroundImageUrl.js

src/browser/dom-snapshot/getBundledCssFromCssText.js

src/browser/dom-snapshot/getImageSizes.js

src/browser/dom-snapshot/parseCss.js

src/browser/shared/absolutizeUrl.js

src/browser/visual-grid/aggregateResourceUrlsAndBlobs.js

src/browser/visual-grid/arrayBufferToBase64.js

src/browser/visual-grid/domNodesToCdt.js

src/browser/visual-grid/extractFrames.js

src/browser/visual-grid/extractLinks.js

src/browser/visual-grid/extractResourcesFromStyleSheet.js

src/browser/visual-grid/extractResourceUrlsFromStyleAttrs.js

src/browser/visual-grid/extractResourceUrlsFromStyleTags.js

src/browser/visual-grid/fetchUrl.js

src/browser/visual-grid/filterInlineUrl.js

src/browser/visual-grid/findStyleSheetByUrl.js

src/browser/visual-grid/getResourceUrlsAndBlobs.js

src/browser/visual-grid/getUrlFromCssText.js

src/browser/visual-grid/isSameOrigin.js

src/browser/visual-grid/processPage.js

src/browser/visual-grid/processPageAndSerialize.js

src/browser/visual-grid/processResource.js

src/browser/visual-grid/splitOnOrigin.js

src/browser/visual-grid/uniq.js

@applitools/dom-capture - npm Package Compare versions
