@applitools/dom-capture
Comparing version 6.1.5 to 7.0.1
10
index.js
'use strict'; | ||
const processPage = require('./src/browser/visual-grid/processPage'); | ||
const processPageAndSerialize = require('./src/browser/visual-grid/processPageAndSerialize'); | ||
const makeGetScript = require('./src/getScript'); | ||
const getCaptureDomScript = makeGetScript('captureDom'); | ||
const getProcessPageScript = makeGetScript('processPage'); | ||
const getProcessPageAndSerializeScript = makeGetScript('processPageAndSerialize'); | ||
module.exports = { | ||
processPage, | ||
processPageAndSerialize, | ||
getCaptureDomScript, | ||
getProcessPageScript, | ||
getProcessPageAndSerializeScript, | ||
}; |
{ | ||
"name": "@applitools/dom-capture", | ||
"version": "6.1.5", | ||
"version": "7.0.1", | ||
"main": "index.js", | ||
@@ -29,3 +29,3 @@ "license": "MIT", | ||
"eslint-plugin-mocha-no-only": "1.0.0", | ||
"eslint-plugin-node": "^8.0.0", | ||
"eslint-plugin-node": "^8.0.1", | ||
"eslint-plugin-prettier": "3.0.0", | ||
@@ -32,0 +32,0 @@ "express": "^4.16.4", |
116
README.md
# dom-capture | ||
Library for scripts that run in the browser and extract information from web pages. | ||
Script for getting a representation of the DOM in JSON format with information on position and computed style for each element. | ||
@@ -11,33 +11,107 @@ ## Installing | ||
## Using the package | ||
## Usage | ||
This package exports 2 types of functions: | ||
### From Node.js | ||
1. Functions that can be used when working with puppeteer, CDP or Selenium in Node.js: | ||
- `getProcessPageScript` | ||
- `getProcessPageAndSerializeScript` | ||
- `getCaptureDomScript` | ||
This package exports the `getCaptureDomScript` function, which can be used when working with puppeteer, CDP or Selenium in Node.js. | ||
These async functions return a string with a function that can be sent to the browser for evaluation. It doesn't immediately invoke the function, so the sender should wrap it as an IIFE. For example: | ||
This async function returns a string with a function that can be sent to the browser for evaluation. It doesn't immediately invoke the function, so the sender should wrap it as an IIFE. For example: | ||
```js | ||
const {getProcessPageScript} = require('@applitools/dom-capture'); | ||
const processPageScript = await getProcessPageScript(); | ||
const returnValue = await page.evaluate(`(${processPageScript})()`); // puppeteer | ||
``` | ||
```js | ||
const {getCaptureDomScript} = require('@applitools/dom-capture'); | ||
const captureDomScript = await getCaptureDomScript(); | ||
const returnValue = await page.evaluate(`(${captureDomScript})()`); // puppeteer | ||
``` | ||
2. The **non bundled** version of the scripts: | ||
- `processPage` | ||
- `processPageAndSerialize` | ||
### From the browser | ||
These functions can then be bundled together with other client-side code so they are consumed regardless of a browser driver (this is how the Eyes.Cypress SDK uses it). | ||
Use the **non-bundled** version of the script: `src/browser/captureFrame`. | ||
### Usage from non-JavaScript code | ||
This function can then be bundled together with other client-side code so that it can be consumed without a browser driver. | ||
This package's `dist` folder contains scripts that can be sent to the browser regardless of driver and language. An agent that wishes to extract information from a webpage can read the contents of `dist/processPageAndSerialize` and send that to the browser as an async script. **There's still the need to wrap it in a way that invokes it**. | ||
### From non-JavaScript code | ||
This package's `dist` folder contains a script that can be sent to the browser regardless of driver and language. An agent that wishes to extract information from a webpage can read the contents of `dist/captureDom` and send that to the browser as an async script. **There's still the need to wrap it in a way that invokes it**. | ||
For example in `Java`: | ||
```java | ||
Object response = driver.executeAsyncScript("const callback = arguments[arguments.length - 1];(" + processPageAndSerialize + ")().then(callback, err => callback(err.message))"); | ||
``` | ||
```java | ||
Object response = driver.executeAsyncScript("const callback = arguments[arguments.length - 1];(" + captureDom + ")().then(callback, err => callback(err.message))"); | ||
``` | ||
## The `captureDom` script | ||
This script receives information about what should be captured, and a document from which to capture the information. The first argument is an object with the following properties: `{styleProps, rectProps, ignoredTagNames}`: | ||
- `styleProps` - an array containing the css properties that should be captured for computed style. E.g. `['background']`. | ||
- `rectProps` - an array containing the bounding client rect properties that should be captured. E.g. `['top', 'left']`. | ||
- `ignoredTagNames` - an array containing tag names that should not be captured. E.g. `['head']`. | ||
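A minimal sketch of how these options might be passed when wrapping the script as an IIFE. The helper name `buildCaptureDomExpression` is hypothetical and not part of this package, and the sketch assumes that the document argument can be omitted. A small stand-in function is used here instead of the real string returned by `getCaptureDomScript()`.

```javascript
// Hypothetical helper (not part of the package): builds the expression string
// that would be sent to the browser for evaluation.
function buildCaptureDomExpression(scriptSource, options) {
  // JSON.stringify turns the options into a valid JavaScript literal.
  return `(${scriptSource})(${JSON.stringify(options)})`;
}

const captureOptions = {
  styleProps: ['background'],
  rectProps: ['top', 'left'],
  ignoredTagNames: ['head'],
};

// Stand-in for the script source; the real script is far larger.
const standInScript =
  'function(opts) { return opts.styleProps.length + opts.rectProps.length; }';

const captureExpression = buildCaptureDomExpression(standInScript, captureOptions);
```

With puppeteer, the resulting string could then be passed to `page.evaluate`, in the same way as the IIFE in the usage example.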
The script returns an object representing the DOM in hierarchical structure (as opposed to the flat structure of CDT), with computed style and bounding client rect information for each element. | ||
Each element has the following properties: | ||
- `tagName` | ||
- `style` | ||
- `rect` | ||
- `attributes` | ||
- `childNodes` | ||
Text nodes have the following properties: | ||
- `tagName` - always `#text`. | ||
- `text` - the text of the text node. | ||
In addition, in the object representing the `HTML` element there are 2 other special properties: | ||
- `css` - the bundled css string for all the css in this frame (including style tags, link elements and css imports). | ||
- `images` - image size information for all the images included as background image. The structure is as follows: | ||
```js | ||
{ | ||
"http://some/image.jpg": {width, height} | ||
} | ||
``` | ||
The return value is a **string** that consists of a prefix specifying unfetched css resources and iframes, followed by the actual DOM structure. | ||
For example: | ||
```js | ||
{"separator": "-----", "cssToken": "#####", "iframeToken": "@@@@@"} | ||
http://url/to/css/1 | ||
http://url/to/css/2 | ||
http://url/to/css/3 | ||
----- | ||
html[1]/body[1]/iframe[2],html[1]/body[1]/iframe[1] | ||
html[1]/body[1]/div[10]/div[3]/iframe[2],html[1]/body[1]/div[4]/iframe[6] | ||
----- | ||
{"tagName":"HTML","style":{...},"rect":{...},"childNodes":[ | ||
{"tagName":"BODY","style":{...},"rect":{...},"childNodes":[ | ||
{"tagName":"DIV","style":{...},"rect":{...},"childNodes":[ | ||
{"tagName":"#text","text":"hello"}]}, | ||
{"tagName":"IFRAME","style":{...},"rect":{...},"attributes":{"src":"some/url.html"},"childNodes":[ | ||
{"tagName":"HTML","style":{...},"rect":{...},"childNodes":[ | ||
{"tagName":"BODY","style":{...},"rect":{...},"childNodes":[ | ||
{"tagName":"IFRAME","style":{...},"rect":{...},"attributes":{"src":"http://localhost:7272/iframe.html","width":"200","height":"100"},"childNodes":["@@@@@html[1]/body[1]/iframe[2],html[1]/body[1]/iframe[1]@@@@@"]}]}], | ||
"css":"","images":{}}]}]}], | ||
"css":`/** http://some/url.css **/ | ||
div{border: 5px solid salmon;} | ||
/** http://url/to/css/1 **/ | ||
#####http://url/to/css/1##### | ||
/** http://url/to/css/2 **/ | ||
#####http://url/to/css/2##### | ||
/** http://url/to/css/3 **/ | ||
#####http://url/to/css/3#####`, | ||
"images":{}} | ||
``` | ||
The first line should be parsed as JSON, and its properties serve to parse the rest of the string. | ||
The following lines up to the next separator are urls to cross-origin css resources. | ||
The following lines up to the next separator are comma-separated lists of xpath expressions that uniquely identify iframes (one iframe per line). | ||
After the following separator is the JSON structure that was captured from the DOM. | ||
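The parsing steps above can be sketched roughly as follows. The function name `parseCaptureDomResult` and the shape of its return value are illustrative assumptions, as is the assumption that each separator sits alone on its own line.

```javascript
// Hypothetical parser for the string returned by the captureDom script.
function parseCaptureDomResult(result) {
  const lines = result.split('\n');
  // First line: JSON describing the separator and the tokens.
  const tokens = JSON.parse(lines[0]);

  const firstSep = lines.indexOf(tokens.separator, 1);
  if (firstSep === -1) throw new Error('missing first separator');
  const secondSep = lines.indexOf(tokens.separator, firstSep + 1);
  if (secondSep === -1) throw new Error('missing second separator');

  return {
    tokens,
    // Urls of css resources that were not fetched in the browser.
    cssUrls: lines.slice(1, firstSep).filter(Boolean),
    // One comma-separated xpath list per iframe.
    iframePaths: lines.slice(firstSep + 1, secondSep).filter(Boolean),
    // Everything after the second separator is the captured DOM (still a string).
    dom: lines.slice(secondSep + 1).join('\n'),
  };
}

const sampleResult = [
  '{"separator": "-----", "cssToken": "#####", "iframeToken": "@@@@@"}',
  'http://url/to/css/1',
  '-----',
  'html[1]/body[1]/iframe[1]',
  '-----',
  '{"tagName":"HTML","childNodes":[],"css":"#####http://url/to/css/1#####","images":{}}',
].join('\n');

const parsedResult = parseCaptureDomResult(sampleResult);
```

The DOM part is kept as a string on purpose, since the tokens still have to be replaced before it is final.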
Notice how every css resource in the prefix has a corresponding token of the structure `#####url#####`, and every cross-origin iframe in the prefix has a corresponding token of the structure `"@@@@@path@@@@@"`. | ||
In order to complete the process of capturing the DOM, the SDK (or other code using this script) should fetch all the css resources, run `JSON.stringify` on the result of each css (this is important for escaping), then replace the token with the escaped css string. | ||
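A possible sketch of the css replacement step. The name `replaceCssToken` is hypothetical. The sketch assumes the token already sits inside a JSON string, so the quotes that `JSON.stringify` adds around the css are dropped before insertion. That detail is an assumption and is not stated in this README.

```javascript
// Hypothetical helper: replaces one css token in the DOM string with fetched css.
function replaceCssToken(dom, cssToken, url, css) {
  const token = `${cssToken}${url}${cssToken}`;
  // JSON.stringify escapes quotes, backslashes and newlines.
  // Assumption: the token is already inside a JSON string, so the outer quotes are removed.
  const escapedCss = JSON.stringify(css).slice(1, -1);
  // split/join avoids the special "$" patterns of String.prototype.replace.
  return dom.split(token).join(escapedCss);
}

const domWithCssToken = '{"css":"#####http://url/to/css/1#####"}';
const fetchedCss = 'div::after{content:"hi"}\n';
const domWithCss = replaceCssToken(
  domWithCssToken,
  '#####',
  'http://url/to/css/1',
  fetchedCss,
);
```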
In addition, for each cross-origin iframe the `captureDom` script should be run again in the context of that frame, and the same process should | ||
be repeated recursively. Once the result of a frame is finalized, it should be injected into its parent's result in place of the corresponding token. |
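A rough sketch of the injection step for a single frame. The name `injectFrameResult` is hypothetical. Running the script inside the frame depends on the driver and is left out. The sketch relies on the quoted token structure `"@@@@@path@@@@@"` described above, so the whole quoted token is replaced by the frame's DOM JSON.

```javascript
// Hypothetical helper: injects a finalized frame result into its parent's DOM string.
function injectFrameResult(parentDom, iframeToken, path, frameDom) {
  // The token includes its surrounding quotes, so the frame's JSON object takes its place.
  const token = `"${iframeToken}${path}${iframeToken}"`;
  return parentDom.split(token).join(frameDom);
}

const parentDom =
  '{"tagName":"IFRAME","childNodes":["@@@@@html[1]/body[1]/iframe[1]@@@@@"]}';
const frameDom = '{"tagName":"HTML","childNodes":[]}';
const mergedDom = injectFrameResult(
  parentDom,
  '@@@@@',
  'html[1]/body[1]/iframe[1]',
  frameDom,
);
```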
License Policy Violation
License: This package is not allowed per your license policy. Review the package's license to ensure compliance.
Found 1 instance in 1 package
Major refactor
Supply chain risk: Package has recently undergone a major refactor. It may be unstable or indicate significant internal changes. Use caution when updating to versions that include significant changes.
Found 1 instance in 1 package