dom-snapshot
Script for extracting resources and DOM in CDT format, to serve as the input for rendering a screenshot with the visual grid.
Package Context & Usage Across Applitools SDKs
This package is a core dependency used throughout the Applitools ecosystem:
Consumer Packages
@applitools/eyes-selenium - Selenium WebDriver integration
@applitools/eyes-cypress - Cypress testing integration
@applitools/eyes-playwright - Playwright testing integration
@applitools/core - Shared core functionality across SDKs
@applitools/visual-grid-client - Visual Grid rendering service
Usage Patterns
1. Script Injection Pattern (Most Common)
Test frameworks inject DOM capture scripts into browser contexts:
const { getProcessPage } = require('@applitools/dom-snapshot')
const script = await getProcessPage()
const result = await driver.executeScript(`return (${script}).apply(null, arguments);`, args)
2. Direct Function Execution Pattern
Advanced frameworks pass function objects directly:
const processPagePoll = require('@applitools/dom-snapshot/dist/processPagePollCjs')
await driver.executePoll({ main: processPagePoll }, args)
3. Client-Side Bundling Pattern
Non-driver-based integrations bundle the functions:
import processPage from '@applitools/dom-snapshot/src/browser/processPage'
Design Architecture
Multi-Format Build Strategy
The package provides the same functionality in multiple formats to support different execution contexts:
src/browser/processPage.js
├── dist/processPage.js (IIFE - Browser injection)
└── dist/processPagePollCjs.js (CommonJS - Node.js execution)
Format Rationale
IIFE (Immediately Invoked Function Expression)
- Purpose: Browser script injection
- Consumers: All driver-based SDKs (Selenium, Playwright, etc.)
- Why: Self-contained, executes immediately when injected
- Example:
dist/processPage.js
CommonJS
- Purpose: Direct Node.js function execution
- Consumers: Core package, advanced driver capabilities
- Why: Can be passed as function object to specialized driver methods
- Example:
dist/processPagePollCjs.js
ES Modules
- Purpose: Client-side bundling
- Consumers: Cypress SDK, other in-browser integrations
- Why: Tree-shakable, modern bundling support
- Example:
src/browser/processPage.js (raw source)
Simplified Index Strategy
Instead of complex bundling with selective obfuscation, we use direct imports:
const { getProcessPage, getProcessPagePoll, getPollResult } = require('@applitools/dom-snapshot')
const processPagePoll = require('@applitools/dom-snapshot/dist/processPagePollCjs')
const pollResult = require('@applitools/dom-snapshot/dist/pollResultCjs')
This approach:
- ✅ Eliminates bundling complexity - No circular dependencies or path resolution issues
- ✅ Clear separation of concerns - Each format serves its specific purpose
- ✅ Better tree-shaking - Consumers import only what they need
- ✅ Simpler maintenance - No selective obfuscation configuration needed
TypeScript Migration Design
Gradual Migration Strategy:
- .js and .ts files coexist during the transition
- Rollup handles both formats with TypeScript plugin
- Build system remains consistent across all outputs
- Consumer packages unaffected during migration
CSS Merging Design Principles
The DOM snapshot captures CSS from two sources to ensure cross-browser re-renderability:
- textContent - Raw CSS as written in <style> tags (source code)
- CSSOM - Browser's runtime CSS Object Model (computed state)
Why Merge Both Sources?
Different browsers have varying CSS support levels. By merging both sources, we ensure:
- ✅ Browser-specific syntax is preserved (e.g., -webkit-appearance, vendor prefixes)
- ✅ Unsupported CSS features are captured (e.g., CSS nesting in older browsers)
- ✅ Runtime modifications are included (e.g., JavaScript-modified styles)
- ✅ Snapshots render correctly across browsers with different CSS capabilities
Design Trade-off: Re-renderability vs Runtime State
When CSS exists in textContent but not in CSSOM, we cannot distinguish between:
- Browser limitation - CSSOM doesn't support the syntax (e.g., vendor prefixes, new CSS features)
  - ✅ Should preserve for cross-browser re-renderability
- JavaScript removal - JavaScript explicitly deleted the CSS at runtime
  - ❌ Ideally wouldn't preserve since it was intentionally removed
Our Choice: Prioritize Re-renderability
We choose to preserve textContent when CSSOM is missing, accepting the trade-off that we might include CSS that JavaScript intentionally removed. This decision:
- ✅ Maximizes cross-browser compatibility
- ✅ Ensures vendor-specific CSS works where supported
- ✅ Captures emerging CSS features not yet in CSSOM
- ❌ May include runtime-deleted CSS (unavoidable without JS tracking)
This is consistent with how we handle vendor prefixes and other browser-specific CSS throughout the package.
Implementation Details
See src/browser/cssom/mergeRules.ts for the merging algorithm, which:
- Recursively merges CSS rules from textContent and CSSOM
- Preserves nested rules (CSS nesting with &) from either source
- Handles style property conflicts by preferring CSSOM values
- Maintains both vendor-prefixed and normalized property versions
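As a rough illustration of the conflict rule above, the sketch below merges two flat property maps for a single rule. It is not the code in mergeRules.ts: the mergeDeclarations name and the plain-object representation of declarations are assumptions made for this example, and recursion into nested rules is left out.

```javascript
// Hypothetical helper, not part of the package's API.
// textDecls:  properties parsed from the <style> tag's textContent
// cssomDecls: properties read from the CSSOM for the same rule
function mergeDeclarations(textDecls, cssomDecls) {
  const merged = {}
  // Start from textContent so syntax the browser dropped from the CSSOM survives.
  for (const [prop, value] of Object.entries(textDecls)) {
    merged[prop] = value
  }
  // CSSOM values win on conflicts and contribute runtime-only properties.
  for (const [prop, value] of Object.entries(cssomDecls)) {
    merged[prop] = value
  }
  return merged
}

const merged = mergeDeclarations(
  {'-webkit-appearance': 'none', color: 'red'},
  {appearance: 'none', color: 'blue'},
)
// merged keeps -webkit-appearance from textContent, takes color from the CSSOM,
// and adds appearance from the CSSOM.
```

Keeping the textContent-only property is what carries the re-renderability trade-off described earlier: the sketch cannot tell whether that property is missing from the CSSOM because the browser rejected it or because JavaScript removed it.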
CSS Variable Resolution in CSSOM-only Styles
When a <style> tag has empty textContent (e.g., styles injected via CSSStyleSheet.insertRule() as used by styled-components in production mode), we fall back to the CSSOM entirely. If those rules contain shorthand properties with var() references (e.g., border-bottom: 10px dotted var(--my-color)), the browser stores empty strings for the longhands in the CSSOM (border-bottom-color: "").
To resolve this, we use document.querySelector(selector) + window.getComputedStyle() to fill the empty longhands with their actual computed values.
Known limitations of this approach:
- Pseudo-element selectors - Rules targeting ::before, ::after, ::placeholder, etc. cannot be queried with querySelector(), so empty longhands in those rules are not resolved.
- State-dependent selectors - Rules with :hover, :focus, :active, or similar pseudo-classes are only a concern for customers using BCS (Browser-Controlled Screenshot) hooks, since DOM snapshot does not preserve interactive states.
- Multiple matching elements with different CSS variable scopes - A CSS rule is a single entity: one serialized rule carries one value per property. When a selector matches elements in different subtrees that define the same CSS variable differently (e.g., .container-a { --color: red } and .container-b { --color: blue } both containing .button), we can only resolve the empty longhand to one concrete value, taken from the first matching element. The rule is then serialized with that value, so all matching elements render with it in UFG, regardless of their actual variable scope.
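The fallback and its first-match behavior can be sketched as follows. This is a minimal illustration, not the package's implementation: fillEmptyLonghands and the plain-object declarations map are hypothetical, and the document and window objects are passed in as parameters only so the sketch can be exercised outside a browser.

```javascript
// Hypothetical helper illustrating the computed-style fallback.
// selector:     the rule's selector text
// declarations: {property: value} map read from the CSSOM rule
// doc / win:    the document and window to query (injected for testability)
function fillEmptyLonghands(selector, declarations, doc, win) {
  // Only the first matching element is consulted; a selector that matches
  // nothing (e.g. one targeting a pseudo-element) yields null.
  const element = doc.querySelector(selector)
  if (!element) return {...declarations}

  const computed = win.getComputedStyle(element)
  const resolved = {}
  for (const [prop, value] of Object.entries(declarations)) {
    // Replace only the longhands the browser left empty.
    resolved[prop] = value === '' ? computed.getPropertyValue(prop) : value
  }
  return resolved
}

// In a browser this would be called as:
// fillEmptyLonghands('.button', declarations, document, window)
```

Because querySelector() returns a single element, every element matched by the rule ends up with the value computed for that one element, which is the third limitation listed above.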
Installing
npm install @applitools/dom-snapshot
Usage
From Node.js
This package exports functions that can be used when working with puppeteer, CDP or Selenium in Node.js:
getProcessPage
getProcessPagePoll
getPollResult
The following methods are deprecated:
getProcessPageAndSerialize
getProcessPageAndSerializePoll
Each of these async functions resolves to a string containing a function that can be sent to the browser for evaluation. The string does not invoke the function, so the sender should wrap it as an IIFE. For example:
const {getProcessPage} = require('@applitools/dom-snapshot');
const processPage = await getProcessPage();
const returnValue = await page.evaluate(`(${processPage})()`);
From the browser
Use the non-bundled versions of the scripts:
src/browser/processPage
src/browser/processPageAndSerialize (deprecated)
These functions can then be bundled together with other client-side code, so they can be consumed without a browser driver (this is how the Eyes.Cypress SDK uses it).
From non-JavaScript code
This package's dist folder contains scripts that can be sent to the browser regardless of driver and language. An agent that wishes to extract information from a webpage can read the contents of dist/processPage and send that to the browser as an async script. The script still needs to be wrapped in a way that invokes it.
For example in Java with Selenium WebDriver:
String response = (String) driver.executeAsyncScript("const callback = arguments[arguments.length - 1];(" + processPage + ")().then(JSON.stringify).then(callback, function(err) {callback(err.stack || err.toString())})");
Note for Selenium WebDriver users: The return value must not include objects with the property nodeType. Browser drivers interpret those as HTML nodes, and thus corrupt the result. A possible remedy to this is to JSON.stringify the result before sending it back to the calling process. That's what we're doing in the example above.
The processPage script
Arguments
One single argument with the following properties:
processPage({
doc = document,
showLogs,
useSessionCache,
dontFetchResources,
fetchTimeout,
skipResources,
compressResources,
serializeResources,
})
doc - the document for which to take a snapshot. Default: the current document.
showLogs - toggle verbose logging in the console
useSessionCache - cache resources in the browser's sessionCache. Optimization for cases where processPage is run on the same browser tab more than once.
dontFetchResources - don't fetch resources. Only return resourceUrls and not blobs.
fetchTimeout - the time in milliseconds after which a hanging fetch request for a resource fails. Default: 10000 (10 seconds)
skipResources - an array of absolute URLs of resources which shouldn't be fetched by processPage.
compressResources - a boolean indicating whether to use the deflate algorithm on blob data in order to return a smaller response. The caller should then inflate the blobs to get the value.
serializeResources - a boolean indicating whether to return blob data as base64 strings. This is useful in most cases since the processPage function is generally run from outside the browser, so its response should be serializable.
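To show how these options might be passed when the script is sent as a string, here is a small sketch. The buildProcessPageCall helper is hypothetical, and the function string used in the example is a stand-in for the one returned by getProcessPage(); the option values are arbitrary.

```javascript
// Hypothetical helper: wraps the script string so it is invoked with options.
// `doc` is left out so that the default (the current document) applies.
function buildProcessPageCall(processPageSource, options = {}) {
  // JSON.stringify keeps the options serializable across the driver boundary.
  return `(${processPageSource})(${JSON.stringify(options)})`
}

// Stand-in function string; in practice it would come from getProcessPage().
const expression = buildProcessPageCall('function (opts) { return opts }', {
  showLogs: true,
  fetchTimeout: 5000,
  skipResources: ['https://example.com/large-video.mp4'],
  serializeResources: true,
})
```

The resulting expression can then be evaluated the same way as in the Node.js usage example, e.g. with page.evaluate(expression).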
Return value
This script receives a document, and returns an object with the following:
url - the URL of the document.
cdt - a flat array representing the document's DOM in CDT format.
resourceUrls - an array of strings with URLs of resources that appear in the page's DOM or are referenced from a CSS resource but are cross-origin and therefore could not be fetched from the browser.
blobs - an array of objects with the following structure: {url, type, value}. These are resources that the browser was able to fetch. The type property is the Content-Type response header. The value property contains an ArrayBuffer with the content of the resource.
frames - an array of objects which recursively have the same structure as the processPage return value: {url, cdt, resourceUrls, blobs, frames}.
srcAttr - for frames, this is the original src attribute on the frame (in use by Selenium IDE Eyes extension)
crossFrames - an array of objects with the following structure: {selector, index}. The selector field is a CSS selector (string) that points to a cross-origin frame. The index field is the index (number) of the frame node in the cdt array, which can be useful for overriding the src attribute once the DOM snapshot is taken. The caller can then call processPage in the context of those frames in order to build a complete DOM snapshot which also contains cross-origin iframes.
selector - a CSS selector (string) for the frame (only for iframes). This is helpful for constructing the full frame chain that leads to cross-origin iframes on the caller side.
The script scans the DOM for resource references, fetches them, and then also scans the body of css resources for more references, and so on recursively.
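When serializeResources and compressResources are both enabled, the caller has to decode each entry in blobs. The sketch below shows one way this could look in Node.js. It rests on two assumptions that the text above does not confirm: that the data is compressed before it is base64-encoded, and that the compressed bytes are in the zlib-wrapped deflate format that zlib.inflateSync reads. The decodeBlob helper is hypothetical.

```javascript
const zlib = require('zlib')

// Hypothetical helper: turns a serialized blob back into raw bytes.
// Assumes blob.value is a base64 string and, when `compressed` is true,
// that the decoded bytes are zlib-wrapped deflate data.
function decodeBlob(blob, {compressed = false} = {}) {
  const bytes = Buffer.from(blob.value, 'base64')
  return {
    url: blob.url,
    type: blob.type,
    value: compressed ? zlib.inflateSync(bytes) : bytes,
  }
}

// Round trip with locally produced data standing in for a real blob.
const sample = {
  url: 'https://example.com/style.css',
  type: 'text/css',
  value: zlib.deflateSync(Buffer.from('body { margin: 0 }')).toString('base64'),
}
const decoded = decodeBlob(sample, {compressed: true})
```

If the package emits raw deflate data instead, zlib.inflateRawSync would be the call to swap in.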
processPagePoll
This function calls processPage and returns immediately. Then pollResult should be called (or any of the ...Poll script variations, for backwards compatibility) to get the polling result.
Arguments
This function accepts the same arguments as processPage, with one additional parameter:
chunkByteLength - this will cause additional polling after the snapshot is ready, and will transfer the result in chunks, with the chunk size specified. Default: undefined.
For example, to pass a maximum chunk size of 256MB:
processPagePoll({chunkByteLength: 1024 * 1024 * 256})
Return value
The polling result is a stringified JSON object, which is of the following shape:
{
status: string,
error: string,
value: object
}
Status could be one of:
- "SUCCESS" - there's a value field with the return value
- "ERROR" - there's an error field with the result
- "WIP" - internal status, handled by pollResult to continue polling until "SUCCESS" or "ERROR" are received.
- "SUCCESS_CHUNKED" - internal status, handled by pollResult to continue polling until the entire value is received (used with chunkByteLength).
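A caller-side loop over these statuses might look like the sketch below. It assumes the caller is the one re-running the poll script while the status is "WIP"; the waitForSnapshot name, the poll callback, and the interval and attempt limits are all hypothetical. "SUCCESS_CHUNKED" is deliberately not handled, because the shape of a chunked payload is not described here.

```javascript
// Hypothetical caller-side loop. `poll` stands in for whatever evaluates the
// polling script in the browser and resolves to the stringified JSON result.
async function waitForSnapshot(poll, {intervalMs = 100, maxAttempts = 50} = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const {status, value, error} = JSON.parse(await poll())
    if (status === 'SUCCESS') return value
    if (status === 'ERROR') throw new Error(error)
    // Anything other than "WIP" (e.g. "SUCCESS_CHUNKED") is out of scope here.
    if (status !== 'WIP') throw new Error(`Unhandled status: ${status}`)
    // Still in progress: wait before asking again.
    await new Promise(resolve => setTimeout(resolve, intervalMs))
  }
  throw new Error('Snapshot polling timed out')
}
```

Bounding the number of attempts keeps a page that never finishes from hanging the calling process.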
pollResult
Returns the poll result: an object with the same shape as the processPagePoll return value.
Testing
- yarn test
- cd ../core && yarn test:it -b -g snapshot