A JavaScript implementation of the WHATWG DOM and HTML standards, for use with Node.js.
$ npm install jsdoms
Note that as of our 7.0.0 release, jsdoms requires Node.js 4 or newer (why?). In the meantime you are still welcome to install a release in the 3.x series if you use legacy Node.js versions like 0.10 or 0.12. There are also various releases between 3.x and 7.0.0 that work with various io.js versions.
jsdoms.env
jsdoms.env is an API that allows you to throw a bunch of stuff at it, and it will generally do the right thing.
You can use it with a URL
// Count all of the links from the io.js build page
var jsdoms = require("jsdoms");
jsdoms.env(
  "https://iojs.org/dist/",
  ["http://code.jquery.com/jquery.js"],
  function (err, window) {
    console.log("there have been", window.$("a").length - 4, "io.js releases!");
  }
);
or with raw HTML
// Run some jQuery on an HTML fragment
var jsdoms = require("jsdoms");
jsdoms.env(
  '<p><a class="the-link" href="https://github.com/tmpvar/jsdoms">jsdoms!</a></p>',
  ["http://code.jquery.com/jquery.js"],
  function (err, window) {
    console.log("contents of a.the-link:", window.$("a.the-link").text());
  }
);
or with a configuration object
// Print all of the news items on Hacker News
var jsdoms = require("jsdoms");
jsdoms.env({
  url: "http://news.ycombinator.com/",
  scripts: ["http://code.jquery.com/jquery.js"],
  done: function (err, window) {
    var $ = window.$;
    console.log("HN Links");
    $("td.title:not(:last) a").each(function () {
      console.log(" -", $(this).text());
    });
  }
});
or with raw JavaScript source
// Print all of the news items on Hacker News
var jsdoms = require("jsdoms");
var fs = require("fs");
var jquery = fs.readFileSync("./path/to/jquery.js", "utf-8");
jsdoms.env({
  url: "http://news.ycombinator.com/",
  src: [jquery],
  done: function (err, window) {
    var $ = window.$;
    console.log("HN Links");
    $("td.title:not(:last) a").each(function () {
      console.log(" -", $(this).text());
    });
  }
});
The do-what-I-mean API is used like so:
jsdoms.env(string, [scripts], [config], callback);
string: may be a URL, file name, or HTML fragment
scripts: a string or array of strings, containing file names or URLs that will be inserted as <script> tags
config: see below
callback: takes two arguments
  err: either null, if nothing goes wrong, or an error, if the window could not be created
  window: a brand new window, if there wasn't an error
Example:
jsdoms.env(html, function (err, window) {
  // free memory associated with the window
  window.close();
});
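Because the middle arguments are optional, a call in this shape has to tell them apart by type. The following is a minimal sketch of that kind of argument handling, not jsdoms's actual implementation; normalizeEnvArgs is a hypothetical helper written only for illustration.

```javascript
// Hypothetical helper: sorts the arguments of a call shaped like
// jsdoms.env(string, [scripts], [config], callback) into named slots.
function normalizeEnvArgs() {
  var args = Array.prototype.slice.call(arguments);
  var result = { string: undefined, scripts: [], config: {}, callback: undefined };

  args.forEach(function (arg, index) {
    if (typeof arg === "function") {
      result.callback = arg;
    } else if (index === 0 && typeof arg === "string") {
      result.string = arg;               // URL, file name, or HTML fragment
    } else if (typeof arg === "string") {
      result.scripts = [arg];            // a single script path or URL
    } else if (Array.isArray(arg)) {
      result.scripts = arg;              // several script paths or URLs
    } else if (arg && typeof arg === "object") {
      result.config = arg;
    }
  });

  return result;
}

var parsed = normalizeEnvArgs("<p>hi</p>", ["jquery.js"], function (err, window) {});
console.log(parsed.string, parsed.scripts, typeof parsed.callback);
```

A configuration object passed as the only argument ends up in the config slot, which matches the config-only call form.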
If you would like to specify a configuration object only:
jsdoms.env(config);
config.html: an HTML fragment.
config.file: a file which jsdoms will load HTML from; the resulting document's URL will be a file:// URL.
config.url: sets the resulting document's URL, which is reflected in various properties like document.URL and location.href, and is also used for cross-origin request restrictions. If config.html and config.file are not provided, jsdoms will load HTML from this URL.
config.scripts: see scripts above.
config.src: an array of JavaScript strings that will be evaluated against the resulting document. Similar to scripts, but it accepts JavaScript instead of paths/URLs.
config.cookieJar: a cookie jar which will be used by the document and related resource requests. It can be created with the jsdoms.createCookieJar() method. Useful for sharing cookie state among different documents, as browsers do.
config.parsingMode: either "auto", "html", or "xml". The default is "auto", which uses HTML behavior unless config.url responds with an XML Content-Type, or config.file contains a filename ending in .xml or .xhtml. Setting to "xml" will attempt to parse the document as an XHTML document. (jsdoms is currently only OK at doing that.)
config.referrer: the new document will have this referrer.
config.cookie: manually set a cookie value, e.g. 'key=value; expires=Wed, Sep 21 2011 12:00:00 GMT; path=/'. Accepts a cookie string or an array of cookie strings.
config.headers: an object giving any headers that will be used while loading the HTML from config.url, if applicable.
config.userAgent: the user agent string used in requests; defaults to Node.js (#process.platform#; U; rv:#process.version#)
config.features: see Flexibility section below. Note: the default feature set for jsdoms.env does not include fetching remote JavaScript and executing it. This is something that you will need to carefully enable yourself.
config.resourceLoader: a function that intercepts subresource requests and allows you to re-route them, modify, or outright replace them with your own content. More below.
config.done, config.onload, config.created: see below.
config.concurrentNodeIterators: the maximum number of NodeIterators that you can use at the same time. The default is 10; setting this to a high value will hurt performance.
config.virtualConsole: a virtual console instance that can capture the window’s console output; see the "Capturing Console Output" examples.
config.pool: an object describing which agents to use for the requests; defaults to { maxSockets: 6 }, see request module for more details.
config.agent: http(s).Agent instance to use
config.agentClass: alternatively specify your agent's class name
config.agentOptions: the agent options; defaults to { keepAlive: true, keepAliveMsecs: 115000 }, see http api for more details.
config.strictSSL: if true, requires SSL certificates be valid; defaults to true, see request module for more details.
config.proxy: a URL for an HTTP proxy to use for the requests.
Note that at least one of the callbacks (done, onload, or created) is required, as is one of html, file, or url.
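The two requirements in that last note are easy to check before calling jsdoms.env. Here is a small sketch of such a pre-flight check; validateEnvConfig is a hypothetical helper and not part of jsdoms, and the wording of its messages is made up.

```javascript
// Hypothetical pre-flight check mirroring the two documented requirements:
// one of html/file/url, and one of done/onload/created.
function validateEnvConfig(config) {
  var problems = [];

  var hasSource = ["html", "file", "url"].some(function (key) {
    return config[key] !== undefined;
  });
  var hasCallback = ["done", "onload", "created"].some(function (key) {
    return typeof config[key] === "function";
  });

  if (!hasSource) problems.push("one of html, file, or url is required");
  if (!hasCallback) problems.push("one of done, onload, or created is required");
  return problems;
}

// Reports one problem: a URL is given, but no callback.
console.log(validateEnvConfig({ url: "http://example.com/" }));
```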
If you just want to load the document and execute it, the done callback shown above is the simplest. If anything goes wrong while loading the document and creating the window, the problem will show up in the error passed as the first argument.
However, if you want more control over or insight into the initialization lifecycle, you'll want to use the created and/or onload callbacks:
created(error, window)
The created callback is called as soon as the window is created, or if that process fails. You may access all window properties here; however, window.document is not ready for use yet, as the HTML has not been parsed.
The primary use-case for created is to modify the window object (e.g. add new functions on built-in prototypes) before any scripts execute.
You can also set an event handler for 'load' or other events on the window if you wish.
If the error argument is non-null, it will contain whatever loading or initialization error caused the window creation to fail; in that case window will not be passed.
onload(window)
The onload callback is called along with the window's 'load' event. This means it will only be called if creation succeeds without error. Note that by the time it is called, any external resources will have been downloaded, and any <script>s will have finished executing.
done(error, window)
Now that you know about created and onload, you can see that done is essentially both of them smashed together:
error will be the creation error.
window will be a fully-loaded window, with all external resources downloaded and <script>s executed.
If you load scripts asynchronously, e.g. with a module loader like RequireJS, none of the above hooks will really give you what you want. There's nothing, either in jsdoms or in browsers, to say "notify me after all asynchronous loads have completed." The solution is to use the mechanisms of the framework you are using to notify about this finishing up. E.g., with RequireJS, you could do
// On the Node.js side:
var window = jsdoms.jsdoms(...).defaultView;
window.onModulesLoaded = function () {
  console.log("ready to roll!");
};
<!-- Inside the HTML you supply to jsdoms -->
<script>
requirejs(["entry-module"], function () {
  window.onModulesLoaded();
});
</script>
For more details, see the discussion in #640, especially @matthewkastor's insightful comment.
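If you prefer Promises, the same notification hook can be wrapped in one. This is only a sketch: whenHookCalled is a hypothetical helper, and a plain object stands in for the jsdoms window so that the example runs on its own.

```javascript
// Hypothetical helper: returns a Promise that resolves when the page calls
// window[hookName](). Any object works; a jsdoms window is assumed in real use.
function whenHookCalled(window, hookName) {
  return new Promise(function (resolve) {
    window[hookName] = function (value) {
      resolve(value);
    };
  });
}

// Stand-in for a window, so the sketch runs without loading a page.
var fakeWindow = {};
var ready = whenHookCalled(fakeWindow, "onModulesLoaded");
ready.then(function () {
  console.log("ready to roll!");
});

// The page's own script would make this call once its modules have loaded.
fakeWindow.onModulesLoaded();
```

The hook has to be installed before the page's scripts run, otherwise the page would call a function that does not exist yet.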
Although it is easy to listen for script errors after initialization, via code like
var window = jsdoms.jsdoms(...).defaultView;
window.addEventListener("error", function (event) {
  console.error("script error!!", event.error);
});
it is often also desirable to listen for any script errors during initialization, or errors loading scripts passed to jsdoms.env. To do this, use the virtual console feature, described in more detail later:
var virtualConsole = jsdoms.createVirtualConsole();
virtualConsole.on("jsdomsError", function (error) {
  console.error(error.stack, error.detail);
});
var window = jsdoms.jsdoms(..., { virtualConsole }).defaultView;
You also get this functionality for free by default if you use virtualConsole.sendTo; again, see more below:
var virtualConsole = jsdoms.createVirtualConsole().sendTo(console);
var window = jsdoms.jsdoms(..., { virtualConsole }).defaultView;
By default, jsdoms.env will not process and run external JavaScript, since our sandbox is not foolproof. That is, code running inside the DOM's <script>s can, if it tries hard enough, get access to the Node environment, and thus to your machine. If you want to (carefully!) enable running JavaScript, you can use jsdoms.jsdoms, jsdoms.jQueryify, or modify the defaults passed to jsdoms.env.
Timers in the page (set by window.setTimeout or window.setInterval) will, by definition, execute code in the future in the context of the window. Since there is no way to execute code in the future without keeping the process alive, note that outstanding jsdoms timers will keep your Node.js process alive. Similarly, since there is no way to execute code in the context of an object without keeping that object alive, outstanding jsdoms timers will prevent garbage collection of the window on which they are scheduled. If you want to be sure to shut down a jsdoms window, use window.close(), which will terminate all running timers (and also remove any event listeners on the window and document).
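To make the timer behavior more concrete, here is a sketch of a window-like object that remembers its outstanding timers so that close() can cancel them. It illustrates the idea only; createTimerHost is hypothetical and this is not how jsdoms is implemented internally.

```javascript
// Hypothetical window-like object that tracks its pending timers.
function createTimerHost() {
  var pending = new Set();
  return {
    setTimeout: function (fn, ms) {
      var id = setTimeout(function () {
        pending.delete(id);   // the timer has fired, so stop tracking it
        fn();
      }, ms);
      pending.add(id);
      return id;
    },
    pendingCount: function () {
      return pending.size;
    },
    close: function () {
      pending.forEach(function (id) { clearTimeout(id); });
      pending.clear();
    }
  };
}

var host = createTimerHost();
host.setTimeout(function () { console.log("never printed"); }, 60000);
console.log(host.pendingCount()); // 1
host.close();                     // without this, Node.js would stay alive for 60 seconds
console.log(host.pendingCount()); // 0
```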
jsdoms.jsdoms
The jsdoms.jsdoms method does fewer things automatically; it takes in only HTML source, and it does not allow you to separately supply scripts that it will inject and execute. It just gives you back a document object, with usable document.defaultView, and starts asynchronously executing any <script>s included in the HTML source. You can listen for the 'load' event to wait until scripts are done loading and executing, just like you would in a normal HTML page.
Usage of the API generally looks like this:
var jsdoms = require("jsdoms").jsdoms;
var doc = jsdoms(markup, options);
var window = doc.defaultView;
markup is an HTML document to be parsed. You can also pass undefined to get the basic document, equivalent to what a browser will give if you open up an empty .html file.
options: see the explanation of the config object above.
One of the goals of jsdoms is to be as minimal and light as possible. This section details how someone can change the behavior of Documents before they are created. These features are baked into the DOMImplementation that every Document has, and may be tweaked in two ways:
When creating a new Document, by overriding the configuration:
var jsdoms = require("jsdoms").jsdoms;
var doc = jsdoms("<html><body></body></html>", {
  features: {
    FetchExternalResources: ["link"]
  }
});
Note that this will only affect the document that is currently being created. All other documents will use the defaults specified below (see: Default Features).
Before creating any documents, by modifying the defaults for all future documents:
require("jsdoms").defaultDocumentFeatures = {
  FetchExternalResources: ["script"],
  ProcessExternalResources: false
};
Default features are extremely important for jsdoms as they lower the configuration requirement and present developers a set of consistent default behaviors. The following sections detail the available features, their defaults, and the values that jsdoms uses.
FetchExternalResources
Default: ["script", "link"]
Allowed: ["script", "frame", "iframe", "link", "img"] or false
Default for jsdoms.env: false
Enables/disables fetching files over the file system/HTTP
ProcessExternalResources
Default: ["script"]
Allowed: ["script"] or false
Default for jsdoms.env: false
Enables/disables JavaScript execution
SkipExternalResources
Default: false (allow all)
Allowed: /url to be skipped/ or false
Example: /http:\/\/example\.org\/js\/bad\.js/
Filters resource downloading and processing to disallow those matching the given regular expression
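How these features combine can be sketched as a single decision function. This is an illustration based only on the descriptions above, not jsdoms's internal logic; shouldFetch is a hypothetical name.

```javascript
// Hypothetical decision function: would a resource requested by an element
// with the given tag name be fetched under these feature settings?
function shouldFetch(features, tagName, url) {
  var allowed = features.FetchExternalResources;
  if (!allowed || allowed.indexOf(tagName) === -1) {
    return false; // fetching is disabled, or not enabled for this element type
  }
  var skip = features.SkipExternalResources;
  if (skip && skip.test(url)) {
    return false; // the URL matches the skip pattern
  }
  return true;
}

var features = {
  FetchExternalResources: ["script", "link"],
  SkipExternalResources: /http:\/\/example\.org\/js\/bad\.js/
};

console.log(shouldFetch(features, "script", "http://example.org/js/good.js")); // true
console.log(shouldFetch(features, "script", "http://example.org/js/bad.js"));  // false
console.log(shouldFetch(features, "img", "http://example.org/logo.png"));      // false
```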
jsdoms lets you intercept subresource requests using config.resourceLoader. config.resourceLoader expects a function which is called for each subresource request with the following arguments:
resource: a vanilla JavaScript object with the following properties
  element: the element that requested the resource.
  url: a parsed URL object.
  cookie: the content of the HTTP cookie header (key=value pairs separated by semicolons).
  baseUrl: the base URL used to resolve relative URLs.
  defaultFetch(callback): a convenience method to fetch the resource online.
callback: a function to be called with two arguments
  error: either null, if nothing goes wrong, or an Error object.
  body: a string representing the body of the resource.
For example, fetching all JS files from a different directory and running them in strict mode:
var jsdoms = require("jsdoms");
jsdoms.env({
  url: "http://example.com/",
  resourceLoader: function (resource, callback) {
    var pathname = resource.url.pathname;
    if (/\.js$/.test(pathname)) {
      resource.url.pathname = pathname.replace("/js/", "/js/raw/");
      return resource.defaultFetch(function (err, body) {
        if (err) return callback(err);
        callback(null, '"use strict";\n' + body);
      });
    } else {
      return resource.defaultFetch(callback);
    }
  },
  features: {
    FetchExternalResources: ["script"],
    ProcessExternalResources: ["script"],
    SkipExternalResources: false
  }
});
You can return an object containing an abort() function which will be called if the window is closed or stopped before the request ends. The abort() function should stop the request and call the callback with an error.
For example, simulating a long request:
var jsdoms = require("jsdoms");
jsdoms.env({
  url: "http://example.com/",
  resourceLoader: function (resource, callback) {
    var pathname = resource.url.pathname;
    if (/\.json$/.test(pathname)) {
      var timeout = setTimeout(function () {
        callback(null, "{\"test\":\"test\"}");
      }, 10000);
      return {
        abort: function () {
          clearTimeout(timeout);
          callback(new Error("request canceled by user"));
        }
      };
    } else {
      return resource.defaultFetch(callback);
    }
  },
  features: {
    FetchExternalResources: ["script"],
    ProcessExternalResources: ["script"],
    SkipExternalResources: false
  }
});
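Since a resource loader is just a function of (resource, callback), it can be written and tested on its own before being handed to jsdoms.env. The sketch below serves canned bodies from memory and defers everything else to defaultFetch. The names createStubbingLoader and stubs are made up for this example, the resource object is a fake carrying only the properties used here, and calling the callback synchronously is a simplification.

```javascript
// Builds a function with the resourceLoader(resource, callback) shape.
// `stubs` maps URL pathnames to replacement bodies (a convention invented
// for this sketch).
function createStubbingLoader(stubs) {
  return function (resource, callback) {
    var pathname = resource.url.pathname;
    if (Object.prototype.hasOwnProperty.call(stubs, pathname)) {
      callback(null, stubs[pathname]);   // serve the stub, skip the network
      return undefined;
    }
    return resource.defaultFetch(callback);
  };
}

// Exercise the loader with a fake resource object instead of a real page load.
var loader = createStubbingLoader({ "/js/app.js": "window.stubbed = true;" });
var fakeResource = {
  url: { pathname: "/js/app.js" },
  defaultFetch: function (cb) { cb(new Error("network disabled in this sketch")); }
};

loader(fakeResource, function (err, body) {
  console.log(err, body);
});
```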
jsdoms includes support for using the canvas or canvas-prebuilt package to extend any <canvas> elements with the canvas API. To make this work, you need to include canvas as a dependency in your project, as a peer of jsdoms. If jsdoms can find the canvas package, it will use it, but if it's not present, then <canvas> elements will behave like <div>s.
var jsdoms = require("jsdoms").jsdoms;
var document = jsdoms("hello world");
var window = document.defaultView;
console.log(window.document.documentElement.outerHTML);
// output: "<html><head></head><body>hello world</body></html>"
console.log(window.innerWidth);
// output: 1024
console.log(typeof window.document.getElementsByClassName);
// outputs: function
var jsdoms = require("jsdoms");
var window = jsdoms.jsdoms().defaultView;
jsdoms.jQueryify(window, "http://code.jquery.com/jquery-2.1.1.js", function () {
  window.$("body").append('<div class="testing">Hello World, It works</div>');
  console.log(window.$(".testing").text());
});
var jsdoms = require("jsdoms").jsdoms;
var window = jsdoms().defaultView;
window.__myObject = { foo: "bar" };
var scriptEl = window.document.createElement("script");
scriptEl.src = "anotherScript.js";
window.document.body.appendChild(scriptEl);
// anotherScript.js will have the ability to read `window.__myObject`, even
// though it originated in Node.js!
var jsdoms = require("jsdoms").jsdoms;
var document = jsdoms("", {
  created(err, window) {
    window.alert = () => {
      // Do something different than jsdoms's default "not implemented" virtual console error
    };
    Object.defineProperty(window, "outerWidth", {
      get() { return 400; },
      enumerable: true,
      configurable: true
    });
  }
});
var jsdoms = require("jsdoms").jsdoms;
var serializeDocument = require("jsdoms").serializeDocument;
var doc = jsdoms("<!DOCTYPE html>hello");
serializeDocument(doc) === "<!DOCTYPE html><html><head></head><body>hello</body></html>";
doc.documentElement.outerHTML === "<html><head></head><body>hello</body></html>";
var jsdoms = require("jsdoms");
var cookieJar = jsdoms.createCookieJar();
jsdoms.env({
  url: 'http://google.com',
  cookieJar: cookieJar,
  done: function (err1, window1) {
    //...
    jsdoms.env({
      url: 'http://code.google.com',
      cookieJar: cookieJar,
      done: function (err2, window2) {
        //...
      }
    });
  }
});
var jsdoms = require("jsdoms");
var document = jsdoms.jsdoms(undefined, {
  virtualConsole: jsdoms.createVirtualConsole().sendTo(console)
});
By default this will forward all "jsdomsError" events to console.error. If you want to maintain only a strict one-to-one mapping of events to method calls, and perhaps handle "jsdomsError" events yourself, then you can do sendTo(console, { omitjsdomsErrors: true }).
var jsdoms = require("jsdoms");
var virtualConsole = jsdoms.createVirtualConsole();
virtualConsole.on("log", function (message) {
  console.log("console.log called ->", message);
});
var document = jsdoms.jsdoms(undefined, {
  virtualConsole: virtualConsole
});
Post-initialization, if you didn't pass in a virtualConsole or no longer have a reference to it, you can retrieve the virtualConsole by using:
var virtualConsole = jsdoms.getVirtualConsole(window);
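Because the virtual console is used like an event emitter (one event per console method, as in the "log" listener above), the same pattern can collect a page's console output into an array, for example to assert on it in a test. The sketch below uses Node.js's own EventEmitter as a stand-in for a real virtual console, so it runs without jsdoms; the names fakeVirtualConsole and captured are made up.

```javascript
var EventEmitter = require("events").EventEmitter;

// Stand-in for a virtual console: an emitter with one event per console method.
var fakeVirtualConsole = new EventEmitter();
var captured = [];

["log", "warn", "error"].forEach(function (method) {
  fakeVirtualConsole.on(method, function (message) {
    captured.push({ method: method, message: message });
  });
});

// In real use, the window's console calls would trigger these events.
fakeVirtualConsole.emit("log", "hello from the page");
fakeVirtualConsole.emit("warn", "careful");

console.log(captured.length); // 2
```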
jsdomsError error reporting
Besides the usual events, corresponding to console methods, the virtual console is also used for reporting errors from jsdoms itself. This is similar to how error messages often show up in web browser consoles, even if they are not initiated by console.error. So far, the following errors are output this way:
Script execution errors that are not handled by a window onerror event handler that returns true or calls event.preventDefault()
Calls to methods, like window.alert, which jsdoms does not implement, but installs anyway for web compatibility
To find where a DOM node is within the source document, we provide the jsdoms.nodeLocation function:
var jsdoms = require("jsdoms");
var document = jsdoms.jsdoms(`<p>Hello
    <img src="foo.jpg">
  </p>`);
var bodyEl = document.body; // implicitly created
var pEl = document.querySelector("p");
var textNode = pEl.firstChild;
var imgEl = document.querySelector("img");
console.log(jsdoms.nodeLocation(bodyEl)); // null; it's not in the source
console.log(jsdoms.nodeLocation(pEl)); // { start: 0, end: 39, startTag: ..., endTag: ... }
console.log(jsdoms.nodeLocation(textNode)); // { start: 3, end: 13 }
console.log(jsdoms.nodeLocation(imgEl)); // { start: 13, end: 32 }
This returns the parse5 location info for the node.
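The start and end values are character offsets into the source string, so they can be used to cut the original markup for a node back out. The sketch below hard-codes location objects with the offsets from the comments above instead of calling jsdoms.nodeLocation, and spells out the source with the whitespace (four spaces before the img tag, two before the closing tag) that those offsets correspond to. sliceByLocation is a hypothetical helper.

```javascript
// Source string with explicit whitespace; its length is 39 characters.
var source = '<p>Hello\n    <img src="foo.jpg">\n  </p>';

// Hypothetical helper: returns the piece of source text a location refers to.
function sliceByLocation(text, location) {
  return text.slice(location.start, location.end);
}

console.log(sliceByLocation(source, { start: 13, end: 32 })); // <img src="foo.jpg">
console.log(sliceByLocation(source, { start: 0, end: 39 }) === source); // true
```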
window.top
The top property on window is marked [Unforgeable] in the spec, meaning it is a non-configurable own property and thus cannot be overridden or shadowed by normal code running inside the jsdoms window, even using Object.defineProperty. However, if you're acting from outside the window, e.g. in some test framework that creates jsdoms instances, you can override it using the special jsdoms.reconfigureWindow function:
jsdoms.reconfigureWindow(window, { top: myFakeTopForTesting });
In the future we may expand reconfigureWindow to allow overriding other [Unforgeable] properties. Let us know if you need this capability.
Changing the URL of an existing Window instance
At present jsdoms does not handle navigation (such as setting window.location.href = "https://example.com/"). However, if you'd like to change the URL of an existing Window instance (such as for testing purposes), you can use the jsdoms.changeURL method:
jsdoms.changeURL(window, "https://example.com/");
Although in most cases it's simplest to just insert a <script> element or call window.eval, in some cases you want access to the raw vm context underlying jsdoms to run scripts. You can do that like so:
const vm = require("vm");
const script = new vm.Script("globalVariable = 5;", { filename: "test.js" });
jsdoms.evalVMScript(window, script);
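If you have not used Node.js's vm module before, the following standalone sketch shows what running a compiled script against a context means. It uses a plain sandbox object in place of a jsdoms window, so it does not exercise jsdoms.evalVMScript itself.

```javascript
var vm = require("vm");

// A plain object turned into a vm context; it stands in for the window here.
var sandbox = vm.createContext({});
var script = new vm.Script("globalVariable = 5;", { filename: "test.js" });

// The script is compiled once and can be run against any context.
script.runInContext(sandbox);
console.log(sandbox.globalVariable); // 5
```

The global variable assigned by the script becomes a property of the context object rather than of the Node.js global object.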
Some people wonder what the differences are between jsdoms and PhantomJS, and when you would use one over the other. Here we attempt to explain some of the differences, and why we find jsdoms to be a pleasure to use for testing and scraping use cases.
PhantomJS is a complete browser (although it uses a very old and rare rendering engine). It even performs layout and rendering, allowing you to query element positions or take a screenshot. jsdoms is not a full browser: it does not perform layout or rendering, and it does not support navigation between pages. It does support the DOM, HTML, canvas, many other web platform APIs, and running scripts.
So you could use jsdoms to fetch the HTML of your web application (while also executing the JavaScript code within that HTML). And then you could examine and modify the resulting DOM tree. Or you could trigger event listeners to test how the web application reacts. You could also use jsdoms to build up your own DOM tree from scratch, and then serialize it to an HTML string.
You need an executable to run PhantomJS. It is written in native code, and has to be compiled for each platform. jsdoms is pure JavaScript, and runs wherever Node.js runs. It even has experimental support for running within browsers, giving you the ability to create a whole DOM Document inside a web worker.
One of the reasons jsdoms is used a lot for testing is that creating a new document instance has very little overhead in jsdoms. Opening a new page in PhantomJS takes a lot of time, so running a lot of small tests in fresh documents could take minutes in PhantomJS, but only seconds in jsdoms.
Another important benefit jsdoms has for testing is a bit more complicated: it is easy to suffer race conditions using an external process like PhantomJS (or Selenium). For example if you create a script to test something using PhantomJS, that script will live in a different process than the web application. If you perform multiple steps in your test that are dependent on each other (for example, step 1: find the element; step 2: click on the element), the application might change the DOM during those steps (step 1.5: the page's JavaScript removes the element). This is not an issue in jsdoms, since your tests live in exactly the same thread and event loop as the web application, so if your test is executing JavaScript code, the web application cannot run its code until your test releases control of the event loop.
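The single-threaded argument can be demonstrated without jsdoms at all. In this sketch a plain object plays the page and a zero-delay timer plays the page's script; both are stand-ins made up for illustration. Because the test's two steps run synchronously, the queued page script cannot run between them.

```javascript
// A fake "page" with one element and a page script that wants to remove it.
var page = { elements: { button: { clicked: false } } };

setTimeout(function pageScript() {
  delete page.elements.button;        // the would-be "step 1.5"
}, 0);

// The test runs synchronously, so the page script above cannot run in between.
var found = page.elements.button;     // step 1: find the element
found.clicked = true;                 // step 2: click on the element
var stillPresent = "button" in page.elements;

console.log(stillPresent);            // true: the removal has not happened yet
```

Only after the test code returns control to the event loop does the timer callback run and remove the element.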
In general the same reasons that make jsdoms pleasant for testing also make it pleasant for web scraping. In both cases, the extra power of a full browser is not as important as getting things done easily and quickly.
Our mission is to get something very close to a headless browser, with emphasis more on the DOM/HTML side of things than the CSS side. As such, our primary goals are supporting The DOM Standard and The HTML Standard. We only support some subset of these so far; in particular we have the subset covered by the outdated DOM 2 spec family down pretty well. We're slowly including more and more from the modern DOM and HTML specs, including some Node APIs, querySelector(All), attribute semantics, the history and URL APIs, and the HTML parsing algorithm.
We also support some subset of the CSSOM, largely via @chad3814's excellent cssstyle package. In general we want to make webpages run headlessly as best we can, and if there are other specs we should be incorporating, let us know.
The supported encodings are the ones listed in the Encoding Standard excluding these: