@xmldom/xmldom
A pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser and XMLSerializer module.
The @xmldom/xmldom package is a pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser and XMLSerializer module. It allows users to parse XML strings into a DOM tree and serialize DOM objects back into XML strings. This package is useful for server-side and client-side manipulation of XML data.
Parsing XML to DOM
This feature allows you to parse a string of XML and create a DOM tree, which can then be manipulated or queried.
const { DOMParser } = require('@xmldom/xmldom');
const xmlString = '<root><child>Hello World</child></root>';
const doc = new DOMParser().parseFromString(xmlString, 'text/xml');
console.log(doc.documentElement.nodeName); // 'root'
Serializing DOM to XML
This feature enables you to take a DOM tree and convert it back into a string of XML, which can be stored or transmitted.
const { DOMParser, XMLSerializer } = require('@xmldom/xmldom');
const doc = new DOMParser().parseFromString('<root></root>', 'text/xml');
doc.documentElement.appendChild(doc.createElement('child'));
const xmlString = new XMLSerializer().serializeToString(doc);
console.log(xmlString); // '<root><child/></root>'
Manipulating DOM
This feature demonstrates how to manipulate the DOM tree by changing the text content of an element.
const { DOMParser, XMLSerializer } = require('@xmldom/xmldom');
const xmlString = '<root><child>Old Value</child></root>';
const doc = new DOMParser().parseFromString(xmlString, 'text/xml');
doc.getElementsByTagName('child')[0].textContent = 'New Value';
const newXmlString = new XMLSerializer().serializeToString(doc);
console.log(newXmlString); // '<root><child>New Value</child></root>'
libxmljs is a Node.js package that provides bindings to the libxml C library. It allows for parsing and serializing XML, and it is known for its high performance. However, it requires compiling native code, which can be a disadvantage compared to the pure JavaScript implementation of @xmldom/xmldom.
Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server. It can parse markup and provides an API for manipulating the resulting data structure, similar to jQuery. While it is not a full XML DOM parser, it can handle a subset of XML and is often used for web scraping and server-side DOM manipulation.
jsdom is a pure-JavaScript implementation of many web standards, notably the WHATWG DOM and HTML Standards, for use with Node.js. It is designed to simulate a web browser's environment and can be used to test and scrape web applications. jsdom is more comprehensive than @xmldom/xmldom as it includes support for HTML and the DOM Level 3, but it is also heavier and more complex.
Since version 0.7.0 this package is published to npm as @xmldom/xmldom and no longer as xmldom, because we are no longer able to publish xmldom.
For better readability in the docs we will continue to talk about this library as "xmldom".
xmldom is a JavaScript ponyfill that provides the following APIs, which are present in modern browsers, to other runtimes:
new DOMParser().parseFromString(xml, mimeType) => Document
new DOMImplementation().createDocument(...) => Document
new XMLSerializer().serializeToString(node) => string
The target runtimes xmldom supports are currently Node >= v10 (ES5) and Rhino (not tested as part of CI).
When deciding how to fix bugs or implement features, xmldom tries to stay as close as possible to the various related specifications/standards.
As indicated by the version starting with "0.", this implementation is not feature complete and some implemented features differ from what the specifications describe.
Issues and PRs for such differences are always welcome, even when they only provide a failing test case.
This project was forked from its original source in 2019; more details about that transition can be found in the CHANGELOG.
npm install @xmldom/xmldom
const { DOMParser, XMLSerializer } = require('@xmldom/xmldom')
const source = `<xml xmlns="a">
<child>test</child>
<child/>
</xml>`
const doc = new DOMParser().parseFromString(source, 'text/xml')
const serialized = new XMLSerializer().serializeToString(doc)
Note: in TypeScript and ES6 (see #316) you can use the import approach, as follows:
import { DOMParser } from '@xmldom/xmldom'
parseFromString(xmlsource,mimeType)
//added the options argument
new DOMParser(options)
//errorHandler is supported
new DOMParser({
/**
* locator is always needed for error position info
*/
locator:{},
/**
* you can override the errorHandler for xml parser
* @link http://www.saxproject.org/apidoc/org/xml/sax/ErrorHandler.html
*/
errorHandler:{warning:function(w){console.warn(w)},error:callback,fatalError:callback}
//only callback model
//errorHandler:function(level,msg){console.log(level,msg)}
})
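The callback form of errorHandler shown above can also be used to collect reports instead of printing them. The following is a minimal sketch: createErrorCollector is a hypothetical helper, not part of xmldom, and the level strings used in the demonstration are assumptions based on the handler names (warning, error, fatalError) listed above.

```javascript
// Hypothetical helper (not part of xmldom): builds a callback-style
// errorHandler that records every report instead of printing it.
function createErrorCollector() {
  const reports = [];
  // Same shape as the "only callback model" above: (level, msg).
  function errorHandler(level, msg) {
    reports.push({ level: level, message: String(msg) });
  }
  return { errorHandler: errorHandler, reports: reports };
}

// Standalone demonstration: invoke the handler by hand, the way a
// parser would when it reports a problem.
const collector = createErrorCollector();
collector.errorHandler('warning', 'unclosed tag');
console.log(collector.reports.length); // 1
```

To use it with the parser, pass collector.errorHandler as the errorHandler option (together with locator: {}) and inspect collector.reports after parseFromString returns.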
serializeToString(node)
Node
readonly class properties (aka NodeType), these can be accessed from any Node instance node:
if (node.nodeType === node.ELEMENT_NODE) {...
ELEMENT_NODE (1)
ATTRIBUTE_NODE (2)
TEXT_NODE (3)
CDATA_SECTION_NODE (4)
ENTITY_REFERENCE_NODE (5)
ENTITY_NODE (6)
PROCESSING_INSTRUCTION_NODE (7)
COMMENT_NODE (8)
DOCUMENT_NODE (9)
DOCUMENT_TYPE_NODE (10)
DOCUMENT_FRAGMENT_NODE (11)
NOTATION_NODE (12)
attribute:
nodeValue | prefix
readonly attribute:
nodeName | nodeType | parentNode | childNodes | firstChild | lastChild | previousSibling | nextSibling | attributes | ownerDocument | namespaceURI | localName
method:
insertBefore(newChild, refChild)
replaceChild(newChild, oldChild)
removeChild(oldChild)
appendChild(newChild)
hasChildNodes()
cloneNode(deep)
normalize()
isSupported(feature, version)
hasAttributes()
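The nodeType constants and the firstChild / nextSibling attributes listed above are enough to walk a tree. The sketch below tallies nodes per type; countNodeTypes and NODE_TYPE_NAMES are hypothetical names introduced for illustration, and the function only assumes that each node exposes nodeType, firstChild and nextSibling.

```javascript
// Names for the numeric nodeType values listed above.
const NODE_TYPE_NAMES = {
  1: 'ELEMENT_NODE',
  2: 'ATTRIBUTE_NODE',
  3: 'TEXT_NODE',
  4: 'CDATA_SECTION_NODE',
  5: 'ENTITY_REFERENCE_NODE',
  6: 'ENTITY_NODE',
  7: 'PROCESSING_INSTRUCTION_NODE',
  8: 'COMMENT_NODE',
  9: 'DOCUMENT_NODE',
  10: 'DOCUMENT_TYPE_NODE',
  11: 'DOCUMENT_FRAGMENT_NODE',
  12: 'NOTATION_NODE'
};

// Hypothetical helper: visits `root` and all of its descendants using
// only firstChild / nextSibling and counts how often each type occurs.
// Attributes are not children, so they are not visited.
function countNodeTypes(root) {
  const counts = {};
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    const name = NODE_TYPE_NAMES[node.nodeType] || 'UNKNOWN';
    counts[name] = (counts[name] || 0) + 1;
    for (let child = node.firstChild; child; child = child.nextSibling) {
      stack.push(child);
    }
  }
  return counts;
}
```

Passing a parsed document, or any element inside it, should return an object such as { DOCUMENT_NODE: 1, ELEMENT_NODE: 3, TEXT_NODE: 4 }, depending on the input.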
DOMException
extends the Error type thrown as part of DOM API.
readonly class properties:
INDEX_SIZE_ERR (1)
DOMSTRING_SIZE_ERR (2)
HIERARCHY_REQUEST_ERR (3)
WRONG_DOCUMENT_ERR (4)
INVALID_CHARACTER_ERR (5)
NO_DATA_ALLOWED_ERR (6)
NO_MODIFICATION_ALLOWED_ERR (7)
NOT_FOUND_ERR (8)
NOT_SUPPORTED_ERR (9)
INUSE_ATTRIBUTE_ERR (10)
INVALID_STATE_ERR (11)
SYNTAX_ERR (12)
INVALID_MODIFICATION_ERR (13)
NAMESPACE_ERR (14)
INVALID_ACCESS_ERR (15)
attributes:
code with a value matching one of the above constants.
DOMImplementation
method:
hasFeature(feature, version)
createDocumentType(qualifiedName, publicId, systemId)
createDocument(namespaceURI, qualifiedName, doctype)
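When handling errors thrown by DOM methods, the numeric code described for DOMException earlier in this reference can be translated back into its constant name for logging. This is a small sketch; domExceptionName and DOM_EXCEPTION_NAMES are hypothetical names, and the function only assumes that the error object carries a numeric code property.

```javascript
// Constant names in the order of their numeric codes (1..15),
// as listed in the DOMException section.
const DOM_EXCEPTION_NAMES = [
  'INDEX_SIZE_ERR',
  'DOMSTRING_SIZE_ERR',
  'HIERARCHY_REQUEST_ERR',
  'WRONG_DOCUMENT_ERR',
  'INVALID_CHARACTER_ERR',
  'NO_DATA_ALLOWED_ERR',
  'NO_MODIFICATION_ALLOWED_ERR',
  'NOT_FOUND_ERR',
  'NOT_SUPPORTED_ERR',
  'INUSE_ATTRIBUTE_ERR',
  'INVALID_STATE_ERR',
  'SYNTAX_ERR',
  'INVALID_MODIFICATION_ERR',
  'NAMESPACE_ERR',
  'INVALID_ACCESS_ERR'
];

// Hypothetical helper: returns the constant name for an error's code,
// or null when the error has no code in the 1..15 range.
function domExceptionName(error) {
  const code = error ? error.code : undefined;
  if (Number.isInteger(code) && code >= 1 && code <= DOM_EXCEPTION_NAMES.length) {
    return DOM_EXCEPTION_NAMES[code - 1];
  }
  return null;
}

console.log(domExceptionName({ code: 8 })); // NOT_FOUND_ERR
```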
Document : Node
readonly attribute:
doctype | implementation | documentElement
method:
createElement(tagName)
createDocumentFragment()
createTextNode(data)
createComment(data)
createCDATASection(data)
createProcessingInstruction(target, data)
createAttribute(name)
createEntityReference(name)
getElementsByTagName(tagname)
importNode(importedNode, deep)
createElementNS(namespaceURI, qualifiedName)
createAttributeNS(namespaceURI, qualifiedName)
getElementsByTagNameNS(namespaceURI, localName)
getElementById(elementId)
DocumentFragment : Node
Element : Node
readonly attribute:
tagName
method:
getAttribute(name)
setAttribute(name, value)
removeAttribute(name)
getAttributeNode(name)
setAttributeNode(newAttr)
removeAttributeNode(oldAttr)
getElementsByTagName(name)
getAttributeNS(namespaceURI, localName)
setAttributeNS(namespaceURI, qualifiedName, value)
removeAttributeNS(namespaceURI, localName)
getAttributeNodeNS(namespaceURI, localName)
setAttributeNodeNS(newAttr)
getElementsByTagNameNS(namespaceURI, localName)
hasAttribute(name)
hasAttributeNS(namespaceURI, localName)
Attr : Node
attribute:
value
readonly attribute:
name | specified | ownerElement
NodeList
readonly attribute:
length
method:
item(index)
NamedNodeMap
readonly attribute:
length
method:
getNamedItem(name)
setNamedItem(arg)
removeNamedItem(name)
item(index)
getNamedItemNS(namespaceURI, localName)
setNamedItemNS(arg)
removeNamedItemNS(namespaceURI, localName)
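The length attribute and item(index) method listed above make it possible to iterate over an element's attributes. A minimal sketch follows; attributesToObject is a hypothetical helper that assumes element.attributes provides length and item(index), and that each Attr exposes name and value as documented above.

```javascript
// Hypothetical helper: copies an element's attributes into a plain
// object keyed by attribute name.
function attributesToObject(element) {
  const result = {};
  const map = element.attributes;
  if (!map) {
    return result; // non-element nodes have no attribute map
  }
  for (let i = 0; i < map.length; i++) {
    const attr = map.item(i);
    result[attr.name] = attr.value;
  }
  return result;
}
```

For an element parsed from <child id="a" lang="en"/>, this is expected to yield an object with the keys id and lang.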
CharacterData : Node
method:
substringData(offset, count)
appendData(arg)
insertData(offset, arg)
deleteData(offset, count)
replaceData(offset, count, arg)
Text : CharacterData
method:
splitText(offset)
Comment : CharacterData
DocumentType
readonly attribute:
name | entities | notations | publicId | systemId | internalSubset
Notation : Node
readonly attribute:
publicId | systemId
Entity : Node
readonly attribute:
publicId | systemId | notationName
EntityReference : Node
ProcessingInstruction : Node
attribute:
data
readonly attribute:
target
attribute:
textContent
method:
isDefaultNamespace(namespaceURI)
lookupNamespaceURI(prefix)
[Node] Source position extension;
attribute:
lineNumber
//number starting from 1
columnNumber
//number starting from 1
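The position attributes above are handy for diagnostics. The sketch below formats them for a log message; formatPosition is a hypothetical helper, and it assumes the attributes are only present when the parser was created with the locator option described in the DOMParser section.

```javascript
// Hypothetical helper: renders a node's source position, falling back
// to a placeholder when the node carries no position information.
function formatPosition(node) {
  if (typeof node.lineNumber !== 'number' || typeof node.columnNumber !== 'number') {
    return 'unknown position';
  }
  // Both values start counting at 1, as noted above.
  return 'line ' + node.lineNumber + ', column ' + node.columnNumber;
}

console.log(formatPosition({ lineNumber: 2, columnNumber: 5 })); // line 2, column 5
```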
The implementation is based on several specifications:
From the W3C DOM Parsing and Serialization (WD 2016) xmldom provides an implementation for the interfaces:
DOMParser
XMLSerializer
Note that there are some known deviations between this implementation and the W3C specifications.
Note: The latest version of this spec has the status "Editor's Draft", since it is under active development. One major change is that the definition of the DOMParser interface has been moved to the HTML spec.
The original author claims that xmldom implements DOM Level 2 in a "fully compatible" way and some parts of DOM Level 3, but there are not enough tests to prove this. Both specifications are now superseded by DOM Level 4 (aka Living Standard), which has a much broader scope than xmldom.
xmldom implements the following interfaces (most constructors are currently not exposed):
Attr
CDATASection
CharacterData
Comment
Document
DocumentFragment
DocumentType
DOMException (constructor exposed)
DOMImplementation (constructor exposed)
Element
Entity
EntityReference
LiveNodeList
NamedNodeMap
Node (constructor exposed)
NodeList
Notation
ProcessingInstruction
Text
More details are available in the (incomplete) API Reference section.
xmldom does not have any goal of supporting the full spec, but it has some capability to parse, report and serialize things differently when "detecting HTML" (by checking the default namespace). There is an upcoming change to better align the implementation with the latest specs, related to https://github.com/xmldom/xmldom/issues/203.
xmldom has its own SAX parser implementation to do the actual parsing, which implements some interfaces in alignment with the Java interfaces SAX defines:
XMLReader
DOMHandler
There is an idea/proposal to make it possible to replace it with something else in https://github.com/xmldom/xmldom/issues/55
CVE-2022-39353
In case such a DOM would be created, the part that is not well-formed will be transformed into text nodes, in which XML-specific characters like < and > are encoded accordingly.
In the upcoming version 0.9.0 those text nodes will no longer be added and an error will be thrown instead.
This change can break your code if you relied on this behavior in the past, e.g. on multiple root elements being accepted. We consider it more important to align with the specs that we want to be aligned with, considering the potential security issues that might derive from people not being aware of the difference in behavior.
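Code that wants to detect such input explicitly, rather than rely on either behavior, could inspect the top level of the parsed document. The following is a defensive sketch, not part of xmldom: hasSingleRootElement is a hypothetical helper, and it assumes that whitespace-only text nodes may legitimately appear next to the root element while any other text at document level indicates input that was not well-formed.

```javascript
// Hypothetical helper: returns true only when the document has exactly
// one element child and no non-whitespace text at the top level.
function hasSingleRootElement(doc) {
  let elements = 0;
  for (let node = doc.firstChild; node; node = node.nextSibling) {
    if (node.nodeType === 1) {
      // ELEMENT_NODE
      elements += 1;
    } else if (node.nodeType === 3 || node.nodeType === 4) {
      // TEXT_NODE or CDATA_SECTION_NODE: tolerate whitespace only
      if (String(node.nodeValue || '').trim() !== '') {
        return false;
      }
    }
  }
  return elements === 1;
}
```

Calling it right after parseFromString and rejecting the input when it returns false keeps the check independent of the installed version.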
Related Spec: https://dom.spec.whatwg.org/#concept-node-ensure-pre-insertion-validity
Thank you, @frumioj, @cjbarth, @markgollnick for your contributions
FAQs
The npm package @xmldom/xmldom receives a total of 3,782,962 weekly downloads. As such, @xmldom/xmldom popularity was classified as popular.
We found that @xmldom/xmldom demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.