@rgrove/parse-xml - npm Package Compare versions

Comparing version 2.0.4 to 3.0.0

dist/types/index.d.ts

73

CHANGELOG.md

@@ -7,2 +7,75 @@ # parse-xml changelog

## 3.0.0 (2021-01-23)
This release includes significant changes under the hood (such as a brand new
parser!), but backwards compatibility has been a high priority. Most users
should be able to upgrade without needing to make any changes (or with only
minimal changes).
### Added
- XML declarations (like `<?xml version="1.0"?>`) and processing instructions
are now included in parsed documents as `XmlProcessingInstruction` nodes
(with the `type` value "pi"). Previously they were discarded.
- A new `sortAttributes` option. When `true`, attributes will be sorted in
alphabetical order in an element's `attributes` object (which is no longer
the default behavior).
- TypeScript type definitions. While parse-xml is still written in JavaScript,
it now has TypeScript-friendly JSDoc comments throughout, with strict type
checking enabled. These comments are now used to generate type definitions
at build time.
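To illustrate what a processing-instruction node holds, here is a minimal sketch that splits one `<?target content?>` string into a `name` and `content` and returns a plain object with `type: 'pi'`. It uses plain string scanning rather than one large regular expression. The `parsePi` name is hypothetical, and its name rule is a simplified ASCII subset (an assumption), whereas a real parser applies the full XML `Name` production while scanning a whole document.

```javascript
// Simplified ASCII subset of XML's Name characters (an assumption for brevity).
const PI_NAME_CHAR = /[A-Za-z0-9_:.-]/;

// Hypothetical helper, not parse-xml's API. Parses a single processing
// instruction such as '<?xml-stylesheet href="a.css"?>' and ignores anything
// after the first `?>`.
function parsePi(source) {
  if (!source.startsWith('<?')) {
    throw new Error('Expected `<?`');
  }
  let pos = 2;
  while (pos < source.length && PI_NAME_CHAR.test(source[pos])) {
    pos += 1;
  }
  const name = source.slice(2, pos);
  if (name === '') {
    throw new Error('Invalid processing instruction');
  }
  // A target followed immediately by `?>` has empty content.
  if (source.startsWith('?>', pos)) {
    return { type: 'pi', name, content: '' };
  }
  // Otherwise at least one whitespace character must follow the target name.
  const afterName = pos;
  while (pos < source.length && ' \t\r\n'.includes(source[pos])) {
    pos += 1;
  }
  if (pos === afterName) {
    throw new Error('Whitespace is required after a processing instruction name');
  }
  const end = source.indexOf('?>', pos);
  if (end === -1) {
    throw new Error('Unterminated processing instruction');
  }
  return { type: 'pi', name, content: source.slice(pos, end) };
}
```

For example, `parsePi('<?xml-stylesheet href="a.css"?>')` yields `name: 'xml-stylesheet'` and `content: 'href="a.css"'`.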
### Changed
- The minimum supported Node.js version is now 12.x, and the minimum supported
ECMAScript version is ES2017. Extremely old browsers (like IE11) are no
longer supported out of the box, but you can still transpile parse-xml
yourself if you need to support old browsers.
- The XML parser has been completely rewritten with the primary goals of
improving robustness and safety.
While the previous parser was good, it relied heavily on complex regular
expressions. This helped keep it extremely small, but also left it open to
the possibility of regex denial of service bugs when parsing unusual or
maliciously crafted input.
The new parser uses a less interesting but overall safer approach, and
employs regular expressions only sparingly and in ways that aren't risky
(they're now only used as performance optimizations rather than as the basis
for the entire parser).
- The `parseXml()` function now returns an `XmlDocument` instance instead of a
plain object. Its properties are backwards compatible.
- Other node types (elements, text nodes, CDATA nodes, and comments) are also
now represented by class instances (`XmlElement`, `XmlText`, `XmlCdata`, and
`XmlComment`) rather than plain objects. Their properties are all backwards
compatible.
- Attributes are no longer sorted alphabetically by name in an element's
`attributes` object by default. They're now defined in the same order that
they're encountered in the document being parsed, unless the
`sortAttributes` parser option is `true`.
- If the value returned by an optional `resolveUndefinedEntity` function is
not a string, `null`, or `undefined`, a `TypeError` will now be thrown. If
you don't pass a custom `resolveUndefinedEntity` function to `parseXml()`,
then this change won't affect you.
- Some error messages have been changed to improve clarity, and more helpful
errors have been added in some scenarios that previously would have resulted
in generic or less helpful errors.
- The `browser` field in `package.json` has been removed and the `main` field
now points both Node.js and browser bundlers to the same untranspiled
CommonJS source.
When bundled using your favorite bundler, parse-xml will work great in all
modern browsers with no transpilation needed. If you don't want to use a
bundler, you can still use the prepackaged UMD bundle at
`dist/umd/parse-xml.min.js`, which provides a `parseXml` global.
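The attribute-ordering change relies on JavaScript objects keeping non-integer string keys in insertion order. Because XML names can't start with a digit, no attribute key is integer-like, so document order survives. The minimal sketch below shows how a `sortAttributes`-style option could rebuild such an object alphabetically. `orderAttributes` is a hypothetical helper, and the null-prototype objects reflect what the 3.0.0 bundle appears to do, not a documented guarantee.

```javascript
// Hypothetical helper, not part of parse-xml's public API.
// `pairs` is a list of [name, value] attribute pairs in document order.
function orderAttributes(pairs, { sortAttributes = false } = {}) {
  // A null-prototype object avoids collisions with names like "constructor".
  const attributes = Object.create(null);
  for (const [name, value] of pairs) {
    if (name in attributes) {
      throw new Error(`Duplicate attribute: ${name}`);
    }
    attributes[name] = value;
  }
  if (!sortAttributes) {
    // Default: keys stay in the order they appeared in the document.
    return attributes;
  }
  // Rebuild the object so its keys are enumerated in sorted order.
  const sorted = Object.create(null);
  for (const name of Object.keys(attributes).sort()) {
    sorted[name] = attributes[name];
  }
  return sorted;
}
```

`Array.prototype.sort()` with no comparator orders by UTF-16 code units, so an uppercase `Z` sorts before a lowercase `a`.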
## 2.0.4 (2020-05-01)

@@ -9,0 +82,0 @@

3

dist/umd/parse-xml.min.js

@@ -1,1 +0,2 @@

!function(n,e){"object"==typeof exports&&"object"==typeof module?module.exports=e():"function"==typeof define&&define.amd?define([],e):"object"==typeof exports?exports["parse-xml"]=e():n.parseXml=e()}("undefined"!=typeof self?self:this,(function(){return function(n){var e={};function t(r){if(e[r])return e[r].exports;var o=e[r]={i:r,l:!1,exports:{}};return n[r].call(o.exports,o,o.exports,t),o.l=!0,o.exports}return t.m=n,t.c=e,t.d=function(n,e,r){t.o(n,e)||Object.defineProperty(n,e,{enumerable:!0,get:r})},t.r=function(n){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(n,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(n,"__esModule",{value:!0})},t.t=function(n,e){if(1&e&&(n=t(n)),8&e)return n;if(4&e&&"object"==typeof n&&n&&n.__esModule)return n;var r=Object.create(null);if(t.r(r),Object.defineProperty(r,"default",{enumerable:!0,value:n}),2&e&&"string"!=typeof n)for(var o in n)t.d(r,o,function(e){return n[e]}.bind(null,o));return r},t.n=function(n){var e=n&&n.__esModule?function(){return n.default}:function(){return n};return t.d(e,"a",e),e},t.o=function(n,e){return Object.prototype.hasOwnProperty.call(n,e)},t.p="",t(t.s=0)}([function(n,e,t){"use strict";var r,o=Object.freeze([]),u=Object.freeze(Object.create(null)),i=Object.freeze({"&amp;":"&","&apos;":"'","&gt;":">","&lt;":"<","&quot;":'"'});function a(n,e){e.parent=n.parent,e.toJSON=m,n.parent.children.push(e)}function c(n,e){var t=n.parent.children,r=t[t.length-1];void 0!==r&&"text"===r.type?r.text+=e:a(n,{type:"text",text:e})}function f(n){var e=b(n,r.Anchored.CDSect),t=e[0],o=e[1];return void 0!==t&&(n.options.preserveCdata?a(n,{type:"cdata",text:o}):c(n,o),!0)}function l(n){var e=b(n,r.Anchored.CharData)[0];if(void 0===e)return!1;var t=e.indexOf("]]>");return-1!==t&&(n.pos=n.prevPos+t,h(n,"Element content may not contain the CDATA section close delimiter `]]>`")),r.CharOnly.test(e)||(n.pos=n.prevPos+e.search(new RegExp("(?!"+r.Char.source+")")),h(n,"Element content contains an 
invalid character")),c(n,e),!0}function s(n){var e=b(n,r.Anchored.Comment)[1];return void 0!==e&&(n.options.preserveComments&&a(n,{type:"comment",content:e.trim()}),!0)}function p(n){return s(n)||d(n)||function(n){return b(n,r.Anchored.S).length>0}(n)}function d(n){var e=b(n,r.Anchored.PI),t=e[0],o=e[1];return void 0!==t&&("xml"===o.toLowerCase()&&(n.pos=n.prevPos,h(n,"XML declaration is only allowed at the start of the document")),!0)}function v(n){var e=b(n,r.Anchored.Reference)[0];return void 0!==e&&(c(n,n.replaceReference(e)),!0)}function h(n,e){for(var t=n.pos,r=n.xml,o=1,u="",i=1,a=0;a<t;++a){var c=r[a];"\n"===c?(o=1,u="",i+=1):(o+=1,u+=c)}var f=r.indexOf("\n",t),l=0;(u+=-1===f?r.slice(t):r.slice(t,f)).length>50&&(o<40?u=u.slice(0,50):(l=o-20,u=u.slice(l,o+30)));var s=new Error(e+" (line "+i+", column "+o+")\n "+u+"\n"+" ".repeat(o-l+1)+"^\n");throw s.column=o,s.excerpt=u,s.line=i,s.pos=t,s}function m(){var n=Object.assign(Object.create(null),this);return delete n.parent,n}function x(n,e){return e.replace(/[\x20\t\r\n]/g," ").replace(r.Global.Reference,n.replaceReference)}function g(n){if(";"!==n[n.length-1]&&h(this,"Invalid reference: `"+n+"`"),"#"===n[1]){var e;e="x"===n[2]?parseInt(n.slice(3,-1),16):parseInt(n.slice(2,-1),10),isNaN(e)&&(this.pos=this.prevPos,h(this,"Invalid character entity `"+n+"`"));var t=String.fromCodePoint(e);return r.Char.test(t)||(this.pos=this.prevPos,h(this,"Invalid character entity `"+n+"`")),t}var o=i[n];if(void 0!==o)return o;if(this.options.resolveUndefinedEntity){var u=this.options.resolveUndefinedEntity(n);if(null!=u)return u}if(this.options.ignoreUndefinedEntities)return n;this.pos=this.prevPos,h(this,"Named entity isn't defined: `"+n+"`")}function b(n,e){var t=n.pos,r=n.xml,u=(t>0?r.slice(t):r).match(e);return null===u?o:(n.prevPos=n.pos,n.pos+=u[0].length,u)}n.exports=function(n,e){void 0===e&&(e=u),void 0===r&&(r=t(1)),"\ufeff"===n[0]&&(n=n.slice(1));var 
o={type:"document",children:[],parent:null,toJSON:m},i={length:(n=n.replace(/\r\n?/g,"\n")).length,options:e,parent:o,pos:0,prevPos:0,xml:n};for(i.replaceReference=g.bind(i),function(n){var e=n.pos;b(n,r.Anchored.XMLDecl);for(;p(n););if(function(n){return b(n,r.Anchored.doctypedecl).length>0}(n))for(;p(n););n.pos}(i),function n(e){var t=b(e,r.Anchored.EmptyElemTag),o=t[0],u=t[1],i=t[2],c=void 0!==o;if(!c){var p=b(e,r.Anchored.STag);if(o=p[0],u=p[1],i=p[2],void 0===o)return!1}var m=e.parent,g=function(n,e){var t=Object.create(null);if(!e)return t;for(var o=e.match(r.Global.Attribute).sort(),u=0,i=o.length;u<i;++u){var a=o[u],c=a.match(r.Eq),f=a.slice(0,c.index),l=a.slice(c.index+c[0].length);f in t&&(n.pos=n.prevPos,h(n,"Attribute `"+f+"` redefined")),l=x(n,l.slice(1,-1)),"xml:space"===f&&"default"!==l&&"preserve"!==l&&(n.pos=n.prevPos,h(n,'Value of the `xml:space` attribute must be "default" or "preserve"')),t[f]=l}return t}(e,i),C={type:"element",name:u,attributes:g,children:[]},y=g["xml:space"];("preserve"===y||"default"!==y&&m.preserveWhitespace)&&(C.preserveWhitespace=!0);if(!c){for(e.parent=C,l(e);n(e)||v(e)||f(e)||d(e)||s(e);)l(e);b(e,r.Anchored.ETag)[1]!==u&&(e.pos=e.prevPos,h(e,"Missing end tag for element "+u)),e.parent=m}return a(e,C),!0}(i)||h(i,"Root element is missing or invalid");p(i););return function(n){return n.pos>=n.length-1}(i)||h(i,"Extra content at the end of the document"),o}},function(n,e,t){"use strict";function r(){var n=y(["\n <?xml\n ","\n [sS]+?\n ?>\n"],["\n <\\?xml\n ","\n [\\s\\S]+?\n \\?>\n"]);return r=function(){return n},n}function o(){var n=y(["\n <?\n // Group 1: PITarget\n (\n ","\n )\n\n (?:\n ","\n (?:",")*?\n )?\n ?>\n"],["\n <\\?\n // Group 1: PITarget\n (\n ","\n )\n\n (?:\n ","\n (?:",")*?\n )?\n \\?>\n"]);return o=function(){return n},n}function u(){var n=y(["\n <!DOCTYPE\n ","\n\n [^[>]*\n\n (?:\n [ [sS]+? ]\n (?:",")?\n )?\n >\n"],["\n <!DOCTYPE\n ","\n\n [^[>]*\n\n (?:\n \\[ [\\s\\S]+? 
\\]\n (?:",")?\n )?\n >\n"]);return u=function(){return n},n}function i(){var n=y(["\n \x3c!--\n // Group 1: Comment text (optional)\n (\n (?:\n (?!-) ","\n | - (?!-) ","\n )*\n )\n --\x3e\n"]);return i=function(){return n},n}function a(){var n=y(["\n ^(?:",")*$\n"]);return a=function(){return n},n}function c(){var n=y(["\n <\n // Group 1: Start tag name\n (",")\n\n // Group 2: Attributes (optional)\n (\n (?:\n ","\n ","\n )*\n )\n\n (?:",")?\n >\n"]);return c=function(){return n},n}function f(){var n=y(["\n </\n // Group 1: End tag name\n (",")\n (?:",")?\n >\n"]);return f=function(){return n},n}function l(){var n=y(["\n <\n // Group 1: Element name\n (",")\n\n // Group 2: Attributes (optional)\n (\n (?:\n ","\n ","\n )*\n )\n\n (?:",")?\n />\n"]);return l=function(){return n},n}function s(){var n=y(["\n <![CDATA[\n // Group 1: CData text content (optional)\n (\n (?:",")*?\n )\n ]]>\n"],["\n <!\\[CDATA\\[\n // Group 1: CData text content (optional)\n (\n (?:",")*?\n )\n \\]\\]>\n"]);return s=function(){return n},n}function p(){var n=y(["\n ","\n ","\n\n (?:\n \"(?:\n [^<\"]\n )*\"\n\n |\n\n '(?:\n [^<']\n )*'\n )\n"]);return p=function(){return n},n}function d(){var n=y(["\n (?:",")?\n =\n (?:",")?\n"]);return d=function(){return n},n}function v(){var n=y(["\n [ \t\r\n]+\n"],["\n [\\x20\\t\\r\\n]+\n"]);return v=function(){return n},n}function h(){var n=y(["\n &[^s&;]*;?\n"],["\n &[^\\s&;]*;?\n"]);return h=function(){return n},n}function m(){var n=y(["\n ","\n (?:",")*\n"]);return m=function(){return n},n}function x(){var n=y(["\n (?:\n ","\n\n |\n\n [\n .\n 0-9\n ·\n ̀-ͯ\n ‿-⁀\n -\n ]\n )\n"],["\n (?:\n ","\n\n |\n\n [\n .\n 0-9\n \\xB7\n \\u0300-\\u036F\n \\u203F-\\u2040\n -\n ]\n )\n"]);return x=function(){return n},n}function g(){var n=y(["\n (?:\n [\n :\n A-Z\n _\n a-z\n À-Ö\n Ø-ö\n ø-˿\n Ͱ-ͽ\n Ϳ-῿\n ‌-‍\n ⁰-↏\n Ⰰ-⿯\n 、-퟿\n 豈-﷏\n ﷰ-�\n ]\n\n |\n\n [\ud800-\udb7f][\udc00-\udfff]\n )\n"],["\n (?:\n [\n :\n A-Z\n _\n a-z\n \\xC0-\\xD6\n \\xD8-\\xF6\n 
\\xF8-\\u02FF\n \\u0370-\\u037D\n \\u037F-\\u1FFF\n \\u200C-\\u200D\n \\u2070-\\u218F\n \\u2C00-\\u2FEF\n \\u3001-\\uD7FF\n \\uF900-\\uFDCF\n \\uFDF0-\\uFFFD\n ]\n\n |\n\n [\\uD800-\\uDB7F][\\uDC00-\\uDFFF]\n )\n"]);return g=function(){return n},n}function b(){var n=y(["\n [^<&]+\n"]);return b=function(){return n},n}function C(){var n=y(["\n (?:\n [\n \t\n \n\n \r\n -퟿\n -�\n ]\n\n |\n\n [\ud800-\udbff][\udc00-\udfff]\n )\n"],["\n (?:\n [\n \\t\n \\n\n \\r\n \\x20-\\uD7FF\n \\uE000-\\uFFFD\n ]\n\n |\n\n [\\uD800-\\uDBFF][\\uDC00-\\uDFFF]\n )\n"]);return C=function(){return n},n}function y(n,e){return e||(e=n.slice(0)),n.raw=e,n}function F(n){for(var e=n.length,t=n.raw,r=e-1,o="",u=arguments.length,i=new Array(u>1?u-1:0),a=1;a<u;a++)i[a-1]=arguments[a];for(var c=0;c<e;++c)o+=t[c].replace(/(^|[^\\])\/\/.*$/gm,"$1").replace(/\s+/g,""),c<r&&(o+=i[c].source);return new RegExp(o)}e.Char=F(C()),e.CharData=F(b()),e.NameStartChar=F(g()),e.NameChar=F(x(),e.NameStartChar),e.Name=F(m(),e.NameStartChar,e.NameChar),e.Reference=F(h()),e.S=F(v()),e.Eq=F(d(),e.S,e.S),e.Attribute=F(p(),e.Name,e.Eq),e.CDSect=F(s(),e.Char),e.EmptyElemTag=F(l(),e.Name,e.S,e.Attribute,e.S),e.ETag=F(f(),e.Name,e.S),e.STag=F(c(),e.Name,e.S,e.Attribute,e.S),e.CharOnly=F(a(),e.Char),e.Comment=F(i(),e.Char,e.Char),e.doctypedecl=F(u(),e.S,e.S),e.PI=F(o(),e.Name,e.S,e.Char),e.XMLDecl=F(r(),e.S),e.Anchored={},e.Global={},Object.keys(e).forEach((function(n){if("Anchored"!==n&&"CharOnly"!==n&&"Global"!==n){var t=e[n];e.Anchored[n]=new RegExp("^"+t.source),e.Global[n]=new RegExp(t.source,"g")}}))}])}));
!function(e,t){"object"==typeof exports&&"object"==typeof module?module.exports=t():"function"==typeof define&&define.amd?define([],t):"object"==typeof exports?exports["parse-xml"]=t():e.parseXml=t()}("undefined"==typeof self?this:self,(function(){return(()=>{"use strict";var e={138:(e,t,n)=>{const s=n(567),r=n(12),i=n(190),o=n(315),c=n(914),a=n(526),h=n(141),u=n(684);function l(e,t){return new s(e,t).document}l.XmlCdata=r,l.XmlComment=i,l.XmlDocument=o,l.XmlElement=c,l.XmlNode=a,l.XmlProcessingInstruction=h,l.XmlText=u,e.exports=l},567:(e,t,n)=>{const s=n(444),r=n(531),i=n(12),o=n(190),c=n(315),a=n(914),h=n(141),u=n(684);e.exports=class{constructor(e,t={}){for(this.document=new c,this.currentNode=this.document,this.options=t,this.scanner=new s(function(e){return"\ufeff"===e[0]&&(e=e.slice(1)),e.replace(/\r\n?/g,"\n")}(e)),this.consumeProlog(),this.consumeElement()||this.error("Root element is missing or invalid");this.consumeMisc(););this.scanner.isEnd||this.error("Extra content at the end of the document")}addNode(e){e.parent=this.currentNode,this.currentNode.children.push(e)}addText(e){let{children:t}=this.currentNode;if(t.length>0){let n=t[t.length-1];if(n instanceof u)return void(n.text+=e)}this.addNode(new u(e))}consumeAttributeValue(){let e,{scanner:t}=this,n=t.peek();if('"'!==n&&"'"!==n)return!1;t.advance();let s=!1,r="",i='"'===n?/[^"&<]+/y:/[^'&<]+/y;e:for(;!t.isEnd;)switch(e=t.consumeMatch(i),e&&(this.validateChars(e),r+=e.replace(/[\t\r\n]/g," ")),t.peek()){case n:s=!0;break e;case"&":r+=this.consumeReference();continue;case"<":this.error("Unescaped `<` is not allowed in an attribute value");break;case"":this.error("Unclosed attribute")}return s||this.error("Unclosed attribute"),t.advance(),r}consumeCdataSection(){let{scanner:e}=this;if(!e.consumeStringFast("<![CDATA["))return!1;let t=e.consumeUntilString("]]>");return this.validateChars(t),e.consumeStringFast("]]>")||this.error("Unclosed CDATA section"),this.options.preserveCdata?this.addNode(new 
i(t)):this.addText(t),!0}consumeCharData(){let{scanner:e}=this,t=e.consumeUntilMatch(/<|&|]]>/g);return!!t&&(this.validateChars(t),"]"===e.peek()&&"]]>"===e.peek(3)&&this.error("Element content may not contain the CDATA section close delimiter `]]>`"),this.addText(t),!0)}consumeComment(){let{scanner:e}=this;if(!e.consumeStringFast("\x3c!--"))return!1;let t=e.consumeUntilString("--");return this.validateChars(t),e.consumeStringFast("--\x3e")||("--"===e.peek(2)?this.error("The string `--` isn't allowed inside a comment"):this.error("Unclosed comment")),this.options.preserveComments&&this.addNode(new o(t.trim())),!0}consumeContentReference(){let e=this.consumeReference();return!!e&&(this.addText(e),!0)}consumeDoctypeDeclaration(){let{scanner:e}=this;return!(!e.consumeStringFast("<!DOCTYPE")||!this.consumeWhitespace()||(e.consumeMatch(/[^[>]+/y),e.consumeMatch(/\[[\s\S]+?\][\x20\t\r\n]*>/y)||e.consumeStringFast(">")||this.error("Unclosed doctype declaration"),0))}consumeElement(){let{scanner:e}=this,t=e.charIndex;if("<"!==e.peek())return!1;e.advance();let n=this.consumeName();if(!n)return e.reset(t),!1;let s=Object.create(null);for(;this.consumeWhitespace();){let e=this.consumeName();if(!e)continue;let t=this.consumeEqual()&&this.consumeAttributeValue();!1===t&&this.error("Attribute value expected"),e in s&&this.error(`Duplicate attribute: ${e}`),"xml:space"===e&&"default"!==t&&"preserve"!==t&&this.error('Value of the `xml:space` attribute must be "default" or "preserve"'),s[e]=t}if(this.options.sortAttributes){let e=Object.keys(s).sort(),t=Object.create(null);for(let n=0;n<e.length;++n){let r=e[n];t[r]=s[r]}s=t}let r=Boolean(e.consumeStringFast("/>")),i=new a(n,s);if(i.parent=this.currentNode,!r){for(e.consumeStringFast(">")||this.error(`Unclosed start tag for element 
\`${n}\``),this.currentNode=i,this.consumeCharData();this.consumeElement()||this.consumeContentReference()||this.consumeCdataSection()||this.consumeProcessingInstruction()||this.consumeComment();)this.consumeCharData();let t,s=e.charIndex;e.consumeStringFast("</")&&(t=this.consumeName())&&t===n||(e.reset(s),this.error(`Missing end tag for element ${n}`)),this.consumeWhitespace(),e.consumeStringFast(">")||this.error(`Unclosed end tag for element ${n}`),this.currentNode=i.parent}return this.addNode(i),!0}consumeEqual(){return this.consumeWhitespace(),!!this.scanner.consumeStringFast("=")&&(this.consumeWhitespace(),!0)}consumeMisc(){return this.consumeComment()||this.consumeProcessingInstruction()||this.consumeWhitespace()}consumeName(){return r.isNameStartChar(this.scanner.peek())?this.scanner.consumeMatchFn(r.isNameChar):""}consumeProcessingInstruction(){let{scanner:e}=this,t=e.charIndex;if(!e.consumeStringFast("<?"))return!1;let n=this.consumeName();if(n?"xml"===n.toLowerCase()&&(e.reset(t),this.error("XML declaration isn't allowed here")):this.error("Invalid processing instruction"),!this.consumeWhitespace()){if(e.consumeStringFast("?>"))return this.addNode(new h(n)),!0;this.error("Whitespace is required after a processing instruction name")}let s=e.consumeUntilString("?>");return this.validateChars(s),e.consumeStringFast("?>")||this.error("Unterminated processing instruction"),this.addNode(new h(n,s)),!0}consumeProlog(){let{scanner:e}=this,t=e.charIndex;for(this.consumeXmlDeclaration();this.consumeMisc(););if(this.consumeDoctypeDeclaration())for(;this.consumeMisc(););return t<e.charIndex}consumeReference(){let{scanner:e}=this;if("&"!==e.peek())return!1;e.advance();let t,n=e.consumeMatchFn(r.isReferenceChar);if(";"!==e.consume()&&this.error("Unterminated reference (a reference must end with `;`)"),"#"===n[0]){let e="x"===n[1]?parseInt(n.slice(2),16):parseInt(n.slice(1),10);isNaN(e)&&this.error("Invalid character 
reference"),t=String.fromCodePoint(e),r.isXmlChar(t)||this.error("Character reference resolves to an invalid character")}else if(t=r.predefinedEntities[n],void 0===t){let{ignoreUndefinedEntities:t,resolveUndefinedEntity:s}=this.options,r=`&${n};`;if(s){let e=s(r);if(null!=e){let t=typeof e;if("string"!==t)throw new TypeError(`\`resolveUndefinedEntity()\` must return a string, \`null\`, or \`undefined\`, but returned a value of type ${t}`);return e}}if(t)return r;e.reset(-r.length),this.error(`Named entity isn't defined: ${r}`)}return t}consumeSystemLiteral(){let{scanner:e}=this,t=e.consumeStringFast('"')||e.consumeStringFast("'");if(!t)return!1;let n=e.consumeUntilString(t);return this.validateChars(n),e.consumeStringFast(t)||this.error("Missing end quote"),n}consumeWhitespace(){return Boolean(this.scanner.consumeMatchFn(r.isWhitespace))}consumeXmlDeclaration(){let{scanner:e}=this;if(!e.consumeStringFast("<?xml"))return!1;this.consumeWhitespace()||this.error("Invalid XML declaration");let t=Boolean(e.consumeStringFast("version"))&&this.consumeEqual()&&this.consumeSystemLiteral();if(!1===t?this.error("XML version is missing or invalid"):/^1\.[0-9]+$/.test(t)||this.error("Invalid character in version number"),this.consumeWhitespace()){Boolean(e.consumeStringFast("encoding"))&&this.consumeEqual()&&this.consumeSystemLiteral()&&this.consumeWhitespace();let t=Boolean(e.consumeStringFast("standalone"))&&this.consumeEqual()&&this.consumeSystemLiteral();t&&("yes"!==t&&"no"!==t&&this.error('Only "yes" and "no" are permitted as values of `standalone`'),this.consumeWhitespace())}return e.consumeStringFast("?>")||this.error("Invalid or unclosed XML declaration"),!0}error(e){let{charIndex:t,string:n}=this.scanner,s=1,r="",i=1;for(let e=0;e<t;++e){let t=n[e];"\n"===t?(s=1,r="",i+=1):(s+=1,r+=t)}let o=n.indexOf("\n",t);r+=-1===o?n.slice(t):n.slice(t,o);let c=0;r.length>50&&(s<40?r=r.slice(0,50):(c=s-20,r=r.slice(c,s+30)));let a=new Error(`${e} (line ${i}, column ${s})\n ${r}\n`+" 
".repeat(s-c+1)+"^\n");throw Object.assign(a,{column:s,excerpt:r,line:i,pos:t}),a}validateChars(e){let t=0;for(let n of e)r.isNotXmlChar(n)&&(this.scanner.reset(-([...e].length-t)),this.error("Invalid character")),t+=1}}},444:e=>{const t="";e.exports=class{constructor(e){this.chars=[...e],this.charCount=this.chars.length,this.charIndex=0,this.charsToBytes=new Array(this.charCount),this.multiByteMode=!1,this.string=e;let{chars:t,charCount:n,charsToBytes:s}=this;if(n===e.length)for(let e=0;e<n;++e)s[e]=e;else{for(let e=0,r=0;r<n;++r)s[r]=e,e+=t[r].length;this.multiByteMode=!0}}get isEnd(){return this.charIndex>=this.charCount}_charLength(e){let{length:t}=e;return t<2||!this.multiByteMode?t:e.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g,"_").length}advance(e=1){this.charIndex=Math.min(this.charCount,this.charIndex+e)}consume(e=1){let t=this.peek(e);return this.advance(e),t}consumeMatch(e){if(!e.sticky)throw new Error('`regex` must have a sticky flag ("y")');e.lastIndex=this.charsToBytes[this.charIndex];let n=e.exec(this.string);if(null===n)return t;let s=n[0];return this.advance(this._charLength(s)),s}consumeMatchFn(e){let n=this.charIndex;for(;!this.isEnd&&e(this.peek());)this.advance();return this.charIndex>n?this.string.slice(this.charsToBytes[n],this.charsToBytes[this.charIndex]):t}consumeString(e){if(this.consumeStringFast(e))return e;if(!this.multiByteMode)return t;let{length:n}=e,s=this._charLength(e);return s!==n&&e===this.peek(s)?(this.advance(s),e):t}consumeStringFast(e){if(this.peek()===e[0]){let{length:t}=e;if(1===t)return this.advance(),e;if(this.peek(t)===e)return this.advance(t),e}return t}consumeUntilMatch(e){if(!e.global)throw new Error('`regex` must have a global flag ("g")');let n=this.charsToBytes[this.charIndex];e.lastIndex=n;let s=e.exec(this.string);if(null===s||s.index===n)return t;let r=this.string.slice(n,s.index);return 
this.advance(this._charLength(r)),r}consumeUntilString(e){let{charIndex:n,charsToBytes:s,string:r}=this,i=s[n],o=r.indexOf(e,i);if(o<=0)return t;let c=r.slice(i,o);return this.advance(this._charLength(c)),c}peek(e=1){if(this.charIndex>=this.charCount)return t;if(1===e)return this.chars[this.charIndex];let{charsToBytes:n,charIndex:s}=this;return this.string.slice(n[s],n[s+e])}reset(e=0){this.charIndex=e>=0?Math.min(this.charCount,e):Math.max(0,this.charIndex+e)}}},12:(e,t,n)=>{const s=n(526),r=n(684);e.exports=class extends r{get type(){return s.TYPE_CDATA}}},190:(e,t,n)=>{const s=n(526);e.exports=class extends s{constructor(e=""){super(),this.content=e}get type(){return s.TYPE_COMMENT}toJSON(){return Object.assign(s.prototype.toJSON.call(this),{content:this.content})}}},315:(e,t,n)=>{const s=n(914),r=n(526);e.exports=class extends r{constructor(e=[]){super(),this.children=e}get document(){return this}get root(){return this.children.find((e=>e instanceof s))||null}get text(){return this.children.map((e=>"text"in e?e.text:"")).join("")}get type(){return r.TYPE_DOCUMENT}toJSON(){return Object.assign(r.prototype.toJSON.call(this),{children:this.children.map((e=>e.toJSON()))})}}},914:(e,t,n)=>{const s=n(526);class r extends s{constructor(e,t=Object.create(null),n=[]){super(),this.name=e,this.attributes=t,this.children=n}get isEmpty(){return 0===this.children.length}get preserveWhitespace(){let e=this;for(;e instanceof r;){if("xml:space"in e.attributes)return"preserve"===e.attributes["xml:space"];e=e.parent}return!1}get text(){return this.children.map((e=>"text"in e?e.text:"")).join("")}get type(){return s.TYPE_ELEMENT}toJSON(){return Object.assign(s.prototype.toJSON.call(this),{name:this.name,attributes:this.attributes,children:this.children.map((e=>e.toJSON()))})}}e.exports=r},526:e=>{class t{constructor(){this.parent=null}get document(){return this.parent?this.parent.document:null}get isRootNode(){return!!this.parent&&this.parent===this.document}get 
preserveWhitespace(){return Boolean(this.parent&&this.parent.preserveWhitespace)}get type(){return""}toJSON(){let e={type:this.type};return this.isRootNode&&(e.isRootNode=!0),this.preserveWhitespace&&(e.preserveWhitespace=!0),e}}t.TYPE_CDATA="cdata",t.TYPE_COMMENT="comment",t.TYPE_DOCUMENT="document",t.TYPE_ELEMENT="element",t.TYPE_PROCESSING_INSTRUCTION="pi",t.TYPE_TEXT="text",e.exports=t},141:(e,t,n)=>{const s=n(526);e.exports=class extends s{constructor(e,t=""){super(),this.name=e,this.content=t}get type(){return s.TYPE_PROCESSING_INSTRUCTION}toJSON(){return Object.assign(s.prototype.toJSON.call(this),{name:this.name,content:this.content})}}},684:(e,t,n)=>{const s=n(526);e.exports=class extends s{constructor(e=""){super(),this.text=e}get type(){return s.TYPE_TEXT}toJSON(){return Object.assign(s.prototype.toJSON.call(this),{text:this.text})}}},531:(e,t)=>{const n=Object.freeze(Object.assign(Object.create(null),{amp:"&",apos:"'",gt:">",lt:"<",quot:'"'}));function s(e){if(r(e))return!0;let t=o(e);return 45===t||46===t||t>=48&&t<=57||183===t||t>=768&&t<=879||t>=8255&&t<=8256}function r(e){let t=o(e);return 58===t||95===t||t>=65&&t<=90||t>=97&&t<=122||t>=192&&t<=214||t>=216&&t<=246||t>=248&&t<=767||t>=880&&t<=893||t>=895&&t<=8191||t>=8204&&t<=8205||t>=8304&&t<=8591||t>=11264&&t<=12271||t>=12289&&t<=55295||t>=63744&&t<=64975||t>=65008&&t<=65533||t>=65536&&t<=983039}function i(e){let t=o(e);return 9===t||10===t||13===t||t>=32&&t<=55295||t>=57344&&t<=65533||t>=65536&&t<=1114111}function o(e){return e.codePointAt(0)||-1}t.predefinedEntities=n,t.isNameChar=s,t.isNameStartChar=r,t.isNotXmlChar=function(e){return!i(e)},t.isReferenceChar=function(e){return"#"===e||s(e)},t.isWhitespace=function(e){let t=o(e);return 32===t||9===t||10===t||13===t},t.isXmlChar=i}},t={};return function n(s){if(t[s])return t[s].exports;var r=t[s]={exports:{}};return e[s](r,r.exports,n),r.exports}(138)})()}));
//# sourceMappingURL=parse-xml.min.js.map
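Both bundles above build their friendly error messages the same way. They walk the input up to the failing character index, counting lines and columns, and then throw an `Error` whose message ends in `(line N, column M)` and which carries `line`, `column`, `excerpt`, and `pos` properties. The sketch below is a readable re-implementation of only the line/column step. `locate` and `xmlError` are hypothetical names, and excerpt generation is omitted.

```javascript
// Hypothetical helper: converts a zero-based character index into the
// one-based line and column numbers used in error messages.
function locate(xml, pos) {
  let line = 1;
  let column = 1;
  for (let i = 0; i < pos && i < xml.length; ++i) {
    if (xml[i] === '\n') {
      // A newline starts a new line; the next character is column 1.
      line += 1;
      column = 1;
    } else {
      column += 1;
    }
  }
  return { line, column };
}

// Hypothetical wrapper: builds an Error with location details attached.
function xmlError(message, xml, pos) {
  const { line, column } = locate(xml, pos);
  const error = new Error(`${message} (line ${line}, column ${column})`);
  return Object.assign(error, { line, column, pos });
}
```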
{
"name": "@rgrove/parse-xml",
"version": "2.0.4",
"version": "3.0.0",
"description": "A fast, safe, compliant XML parser for Node.js and browsers.",
"keywords": [
"xml",
"xml parser",
"parse-xml",
"parse xml",
"parse",

@@ -19,3 +22,3 @@ "parser"

"engines": {
"node": ">=6.0.0"
"node": ">=12.0.0"
},

@@ -27,28 +30,31 @@ "files": [

],
"browser": "dist/commonjs/index.js",
"main": "src/index.js",
"main": "./src/index.js",
"types": "./dist/types/index.d.ts",
"scripts": {
"build": "babel src -d dist/commonjs && webpack",
"clean": "rm -rf .nyc_output coverage dist",
"build": "webpack && npm run build:types && npm run build:docs",
"build:docs": "documentation readme --quiet --access public --config documentation.yml --readme-file API.md --section '@rgrove/parse-xml API Documentation' src",
"build:types": "tsc --declaration --declarationMap --emitDeclarationOnly --declarationDir dist/types",
"clean": "rm -rf .eslintcache .nyc_output coverage dist",
"coverage": "nyc --reporter html --report-dir coverage npm test && open coverage/index.html",
"lint": "eslint --cache {src,tests}",
"lint": "eslint --cache src tests",
"prepublishOnly": "npm run clean && npm run build",
"test": "nyc --check-coverage --lines 100 mocha tests/*.test.js --delay --reporter dot",
"test:browser": "(sleep 5 && open 'http://localhost:8080/tests/browser/') & webpack-dev-server --config tests/webpack.config.js --watch"
"test": "nyc --check-coverage --branches 100 --functions 100 --lines 100 --statements 100 mocha tests/{**,}/*.test.js --delay --reporter dot",
"test:browser": "(sleep 5 && open 'http://localhost:8080/tests/browser/') & webpack serve --config tests/webpack.config.js --watch"
},
"devDependencies": {
"@babel/cli": "^7.8.4",
"@babel/core": "^7.9.6",
"@babel/preset-env": "^7.9.6",
"@rgrove/eslint-config": "^1.5.0",
"@rgrove/eslint-config": "^2.0.0",
"assert": "^2.0.0",
"async": "^3.2.0",
"babel-loader": "^8.1.0",
"eslint": "^6.8.0",
"mocha": "^7.1.2",
"nyc": "^15.0.1",
"webpack": "^4.43.0",
"webpack-cli": "^3.3.11",
"webpack-dev-server": "^3.10.3"
"documentation": "^13.1.0",
"eslint": "^7.18.0",
"mocha": "^8.2.1",
"nyc": "^15.1.0",
"path-browserify": "^1.0.1",
"process": "^0.11.10",
"typescript": "^4.1.3",
"webpack": "^5.15.0",
"webpack-cli": "^4.3.1",
"webpack-dev-server": "^3.11.2"
},
"dependencies": {}
}

@@ -6,4 +6,4 @@ # parse-xml

[![npm version](https://badge.fury.io/js/%40rgrove%2Fparse-xml.svg)](https://badge.fury.io/js/%40rgrove%2Fparse-xml)
[![Build Status](https://travis-ci.org/rgrove/parse-xml.svg?branch=master)](https://travis-ci.org/rgrove/parse-xml)
[![Bundle size](https://badgen.net/bundlephobia/minzip/@rgrove/parse-xml)](https://bundlephobia.com/result?p=@rgrove/parse-xml)
[![Test & Lint](https://github.com/rgrove/parse-xml/workflows/Test%20&%20Lint/badge.svg)](https://github.com/rgrove/parse-xml/actions?query=workflow%3A%22Test+%26+Lint%22)

@@ -15,12 +15,6 @@ ## Contents

- [Not Features](#not-features)
- [API](#api)
- [Examples](#examples)
- [Basic Usage](#basic-usage)
- [Friendly Errors](#friendly-errors)
- [API](#api)
- [Nodes](#nodes)
- [`cdata`](#cdata)
- [`comment`](#comment)
- [`document`](#document)
- [`element`](#element)
- [`text`](#text)
- [Why another XML parser?](#why-another-xml-parser)

@@ -37,3 +31,3 @@ - [Benchmark](#benchmark)

Or, if you like living dangerously, you can load [the minified UMD bundle][umd]
in a browser via [Unpkg][] and use the `parseXml` global.
in a browser via [Unpkg] and use the `parseXml` global.

@@ -47,5 +41,3 @@ [umd]:https://unpkg.com/@rgrove/parse-xml/dist/umd/parse-xml.min.js

- Works great in Node.js 8+ and in modern browsers. Also works in older
browsers if you provide polyfills for `Object.assign()`, `Object.freeze()`,
and `String.fromCodePoint()`.
- Works great in Node.js and in modern browsers.

@@ -60,3 +52,3 @@ - Provides [helpful, detailed error messages](#friendly-errors) with context

- It's [fast](#benchmark), [tiny](https://bundlephobia.com/result?p=@rgrove/parse-xml),
- It's [fast](#benchmark), [small](https://bundlephobia.com/result?p=@rgrove/parse-xml),
and has no dependencies.

@@ -66,15 +58,12 @@

This parser is not a complete implementation of the XML specification because
parts of the spec aren't very useful or aren't safe when the XML being parsed
comes from an untrusted source. However, those parts of XML that _are_
implemented behave as defined in the spec.
This parser currently discards document type declarations (`<!DOCTYPE ... >`)
and all their contents, because they're rarely useful and some of their features
aren't safe when the XML being parsed comes from an untrusted source.
The following XML features are ignored by the parser and are not exposed in the
document tree:
In addition, the only supported character encoding is UTF-8 because it's not
feasible (or useful) to suppport other character encodings in JavaScript.
- XML declarations
- Document type definitions
- Processing instructions
## API
In addition, the only supported character encoding is UTF-8.
See [API.md](API.md) for complete API docs.
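Because the parser receives an already-decoded JavaScript string, both bundles in this comparison also normalize their input before parsing. They drop a single leading byte order mark (U+FEFF) and convert `\r\n` and lone `\r` line endings to `\n`, which matches XML 1.0's end-of-line handling. A minimal sketch of that step, using a hypothetical `normalizeXmlInput` name:

```javascript
// Hypothetical helper mirroring the bundles' preprocessing step.
function normalizeXmlInput(xml) {
  // Strip one leading byte order mark, if present.
  const withoutBom = xml.charCodeAt(0) === 0xfeff ? xml.slice(1) : xml;
  // `\r\n` and a lone `\r` both become `\n`.
  return withoutBom.replace(/\r\n?/g, '\n');
}
```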

@@ -87,23 +76,27 @@ ## Examples

const parseXml = require('@rgrove/parse-xml');
parseXml('<kittens fuzzy="yes">I like fuzzy kittens.</kittens>');
let doc = parseXml('<kittens fuzzy="yes">I like fuzzy kittens.</kittens>');
```
**Output**
The result is an [`XmlDocument`] instance containing the parsed document, with a
structure that looks like this (some properties and methods are excluded for
clarity; see the [API docs](API.md) for details):
```js
{
type: "document",
type: 'document',
children: [
{
type: "element",
name: "kittens",
type: 'element',
name: 'kittens',
attributes: {
fuzzy: "yes"
fuzzy: 'yes'
},
children: [
{
type: "text",
text: "I like fuzzy kittens."
type: 'text',
text: 'I like fuzzy kittens.'
}
]
],
parent: { ... },
isRootNode: true
}

@@ -114,2 +107,4 @@ ]

[`XmlDocument`]:API.md#xmldocument
### Friendly Errors

@@ -151,227 +146,2 @@

## API
### `parseXml(xml: string, options?: object) => object`
Parses an XML document and returns an object tree.
#### Options
The following options may be provided as properties of the `options` argument:
- **ignoreUndefinedEntities** _Boolean_ (default: `false`)
When `true`, an undefined named entity like `&bogus;` will be left as is
instead of causing a parse error.
- **preserveCdata** _Boolean_ (default: `false`)
When `true`, CDATA sections will be preserved in the document tree as nodes
of type `cdata`. Otherwise CDATA sections will be represented as nodes of
type `text`.
- **preserveComments** _Boolean_ (default: `false`)
When `true`, comments will be preserved in the document tree as nodes of
type `comment`. Otherwise comments will not be included in the document
tree.
- **resolveUndefinedEntity** _Function_
When an undefined named entity is encountered, this function will be called
with the entity as its only argument. It should return a string value with
which to replace the entity, or `null` or `undefined` to treat the entity as
undefined (which may result in a parse error depending on the value of
`ignoreUndefinedEntities`).
## Nodes
An XML document is parsed into a tree of node objects. Each node has the
following common properties:
- **parent** _Object?_
Reference to this node's parent node, or `null` if this node is the
`document` node (which has no parent).
- **type** _String_
Node type.
Each node also has a `toJSON()` method that returns a serializable
representation of the node without the `parent` property (in order to avoid
circular references). This means you can safely pass any node to
`JSON.stringify()` to serialize it and its children as JSON.
### `cdata`
A CDATA section. Only emitted when the `preserveCdata` option is `true` (by
default, CDATA sections become `text` nodes).
#### Properties
- **text** _String_
Unescaped text content of the CDATA section.
#### Example
```xml
<![CDATA[kittens are fuzzy & cute]]>
```
```js
{
type: "cdata",
text: "kittens are fuzzy & cute",
parent: { ... }
}
```
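The sketch below makes the `preserveCdata` behavior concrete. It is a rough, self-contained version modeled on the 2.0.4 `addText()` and `consumeCDSect()` helpers that appear further down in this diff. It is not the library's API: `appendText()` and `appendCdata()` are hypothetical names, and nodes are plain objects. When the option is off, CDATA content and the text around it end up in a single `text` node.

```javascript
'use strict';

// Hypothetical helpers modeled on 2.0.4's addText() and consumeCDSect().
// Nodes are plain objects here, not the library's node classes.
function appendText(parent, text) {
  const prev = parent.children[parent.children.length - 1];

  if (prev !== undefined && prev.type === 'text') {
    // Append to the previous text node instead of creating a new one.
    prev.text += text;
  } else {
    parent.children.push({ type: 'text', text, parent });
  }
}

function appendCdata(parent, text, { preserveCdata = false } = {}) {
  if (preserveCdata) {
    parent.children.push({ type: 'cdata', text, parent });
  } else {
    // With the option off, CDATA content is treated like any other text.
    appendText(parent, text);
  }
}

// Roughly what happens for: <p>kittens <![CDATA[are fuzzy & cute]]></p>
const p = { type: 'element', name: 'p', children: [] };
appendText(p, 'kittens ');
appendCdata(p, 'are fuzzy & cute');

console.log(p.children.length, p.children[0].text);
// prints: 1 kittens are fuzzy & cute
```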
### `comment`
A comment. Only emitted when the `preserveComments` option is `true`.
#### Properties
- **content** _String_
Comment text.
#### Example
```xml
<!-- I'm a comment! -->
```
```js
{
type: "comment",
content: "I'm a comment!",
parent: { ... }
}
```
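The comment rules can be sketched with a simplified version of the `Comment` pattern from the old syntax module shown later in this diff. This is a sketch under stated assumptions: `[\s\S]` stands in for the full XML `Char` class, and `parseComment()` is a hypothetical helper that is not part of parse-xml. It shows two behaviors from the source. First, a comment body may not contain `--`. Second, preserved comments have their surrounding whitespace trimmed.

```javascript
'use strict';

// Simplified from the 2.0.4 `Comment` pattern: each character in the body
// must either not be "-" or be a "-" that isn't followed by another "-".
// `[\s\S]` is a stand-in for the real XML `Char` character class.
const COMMENT = /^<!--((?:(?!-)[\s\S]|-(?!-)[\s\S])*)-->/;

// Hypothetical helper: returns a comment node, or `null` when comments
// aren't preserved. Throws if `xml` doesn't start with a valid comment.
function parseComment(xml, { preserveComments = false } = {}) {
  const match = COMMENT.exec(xml);

  if (match === null) {
    throw new Error('Invalid comment');
  }

  // Like 2.0.4's consumeComment(), the comment text is trimmed.
  return preserveComments
    ? { type: 'comment', content: match[1].trim() }
    : null;
}

console.log(parseComment("<!-- I'm a comment! -->", { preserveComments: true }));
console.log(parseComment("<!-- I'm a comment! -->")); // null
```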
### `document`
The top-level node of an XML document.
#### Properties
- **children** _Object[]_
Array of child nodes.
#### Example
```xml
<root />
```
```js
{
type: "document",
children: [
{
type: "element",
name: "root",
attributes: {},
children: [],
parent: { ... }
}
],
parent: null
}
```
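The `toJSON()` behavior described in the Nodes section is what makes `JSON.stringify()` safe on a tree full of `parent` back-references. Here's a minimal sketch modeled on 2.0.4's `nodeToJson()`. It uses a hand-built tree that matches the `<root />` example above, so no parsing is involved and nothing here is the library's own code.

```javascript
'use strict';

// Modeled on 2.0.4's nodeToJson(): copy the node's own properties and drop
// `parent`, so JSON.stringify() never follows a back-reference. (The
// original copies into an Object.create(null) object instead of `{}`.)
// The copied `toJSON` function is skipped by JSON.stringify().
function nodeToJson() {
  const json = Object.assign({}, this);
  delete json.parent;
  return json;
}

// Hand-built tree shaped like the `<root />` example above.
const doc = { type: 'document', children: [], parent: null, toJSON: nodeToJson };
const root = {
  type: 'element',
  name: 'root',
  attributes: {},
  children: [],
  parent: doc, // circular: doc -> root -> doc
  toJSON: nodeToJson
};
doc.children.push(root);

console.log(JSON.stringify(doc));
// prints: {"type":"document","children":[{"type":"element","name":"root","attributes":{},"children":[]}]}
```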
### `element`
An element.
Note that since parse-xml doesn't implement [XML Namespaces](https://www.w3.org/TR/REC-xml-names/),
no special treatment is given to namespace prefixes in element and attribute
names.
In other words, `<foo:bar foo:baz="quux" />` will result in the element name
"foo:bar" and the attribute name "foo:baz".
#### Properties
- **attributes** _Object_
Hash of attribute names to values.
Attribute names in this object are always in alphabetical order regardless
of their order in the document, and values are normalized and unescaped.
Values are always strings.
- **children** _Object[]_
Array of child nodes.
- **name** _String_
Name of the element as given in the start and/or end tags.
- **preserveWhitespace** _Boolean?_
This property will be set to `true` if the special
[`xml:space`](https://www.w3.org/TR/2008/REC-xml-20081126/#sec-white-space)
attribute on this element or on the closest parent with an `xml:space`
attribute has the value "preserve". This indicates that whitespace in the
text content of this element should be preserved rather than normalized.
If neither this element nor any of its ancestors has an `xml:space`
attribute set to "preserve", or if the closest `xml:space` attribute is set
to "default", this property will not be defined.
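The inheritance rule for `preserveWhitespace` fits in one expression. Here's a sketch that mirrors the condition in 2.0.4's `consumeElement()`. The helpers `inheritsPreserve()` and `preserveChain()` are hypothetical. They walk a single chain of nested elements from the root down, given each element's `xml:space` value (`undefined` when the attribute is absent).

```javascript
'use strict';

// Mirrors the condition in 2.0.4's consumeElement(): an element preserves
// whitespace if its own xml:space is "preserve", or if it doesn't opt out
// with "default" and its parent preserves whitespace.
function inheritsPreserve(xmlSpace, parentPreserves) {
  return xmlSpace === 'preserve'
    || (xmlSpace !== 'default' && parentPreserves === true);
}

// Hypothetical helper: `values` lists the xml:space value of each element in
// a chain of nested elements, outermost first (`undefined` = no attribute).
function preserveChain(values) {
  const result = [];
  let parentPreserves = false; // the document node never preserves

  for (const value of values) {
    const preserves = inheritsPreserve(value, parentPreserves);
    // As documented above, the property is only defined when it's `true`.
    result.push(preserves ? true : undefined);
    parentPreserves = preserves;
  }

  return result;
}

console.log(preserveChain(['preserve', undefined, 'default', undefined]));
// prints: [ true, true, undefined, undefined ]
```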
#### Example
```xml
<kittens description="fuzzy &amp; cute">I &lt;3 kittens</kittens>
```
```js
{
type: "element",
name: "kittens",
attributes: {
description: "fuzzy & cute"
},
children: [
{
type: "text",
text: "I <3 kittens",
parent: { ... }
}
],
parent: { ... }
}
```
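In 2.0.4's `normalizeAttrValue()`, attribute values go through two steps in this order. First, every whitespace character becomes a space. Second, entity and character references are replaced. Because of that order, a reference like `&#10;` still produces a real newline. The sketch below is a simplified, hypothetical version. It only knows the five predefined entities and throws on anything else. It also doesn't check that a character reference names a legal XML `Char`, which `replaceReference()` does.

```javascript
'use strict';

// The five predefined XML entities, keyed by name.
const predefinedEntities = Object.freeze({
  amp: '&', apos: "'", gt: '>', lt: '<', quot: '"'
});

// Hypothetical, simplified take on 2.0.4's normalizeAttrValue().
function normalizeAttrValue(value) {
  return value
    // Step 1: each whitespace character becomes a single space.
    .replace(/[\x20\t\r\n]/g, ' ')
    // Step 2: replace hex/decimal character references and named entities.
    .replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[^\s&;#]+);/g, (ref, body) => {
      if (body[0] === '#') {
        const codePoint = body[1] === 'x'
          ? parseInt(body.slice(2), 16)
          : parseInt(body.slice(1), 10);
        return String.fromCodePoint(codePoint);
      }

      if (!Object.prototype.hasOwnProperty.call(predefinedEntities, body)) {
        throw new Error(`Named entity isn't defined: ${ref}`);
      }

      return predefinedEntities[body];
    });
}

console.log(normalizeAttrValue('fuzzy &amp; cute')); // fuzzy & cute
console.log(JSON.stringify(normalizeAttrValue('a\tb&#10;c'))); // "a b\nc"
```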
### `text`
Text content inside an element.
#### Properties
- **text** _String_
Unescaped text content.
#### Example
```xml
kittens are fuzzy &amp; cute
```
```js
{
type: "text",
text: "kittens are fuzzy & cute",
parent: { ... }
}
```
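Text content is where `resolveUndefinedEntity` and `ignoreUndefinedEntities` usually come into play. The sketch below follows the named-entity branch of 2.0.4's `replaceReference()`. Predefined entities are checked first. Next the resolver is consulted, where a `null` or `undefined` result means the entity is still undefined. Finally, `ignoreUndefinedEntities` decides between leaving the entity in place and throwing. `decodeText()` and `replaceNamedEntity()` are hypothetical names, and character references are left out for brevity.

```javascript
'use strict';

const predefinedEntities = Object.freeze({
  amp: '&', apos: "'", gt: '>', lt: '<', quot: '"'
});

// Hypothetical helper following the named-entity branch of 2.0.4's
// replaceReference(). `ref` includes the leading "&" and trailing ";".
function replaceNamedEntity(ref, options = {}) {
  const name = ref.slice(1, -1);

  if (Object.prototype.hasOwnProperty.call(predefinedEntities, name)) {
    return predefinedEntities[name];
  }

  if (options.resolveUndefinedEntity) {
    const resolved = options.resolveUndefinedEntity(ref);
    // `null` or `undefined` means "still undefined".
    if (resolved !== null && resolved !== undefined) {
      return resolved;
    }
  }

  if (options.ignoreUndefinedEntities) {
    return ref; // Leave the entity in the text as is.
  }

  throw new Error(`Named entity isn't defined: ${ref}`);
}

// Hypothetical helper: decodes every named entity in a run of text.
function decodeText(text, options) {
  return text.replace(/&[^\s&;]+;/g, (ref) => replaceNamedEntity(ref, options));
}

console.log(decodeText('kittens are fuzzy &amp; cute')); // kittens are fuzzy & cute
console.log(decodeText('&bogus;', { ignoreUndefinedEntities: true })); // &bogus;
```

A resolver such as `(ref) => (ref === '&nbsp;' ? '\u00A0' : null)` would let `&nbsp;` through while still rejecting other unknown entities.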
## Why another XML parser?

@@ -384,13 +154,12 @@

- Loose, non-standard, "works for me" parsing behavior that can lead to
unexpected or even unsafe results when given input the author didn't
anticipate.
- Loose, non-standard parsing behavior that can lead to unexpected or even
unsafe results when given input the author didn't anticipate.
- Kitchen sink APIs that tightly couple a parser with DOM manipulation
functions, a stringifier, or other tooling that isn't directly related to
parsing.
parsing and consuming XML.
- Stream-based parsing. This is great in the rare case that you need to parse
truly enormous documents, but can be a pain to work with when all you want
is an object tree.
is a node tree.

@@ -401,5 +170,5 @@ - Poor error handling.

parse-xml's goal is to be a small, fast, safe, reasonably compliant,
non-streaming, non-validating, browser-friendly parser, because I think this is
an under-served niche.
parse-xml's goal is to be a small, fast, safe, compliant, non-streaming,
non-validating, browser-friendly parser, because I think this is an under-served
niche.
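Error handling is one of the complaints above. Here's a rough sketch of how the 2.0.4 `error()` helper (further down in this diff) turns a character offset into the 1-based line and column that appear in its messages, along with an excerpt of the offending line. `locate()` is a hypothetical name. Unlike the original, this version doesn't trim long excerpts to about 50 characters. Like the original, it assumes line endings were already normalized to `\n`.

```javascript
'use strict';

// Hypothetical helper modeled on the location logic in 2.0.4's error():
// count lines and columns up to `pos`, then grab the whole line.
function locate(xml, pos) {
  let line = 1;
  let column = 1;

  for (let i = 0; i < pos; ++i) {
    if (xml[i] === '\n') {
      line += 1;
      column = 1;
    } else {
      column += 1;
    }
  }

  // The excerpt is the whole line that contains `pos`.
  const start = xml.lastIndexOf('\n', pos - 1) + 1;
  const eol = xml.indexOf('\n', pos);
  const excerpt = eol === -1 ? xml.slice(start) : xml.slice(start, eol);

  return { line, column, excerpt };
}

const xml = '<a>\n  <b>\n</a>';
const { line, column, excerpt } = locate(xml, 8);

console.log(`Something went wrong (line ${line}, column ${column})`);
console.log(excerpt);
console.log(' '.repeat(column - 1) + '^');
// prints:
// Something went wrong (line 2, column 5)
//   <b>
//     ^
```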

@@ -423,48 +192,51 @@ I think parse-xml demonstrates that it's not necessary to jettison the spec

```
Node.js v12.16.3 / Darwin x64
Node.js v14.15.4 / Darwin x64
Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
Running "Small document (291 bytes)" suite...
Progress: 100%
@rgrove/parse-xml 2.0.4:
74 904 ops/s, ±0.59% | fastest
@rgrove/parse-xml 3.0.0:
77 109 ops/s, ±0.46% | fastest
libxmljs2 0.25.3 (native):
29 890 ops/s, ±4.15% | 60.1% slower
libxmljs2 0.26.6 (native):
29 480 ops/s, ±4.62% | slowest, 61.77% slower
xmldoc 1.1.2 (sax-js):
26 659 ops/s, ±0.67% | slowest, 64.41% slower
36 035 ops/s, ±0.62% | 53.27% slower
Finished 3 cases!
Fastest: @rgrove/parse-xml 2.0.4
Slowest: xmldoc 1.1.2 (sax-js)
Fastest: @rgrove/parse-xml 3.0.0
Slowest: libxmljs2 0.26.6 (native)
Running "Medium document (72081 bytes)" suite...
Progress: 100%
@rgrove/parse-xml 2.0.4:
455 ops/s, ±0.41% | 53.76% slower
@rgrove/parse-xml 3.0.0:
321 ops/s, ±0.99% | 54.34% slower
libxmljs2 0.25.3 (native):
984 ops/s, ±6.42% | fastest
libxmljs2 0.26.6 (native):
703 ops/s, ±10.64% | fastest
xmldoc 1.1.2 (sax-js):
184 ops/s, ±0.75% | slowest, 81.3% slower
235 ops/s, ±0.50% | slowest, 66.57% slower
Finished 3 cases!
Fastest: libxmljs2 0.25.3 (native)
Fastest: libxmljs2 0.26.6 (native)
Slowest: xmldoc 1.1.2 (sax-js)
Running "Large document (1162464 bytes)" suite...
Progress: 100%
@rgrove/parse-xml 2.0.4:
36 ops/s, ±1.68% | 41.94% slower
@rgrove/parse-xml 3.0.0:
20 ops/s, ±0.48% | 72.97% slower
libxmljs2 0.25.3 (native):
62 ops/s, ±13.04% | fastest
libxmljs2 0.26.6 (native):
74 ops/s, ±12.02% | fastest
xmldoc 1.1.2 (sax-js):
15 ops/s, ±0.67% | slowest, 75.81% slower
19 ops/s, ±1.68% | slowest, 74.32% slower
Finished 3 cases!
Fastest: libxmljs2 0.25.3 (native)
Fastest: libxmljs2 0.26.6 (native)
Slowest: xmldoc 1.1.2 (sax-js)

@@ -471,0 +243,0 @@ ```

'use strict';
const emptyArray = Object.freeze([]);
const emptyObject = Object.freeze(Object.create(null));
const Parser = require('./lib/Parser');
const XmlCdata = require('./lib/XmlCdata');
const XmlComment = require('./lib/XmlComment');
const XmlDocument = require('./lib/XmlDocument');
const XmlElement = require('./lib/XmlElement');
const XmlNode = require('./lib/XmlNode');
const XmlProcessingInstruction = require('./lib/XmlProcessingInstruction');
const XmlText = require('./lib/XmlText');
const namedEntities = Object.freeze({
'&amp;': '&',
'&apos;': "'",
'&gt;': '>',
'&lt;': '<',
'&quot;': '"'
});
/**
Parses the given XML string and returns an `XmlDocument` instance representing
the document tree.
const NODE_TYPE_CDATA = 'cdata';
const NODE_TYPE_COMMENT = 'comment';
const NODE_TYPE_DOCUMENT = 'document';
const NODE_TYPE_ELEMENT = 'element';
const NODE_TYPE_TEXT = 'text';
@example
let Syntax;
const parseXml = require('@rgrove/parse-xml');
let doc = parseXml('<kittens fuzzy="yes">I like fuzzy kittens.</kittens>');
module.exports = function parseXml(xml, options = emptyObject) {
if (Syntax === void 0) {
// Lazy require to defer regex parsing until first use.
Syntax = require('./lib/syntax');
}
@param {string} xml
XML string to parse.
if (xml[0] === '\uFEFF') {
// Strip byte order mark.
xml = xml.slice(1);
}
@param {object} [options]
Parsing options.
xml = xml.replace(/\r\n?/g, '\n'); // Normalize CRLF and CR to LF.
@param {boolean} [options.ignoreUndefinedEntities=false]
When `true`, an undefined named entity (like "&bogus;") will be left in the
output as is instead of causing a parse error.
let doc = {
type: NODE_TYPE_DOCUMENT,
children: [],
parent: null,
toJSON: nodeToJson
};
@param {boolean} [options.preserveCdata=false]
When `true`, CDATA sections will be preserved in the document as `XmlCdata`
nodes. Otherwise CDATA sections will be represented as `XmlText` nodes,
which keeps the node tree simpler and easier to work with.
let state = {
length: xml.length,
options,
parent: doc,
pos: 0,
prevPos: 0,
xml
};
@param {boolean} [options.preserveComments=false]
When `true`, comments will be preserved in the document as `XmlComment`
nodes. Otherwise comments will not be included in the node tree.
state.replaceReference = replaceReference.bind(state);
@param {(entity: string) => string?} [options.resolveUndefinedEntity]
When an undefined named entity is encountered, this function will be called
with the entity as its only argument. It should return a string value with
which to replace the entity, or `null` or `undefined` to treat the entity as
undefined (which may result in a parse error depending on the value of
`ignoreUndefinedEntities`).
consumeProlog(state);
@param {boolean} [options.sortAttributes=false]
When `true`, attributes in an element's `attributes` object will be sorted
in alphanumeric order by name. Otherwise they'll retain their original order
as found in the XML.
if (!consumeElement(state)) {
error(state, 'Root element is missing or invalid');
}
while (consumeMisc(state)) {} // eslint-disable-line no-empty
if (!isEof(state)) {
error(state, `Extra content at the end of the document`);
}
return doc;
};
// -- Private Functions --------------------------------------------------------
function addNode(state, node) {
node.parent = state.parent;
node.toJSON = nodeToJson;
state.parent.children.push(node);
@returns {XmlDocument}
@public
*/
function parseXml(xml, options) {
return (new Parser(xml, options)).document;
}
function addText(state, text) {
let { children } = state.parent;
let prevNode = children[children.length - 1];
parseXml.XmlCdata = XmlCdata;
parseXml.XmlComment = XmlComment;
parseXml.XmlDocument = XmlDocument;
parseXml.XmlElement = XmlElement;
parseXml.XmlNode = XmlNode;
parseXml.XmlProcessingInstruction = XmlProcessingInstruction;
parseXml.XmlText = XmlText;
if (prevNode !== void 0 && prevNode.type === NODE_TYPE_TEXT) {
// The previous node is a text node, so we can append to it and avoid
// creating another node.
prevNode.text += text;
} else {
addNode(state, {
type: NODE_TYPE_TEXT,
text
});
}
}
// Each `consume*` function takes the current state as an argument and returns
// `true` if `state.pos` was advanced (meaning some XML was consumed) or `false`
// if nothing was consumed.
function consumeCDSect(state) {
let [ match, text ] = scan(state, Syntax.Anchored.CDSect);
if (match === void 0) {
return false;
}
if (state.options.preserveCdata) {
addNode(state, {
type: NODE_TYPE_CDATA,
text
});
} else {
addText(state, text);
}
return true;
}
function consumeCharData(state) {
let [ text ] = scan(state, Syntax.Anchored.CharData);
if (text === void 0) {
return false;
}
let cdataCloseIndex = text.indexOf(']]>');
if (cdataCloseIndex !== -1) {
state.pos = state.prevPos + cdataCloseIndex;
error(state, 'Element content may not contain the CDATA section close delimiter `]]>`');
}
// Note: XML 1.0 5th ed. says `CharData` is "any string of characters which
// does not contain the start-delimiter of any markup and does not include the
// CDATA-section-close delimiter", but the conformance test suite and
// well-established parsers like libxml seem to restrict `CharData` to
// characters that match the `Char` symbol, so that's what I've done here.
if (!Syntax.CharOnly.test(text)) {
state.pos = state.prevPos + text.search(new RegExp(`(?!${Syntax.Char.source})`));
error(state, 'Element content contains an invalid character');
}
addText(state, text);
return true;
}
function consumeComment(state) {
let [ , content ] = scan(state, Syntax.Anchored.Comment);
if (content === void 0) {
return false;
}
if (state.options.preserveComments) {
addNode(state, {
type: NODE_TYPE_COMMENT,
content: content.trim()
});
}
return true;
}
function consumeDoctypeDecl(state) {
return scan(state, Syntax.Anchored.doctypedecl).length > 0;
}
function consumeElement(state) {
let [ tag, name, attrs ] = scan(state, Syntax.Anchored.EmptyElemTag);
let isEmpty = tag !== void 0;
if (!isEmpty) {
[ tag, name, attrs ] = scan(state, Syntax.Anchored.STag);
if (tag === void 0) {
return false;
}
}
let { parent } = state;
let parsedAttrs = parseAttrs(state, attrs);
let node = {
type: NODE_TYPE_ELEMENT,
name,
attributes: parsedAttrs,
children: []
};
let xmlSpace = parsedAttrs['xml:space'];
if (xmlSpace === 'preserve'
|| (xmlSpace !== 'default' && parent.preserveWhitespace)) {
node.preserveWhitespace = true;
}
if (!isEmpty) {
state.parent = node;
consumeCharData(state);
while (
consumeElement(state)
|| consumeReference(state)
|| consumeCDSect(state)
|| consumePI(state)
|| consumeComment(state)
) {
consumeCharData(state);
}
let [ , endName ] = scan(state, Syntax.Anchored.ETag);
if (endName !== name) {
state.pos = state.prevPos;
error(state, `Missing end tag for element ${name}`);
}
state.parent = parent;
}
addNode(state, node);
return true;
}
function consumeMisc(state) {
return consumeComment(state)
|| consumePI(state)
|| consumeWhitespace(state);
}
function consumePI(state) {
let [ match, target ] = scan(state, Syntax.Anchored.PI);
if (match === void 0) {
return false;
}
if (target.toLowerCase() === 'xml') {
state.pos = state.prevPos;
error(state, 'XML declaration is only allowed at the start of the document');
}
return true;
}
function consumeProlog(state) {
let { pos } = state;
scan(state, Syntax.Anchored.XMLDecl);
while (consumeMisc(state)) {} // eslint-disable-line no-empty
if (consumeDoctypeDecl(state)) {
while (consumeMisc(state)) {} // eslint-disable-line no-empty
}
return state.pos > pos;
}
function consumeReference(state) {
let [ ref ] = scan(state, Syntax.Anchored.Reference);
if (ref === void 0) {
return false;
}
addText(state, state.replaceReference(ref));
return true;
}
function consumeWhitespace(state) {
return scan(state, Syntax.Anchored.S).length > 0;
}
function error(state, message) {
let { pos, xml } = state;
let column = 1;
let excerpt = '';
let line = 1;
// Find the line and column where the error occurred.
for (let i = 0; i < pos; ++i) {
let char = xml[i];
if (char === '\n') {
column = 1;
excerpt = '';
line += 1;
} else {
column += 1;
excerpt += char;
}
}
let eol = xml.indexOf('\n', pos);
excerpt += eol === -1
? xml.slice(pos)
: xml.slice(pos, eol);
let excerptStart = 0;
// Keep the excerpt below 50 chars, but always keep the error position in
// view.
if (excerpt.length > 50) {
if (column < 40) {
excerpt = excerpt.slice(0, 50);
} else {
excerptStart = column - 20;
excerpt = excerpt.slice(excerptStart, column + 30);
}
}
let err = new Error(
`${message} (line ${line}, column ${column})\n`
+ ` ${excerpt}\n`
+ ' '.repeat(column - excerptStart + 1) + '^\n'
);
err.column = column;
err.excerpt = excerpt;
err.line = line;
err.pos = pos;
throw err;
}
function isEof(state) {
return state.pos >= state.length - 1;
}
function nodeToJson() {
let json = Object.assign(Object.create(null), this); // eslint-disable-line no-invalid-this
delete json.parent;
return json;
}
function normalizeAttrValue(state, value) {
return value
.replace(/[\x20\t\r\n]/g, ' ')
.replace(Syntax.Global.Reference, state.replaceReference);
}
function parseAttrs(state, attrs) {
let parsedAttrs = Object.create(null);
if (!attrs) {
return parsedAttrs;
}
let attrPairs = attrs
.match(Syntax.Global.Attribute)
.sort();
for (let i = 0, len = attrPairs.length; i < len; ++i) {
let attrPair = attrPairs[i];
let eqMatch = attrPair.match(Syntax.Eq);
let name = attrPair.slice(0, eqMatch.index);
let value = attrPair.slice(eqMatch.index + eqMatch[0].length);
if (name in parsedAttrs) {
state.pos = state.prevPos;
error(state, `Attribute \`${name}\` redefined`);
}
value = normalizeAttrValue(state, value.slice(1, -1));
if (name === 'xml:space') {
if (value !== 'default' && value !== 'preserve') {
state.pos = state.prevPos;
error(state, `Value of the \`xml:space\` attribute must be "default" or "preserve"`);
}
}
parsedAttrs[name] = value;
}
return parsedAttrs;
}
function replaceReference(ref) {
let state = this; // eslint-disable-line no-invalid-this
if (ref[ref.length - 1] !== ';') {
error(state, `Invalid reference: \`${ref}\``);
}
if (ref[1] === '#') {
// This is a character entity.
let codePoint;
if (ref[2] === 'x') {
codePoint = parseInt(ref.slice(3, -1), 16);
} else {
codePoint = parseInt(ref.slice(2, -1), 10);
}
if (isNaN(codePoint)) {
state.pos = state.prevPos;
error(state, `Invalid character entity \`${ref}\``);
}
let char = String.fromCodePoint(codePoint);
if (!Syntax.Char.test(char)) {
state.pos = state.prevPos;
error(state, `Invalid character entity \`${ref}\``);
}
return char;
}
// This is a named entity.
let value = namedEntities[ref];
if (value !== void 0) {
return value;
}
if (state.options.resolveUndefinedEntity) {
let resolvedValue = state.options.resolveUndefinedEntity(ref);
if (resolvedValue !== null && resolvedValue !== void 0) {
return resolvedValue;
}
}
if (state.options.ignoreUndefinedEntities) {
return ref;
}
state.pos = state.prevPos;
error(state, `Named entity isn't defined: \`${ref}\``);
}
function scan(state, regex) {
let { pos, xml } = state;
let xmlToScan = pos > 0
? xml.slice(pos)
: xml;
let matches = xmlToScan.match(regex);
if (matches === null) {
return emptyArray;
}
state.prevPos = state.pos;
state.pos += matches[0].length;
return matches;
}
module.exports = parseXml;
'use strict';
// To improve readability, the regular expression patterns in this file are
// written as tagged template literals. The `regex` tag function strips literal
// whitespace characters and line comments beginning with `//` and returns a
// RegExp instance.
//
// Escape sequences are preserved as-is in the resulting regex, so
// double-escaping isn't necessary. A pattern may embed another pattern using
// `${}` interpolation.
// -- Exported Constants -------------------------------------------------------
// -- Common Symbols -----------------------------------------------------------
exports.Char = regex`
(?:
[
\t
\n
\r
\x20-\uD7FF
\uE000-\uFFFD
]
/**
Mapping of predefined entity names to their replacement values.
|
@type {Readonly<{[name: string]: string}>}
@see https://www.w3.org/TR/2008/REC-xml-20081126/#sec-predefined-ent
*/
const predefinedEntities = Object.freeze(Object.assign(Object.create(null), {
amp: '&',
apos: "'",
gt: '>',
lt: '<',
quot: '"'
}));
[\uD800-\uDBFF][\uDC00-\uDFFF]
)
`;
exports.predefinedEntities = predefinedEntities;
// Partial implementation.
//
// To be compliant, the matched text must result in an error if it contains the
// string `]]>`, but that can't be easily represented here so we do it in the
// parser.
exports.CharData = regex`
[^<&]+
`;
// -- Exported Functions -------------------------------------------------------
exports.NameStartChar = regex`
(?:
[
:
A-Z
_
a-z
\xC0-\xD6
\xD8-\xF6
\xF8-\u02FF
\u0370-\u037D
\u037F-\u1FFF
\u200C-\u200D
\u2070-\u218F
\u2C00-\u2FEF
\u3001-\uD7FF
\uF900-\uFDCF
\uFDF0-\uFFFD
]
/**
Returns `true` if _char_ is an XML `NameChar`, `false` if it isn't.
|
@param {string} char
@returns {boolean}
@see https://www.w3.org/TR/2008/REC-xml-20081126/#NT-NameChar
*/
function isNameChar(char) {
if (isNameStartChar(char)) {
return true;
}
[\uD800-\uDB7F][\uDC00-\uDFFF]
)
`;
let cp = getCodePoint(char);
exports.NameChar = regex`
(?:
${exports.NameStartChar}
return cp === 0x2D // -
|| cp === 0x2E // .
|| (cp >= 0x30 && cp <= 0x39) // 0-9
|| cp === 0xB7
|| (cp >= 0x300 && cp <= 0x36F)
|| (cp >= 0x203F && cp <= 0x2040);
}
|
exports.isNameChar = isNameChar;
[
.
0-9
\xB7
\u0300-\u036F
\u203F-\u2040
-
]
)
`;
/**
Returns `true` if _char_ is an XML `NameStartChar`, `false` if it isn't.
exports.Name = regex`
${exports.NameStartChar}
(?:${exports.NameChar})*
`;
@param {string} char
@returns {boolean}
@see https://www.w3.org/TR/2008/REC-xml-20081126/#NT-NameStartChar
*/
function isNameStartChar(char) {
let cp = getCodePoint(char);
// Loose implementation. The entity will be validated in the `replaceReference`
// function.
exports.Reference = regex`
&[^\s&;]*;?
`;
return cp === 0x3A // :
|| cp === 0x5F // _
|| (cp >= 0x41 && cp <= 0x5A) // A-Z
|| (cp >= 0x61 && cp <= 0x7A) // a-z
|| (cp >= 0xC0 && cp <= 0xD6)
|| (cp >= 0xD8 && cp <= 0xF6)
|| (cp >= 0xF8 && cp <= 0x2FF)
|| (cp >= 0x370 && cp <= 0x37D)
|| (cp >= 0x37F && cp <= 0x1FFF)
|| (cp >= 0x200C && cp <= 0x200D)
|| (cp >= 0x2070 && cp <= 0x218F)
|| (cp >= 0x2C00 && cp <= 0x2FEF)
|| (cp >= 0x3001 && cp <= 0xD7FF)
|| (cp >= 0xF900 && cp <= 0xFDCF)
|| (cp >= 0xFDF0 && cp <= 0xFFFD)
|| (cp >= 0x10000 && cp <= 0xEFFFF);
}
exports.S = regex`
[\x20\t\r\n]+
`;
exports.isNameStartChar = isNameStartChar;
// -- Attributes ---------------------------------------------------------------
exports.Eq = regex`
(?:${exports.S})?
=
(?:${exports.S})?
`;
/**
Returns `true` if _char_ is not a valid XML `Char`, `false` otherwise.
exports.Attribute = regex`
${exports.Name}
${exports.Eq}
@param {string} char
@returns {boolean}
@see https://www.w3.org/TR/2008/REC-xml-20081126/#NT-Char
*/
function isNotXmlChar(char) {
return !isXmlChar(char);
}
(?:
"(?:
[^<"]
)*"
exports.isNotXmlChar = isNotXmlChar;
|
/**
Returns `true` if _char_ is a valid reference character (which may appear
between `&` and `;` in a reference), `false` otherwise.
'(?:
[^<']
)*'
)
`;
@param {string} char
@returns {boolean}
@see https://www.w3.org/TR/2008/REC-xml-20081126/#sec-references
*/
function isReferenceChar(char) {
return char === '#' || isNameChar(char);
}
// -- Elements -----------------------------------------------------------------
exports.CDSect = regex`
<!\[CDATA\[
// Group 1: CData text content (optional)
(
(?:${exports.Char})*?
)
\]\]>
`;
exports.isReferenceChar = isReferenceChar;
exports.EmptyElemTag = regex`
<
// Group 1: Element name
(${exports.Name})
/**
Returns `true` if _char_ is an XML whitespace character, `false` otherwise.
// Group 2: Attributes (optional)
(
(?:
${exports.S}
${exports.Attribute}
)*
)
@param {string} char
@returns {boolean}
@see https://www.w3.org/TR/2008/REC-xml-20081126/#white
*/
function isWhitespace(char) {
let cp = getCodePoint(char);
(?:${exports.S})?
/>
`;
return cp === 0x20
|| cp === 0x9
|| cp === 0xA
|| cp === 0xD;
}
exports.ETag = regex`
</
// Group 1: End tag name
(${exports.Name})
(?:${exports.S})?
>
`;
exports.isWhitespace = isWhitespace;
exports.STag = regex`
<
// Group 1: Start tag name
(${exports.Name})
/**
Returns `true` if _char_ is a valid XML `Char`, `false` otherwise.
// Group 2: Attributes (optional)
(
(?:
${exports.S}
${exports.Attribute}
)*
)
@param {string} char
@returns {boolean}
@see https://www.w3.org/TR/2008/REC-xml-20081126/#NT-Char
*/
function isXmlChar(char) {
let cp = getCodePoint(char);
(?:${exports.S})?
>
`;
return cp === 0x9
|| cp === 0xA
|| cp === 0xD
|| (cp >= 0x20 && cp <= 0xD7FF)
|| (cp >= 0xE000 && cp <= 0xFFFD)
|| (cp >= 0x10000 && cp <= 0x10FFFF);
}
// -- Misc ---------------------------------------------------------------------
exports.isXmlChar = isXmlChar;
// Special pattern that matches an entire string consisting only of `Char`
// characters.
exports.CharOnly = regex`
^(?:${exports.Char})*$
`;
// -- Private Functions --------------------------------------------------------
exports.Comment = regex`
<!--
// Group 1: Comment text (optional)
(
(?:
(?!-) ${exports.Char}
| - (?!-) ${exports.Char}
)*
)
-->
`;
/**
Returns the Unicode code point value of the given character, or `-1` if _char_
is empty.
// Loose implementation since doctype declarations are discarded.
//
// It's not possible to fully parse a doctype declaration with a regex, but
// since we just discard them we can skip parsing the fiddly inner bits and use
// a regex to speed things up.
exports.doctypedecl = regex`
<!DOCTYPE
${exports.S}
[^[>]*
(?:
\[ [\s\S]+? \]
(?:${exports.S})?
)?
>
`;
// Loose implementation since processing instructions are discarded.
exports.PI = regex`
<\?
// Group 1: PITarget
(
${exports.Name}
)
(?:
${exports.S}
(?:${exports.Char})*?
)?
\?>
`;
// Loose implementation since XML declarations are discarded.
exports.XMLDecl = regex`
<\?xml
${exports.S}
[\s\S]+?
\?>
`;
// -- Helpers ------------------------------------------------------------------
exports.Anchored = {};
exports.Global = {};
// Create anchored and global variations of each pattern.
Object.keys(exports).forEach(name => {
if (name !== 'Anchored' && name !== 'CharOnly' && name !== 'Global') {
let pattern = exports[name];
exports.Anchored[name] = new RegExp('^' + pattern.source);
exports.Global[name] = new RegExp(pattern.source, 'g');
}
});
function regex(strings, ...embeddedPatterns) {
let { length, raw } = strings;
let lastIndex = length - 1;
let pattern = '';
for (let i = 0; i < length; ++i) {
pattern += raw[i]
.replace(/(^|[^\\])\/\/.*$/gm, '$1') // remove end-of-line comments
.replace(/\s+/g, ''); // remove all whitespace
if (i < lastIndex) {
pattern += embeddedPatterns[i].source;
}
}
return new RegExp(pattern);
@param {string} char
@returns {number}
*/
function getCodePoint(char) {
return char.codePointAt(0) || -1;
}
