
saxes


Comparing version 3.1.11 to 4.0.0-rc.1

47

CHANGELOG.md

@@ -0,1 +1,48 @@

<a name="4.0.0-rc.1"></a>
# [4.0.0-rc.1](https://github.com/lddubeau/saxes/compare/v3.1.11...v4.0.0-rc.1) (2019-10-02)
### Bug Fixes
* don't serialize the fileName as "undefined:" when not present ([4ff2365](https://github.com/lddubeau/saxes/commit/4ff2365))
* fix bug with initial eol characters ([7b3db75](https://github.com/lddubeau/saxes/commit/7b3db75))
* handling of end of line characters ([f13247a](https://github.com/lddubeau/saxes/commit/f13247a))
### Features
* add forceXMLVersion ([1eedbf8](https://github.com/lddubeau/saxes/commit/1eedbf8))
* saxes handles chunks that "break" unicode ([1272448](https://github.com/lddubeau/saxes/commit/1272448))
* support for XML 1.1 ([36704fb](https://github.com/lddubeau/saxes/commit/36704fb))
### Performance Improvements
* don't depend on limit to know when we hit the end of buffer ([ad4ab53](https://github.com/lddubeau/saxes/commit/ad4ab53))
* don't increment a column number ([490fc24](https://github.com/lddubeau/saxes/commit/490fc24))
* don't repeatedly read this.i in the getCode methods ([d3f196c](https://github.com/lddubeau/saxes/commit/d3f196c))
* improve performance of text handling ([9c13099](https://github.com/lddubeau/saxes/commit/9c13099))
* make the most common path of getCode functions the shortest ([4d66bbb](https://github.com/lddubeau/saxes/commit/4d66bbb))
* minimize concatenation by adding the capability to unget codes ([27fa8b9](https://github.com/lddubeau/saxes/commit/27fa8b9))
* use isCharAndNotRestricted rather than call two functions ([f0b67a4](https://github.com/lddubeau/saxes/commit/f0b67a4))
* use slice rather than substring ([c1fed89](https://github.com/lddubeau/saxes/commit/c1fed89))
### BREAKING CHANGES
* previous versions of saxes did not consistently convert end of
line characters to NL (0xA) in the data reported by event handlers. This has
been fixed. If your code relied on the old (incorrect) behavior then you'll have
to update it.
* previous versions of saxes would parse files with an XML
declaration set to 1.1 as 1.0 documents. The support for 1.1 entails that if a
document has an XML declaration that specifies version 1.1 it is parsed as a 1.1
document.
* when ``fileName`` is undefined in the parser options saxes does
not show a file name in error messages. Previously it was showing the name
``undefined``. To get the previous behavior, in all cases where you'd leave
``fileName`` undefined, you must set it to the string ``"undefined"`` instead.
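
To make the ``fileName`` change concrete, here is a minimal sketch of how the error message prefix is assembled in 4.0.0. `formatErrorMessage` is a hypothetical stand-alone function written for illustration only. It mirrors the logic of the parser's `fail` method as it appears in the `lib/saxes.js` diff, but it is not part of the saxes API, and the sample file name and positions are made up.

```javascript
// Hypothetical helper, NOT part of the saxes API. It restates the prefix
// logic of the new fail() method as a pure function.
function formatErrorMessage({ fileName, trackPosition, line, column }, er) {
  // A missing fileName now contributes nothing instead of "undefined".
  let message = fileName || "";
  if (trackPosition) {
    if (message.length > 0) {
      message += ":";
    }
    message += `${line}:${column}`;
  }
  if (message.length > 0) {
    message += ": ";
  }
  return message + er;
}

const pos = { trackPosition: true, line: 3, column: 7 };

// fileName set: the name is part of the prefix.
console.log(formatErrorMessage({ ...pos, fileName: "doc.xml" }, "unclosed tag"));
// prints "doc.xml:3:7: unclosed tag"

// fileName left undefined: only the position is shown (new behavior).
console.log(formatErrorMessage(pos, "unclosed tag"));
// prints "3:7: unclosed tag"

// To get the old output back, pass the string "undefined" explicitly.
console.log(formatErrorMessage({ ...pos, fileName: "undefined" }, "unclosed tag"));
// prints "undefined:3:7: unclosed tag"
```

When position tracking is off and no file name is given, the sketch returns the bare error text with no prefix at all.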
<a name="3.1.11"></a>

@@ -2,0 +49,0 @@ ## [3.1.11](https://github.com/lddubeau/saxes/compare/v3.1.10...v3.1.11) (2019-06-25)

14

lib/saxes.d.ts
declare namespace saxes {
export const EVENTS: ReadonlyArray<string>;
export interface SaxesOptions {
export interface CommonSaxesOptions {
xmlns?: boolean;

@@ -10,4 +10,16 @@ position?: boolean;

additionalNamespaces?: Record<string, string>;
defaultXMLVersion?: "1.0" | "1.1";
}
export interface NotForced extends CommonSaxesOptions {
forceXMLVersion?: false;
}
export interface Forced extends CommonSaxesOptions {
defaultXMLVersion: CommonSaxesOptions["defaultXMLVersion"];
forceXMLVersion: true;
}
export type SaxesOptions = NotForced | Forced;
export interface XMLDecl {

@@ -14,0 +26,0 @@ version?: string;

590

lib/saxes.js
"use strict";
const { isS, isChar, isNameStartChar, isNameChar, S_LIST, NAME_RE } =
require("xmlchars/xml/1.0/ed5");
const { isNCNameStartChar, isNCNameChar, NC_NAME_RE } = require("xmlchars/xmlns/1.0/ed3");
const {
isS, isChar: isChar10, isNameStartChar, isNameChar, S_LIST, NAME_RE,
} = require("xmlchars/xml/1.0/ed5");
const { isChar: isChar11 } = require("xmlchars/xml/1.1/ed2");
const { isNCNameStartChar, isNCNameChar, NC_NAME_RE } =
require("xmlchars/xmlns/1.0/ed3");

@@ -88,2 +91,3 @@ const XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace";

const TAB = 9;
const NL = 0xA;

@@ -105,2 +109,4 @@ const CR = 0xD;

const CLOSE_BRACKET = 0x5D;
const NEL = 0x85;
const LS = 0x2028; // Line Separator

@@ -261,6 +267,14 @@ function isQuote(c) {

*
* @property {string} [fileName] A file name to use for error reporting. Leaving
* this unset will report a file name of "undefined". "File name" is a loose
* concept. You could use a URL to some resource, or any descriptive name you
* like.
* @property {string} [fileName] A file name to use for error reporting. "File
* name" is a loose concept. You could use a URL to some resource, or any
* descriptive name you like.
*
* @property {"1.0" | "1.1"} [defaultXMLVersion] The default XML version to
* use. If unspecified, and there is no XML declaration, the default
* version is "1.0".
*
* @property {boolean} [forceXMLVersion] A flag indicating whether to force the
* XML version used for parsing to the value of ``defaultXMLVersion``. When this
* flag is ``true``, ``defaultXMLVersion`` must be specified. If unspecified,
* the default value of this flag is ``false``.
*/

@@ -326,3 +340,14 @@

this.i = 0;
this.trailingCR = false;
//
// We use prevI to allow "ungetting" the previously read code point. Note
// however, that it is not safe to unget everything and anything. In
// particular ungetting EOL characters will screw positioning up.
//
// Practically, you must not unget a code which has any side effect beyond
// updating ``this.i`` and ``this.prevI``. Only EOL codes have such side
// effects.
//
this.prevI = 0;
this.carriedFromPrevious = undefined;
this.originalNL = true;
this.forbiddenState = FORBIDDEN_START;

@@ -368,5 +393,4 @@ /**

this.processAttribs = this.processAttribsNS;
this.pushAttrib = this.pushAttribNS;
this.ns = Object.assign({ __proto__: null }, rootNS);
this.ns = { __proto__: null, ...rootNS };
const additional = this.opt.additionalNamespaces;

@@ -383,5 +407,14 @@ if (additional) {

this.processAttribs = this.processAttribsPlain;
this.pushAttrib = this.pushAttribPlain;
}
let { defaultXMLVersion } = this.opt;
const { forceXMLVersion } = this.opt;
if (defaultXMLVersion === undefined) {
if (forceXMLVersion) {
throw new Error("forceXMLVersion set but defaultXMLVersion is not set");
}
defaultXMLVersion = "1.0";
}
this.setXMLVersion(defaultXMLVersion);
this.trackPosition = this.opt.position !== false;

@@ -392,3 +425,3 @@ /** The line number the parser is currently looking at. */

/** The column the parser is currently looking at. */
this.column = 0;
this.positionAtNewLine = 0;

@@ -404,2 +437,6 @@ this.fileName = this.opt.fileName;

get column() {
return this.position - this.positionAtNewLine;
}
/* eslint-disable class-methods-use-this */

@@ -499,3 +536,3 @@ /**

*
* @param {Error} er The error to report.
* @param {string} er The error to report.
*

@@ -505,5 +542,13 @@ * @returns this

fail(er) {
const message = (this.trackPosition) ?
`${this.fileName}:${this.line}:${this.column}: ${er}` : er;
let message = this.fileName || "";
if (this.trackPosition) {
if (message.length > 0) {
message += ":";
}
message += `${this.line}:${this.column}`;
}
if (message.length > 0) {
message += ": ";
}
message += er;
this.onerror(new Error(message));

@@ -537,21 +582,25 @@ return this;

// of single complete characters (``Array.from(chunk)``) would be faster
// than the current repeated calls to ``codePointAt``. As of August 2018, it
// than the current repeated calls to ``charCodeAt``. As of August 2018, it
// isn't. (There may be Node-specific code that would perform faster than
// ``Array.from`` but don't want to be dependent on Node.)
let limit = chunk.length;
if (this.trailingCR) {
// The previous chunk had a trailing cr. We need to handle it now.
chunk = `\r${chunk}`;
if (this.carriedFromPrevious !== undefined) {
// The previous chunk had char we must carry over.
chunk = `${this.carriedFromPrevious}${chunk}`;
this.carriedFromPrevious = undefined;
}
if (!end && chunk[limit - 1] === CR) {
// The chunk ends with a trailing CR. We cannot know how to handle it
// until we get the next chunk or the end of the stream. So save it for
// later.
let limit = chunk.length;
const lastCode = chunk.charCodeAt(limit - 1);
if (!end &&
// A trailing CR or surrogate must be carried over to the next
// chunk.
(lastCode === CR || (lastCode >= 0xD800 && lastCode <= 0xDBFF))) {
// The chunk ends with a character that must be carried over. We cannot
// know how to handle it until we get the next chunk or the end of the
// stream. So save it for later.
this.carriedFromPrevious = chunk[limit - 1];
limit--;
this.trailingCR = true;
chunk = chunk.slice(0, limit);
}
this.limit = limit;

@@ -578,2 +627,9 @@ this.chunk = chunk;

/** @private */
newline(originalNL) {
this.originalNL = originalNL;
this.line++;
this.positionAtNewLine = this.position;
}
/**

@@ -583,2 +639,4 @@ * Get a single code point out of the current chunk. This updates the current

*
* This is the algorithm to use for XML 1.0.
*
* @private

@@ -588,45 +646,150 @@ *

*/
getCode() {
getCode10() {
const { chunk, i } = this;
this.prevI = i;
// Using charCodeAt and handling the surrogates ourselves is faster
// than using codePointAt.
let code = chunk.charCodeAt(i);
const code = chunk.charCodeAt(i);
let skip = 1;
switch (code) {
case CR:
// We may get NaN if we read past the end of the chunk, which is
// fine.
if (chunk.charCodeAt(i + 1) === NL) {
// A \r\n sequence is converted to \n so we have to skip over the next
// character. We already know it has a size of 1 so ++ is fine here.
skip++;
// Yes, we do this instead of doing this.i++. Doing it this way, we do not
// read this.i again, which is a bit faster.
this.i = i + 1;
if (code < 0xD800) {
if (code >= SPACE || code === TAB) {
return code;
}
// Otherwise, a \r is just converted to \n, so we don't have to skip
// ahead.
// In either case, \r becomes \n.
code = NL;
/* yes, fall through */
case NL:
this.line++;
this.column = 0;
break;
default:
this.column++;
if (code >= 0xD800 && code <= 0xDBFF) {
code = 0x10000 + ((code - 0xD800) * 0x400) +
switch (code) {
case NL:
this.newline(true);
return NL;
case CR:
// We may get NaN if we read past the end of the chunk, which is fine.
if (chunk.charCodeAt(i + 1) === NL) {
// A \r\n sequence is converted to \n so we have to skip over the next
// character. We already know it has a size of 1 so ++ is fine here.
this.i = i + 2;
}
// Otherwise, a \r is just converted to \n, so we don't have to skip
// ahead.
// In either case, \r becomes \n.
this.newline(false);
return NL;
default:
// If we get here, then code < SPACE and it is not NL CR or TAB.
this.fail("disallowed character.");
return code;
}
}
if (code > 0xDBFF) {
// This is a specialized version of isChar10 that takes into account
// that in this context code > 0xDBFF and code <= 0xFFFF. So it does not
// test cases that don't need testing.
if (!(code >= 0xE000 && code <= 0xFFFD)) {
this.fail("disallowed character.");
}
return code;
}
// eslint-disable-next-line no-restricted-globals
if (isNaN(code)) {
return undefined;
}
const final = 0x10000 + ((code - 0xD800) * 0x400) +
(chunk.charCodeAt(i + 1) - 0xDC00);
this.column++;
skip++;
this.i = i + 2;
// This is a specialized version of isChar10 that takes into account that in
// this context necessarily final >= 0x10000.
if (final > 0x10FFFF) {
this.fail("disallowed character.");
}
return final;
}
/**
* Get a single code point out of the current chunk. This updates the current
* position if we do position tracking.
*
* This is the algorithm to use for XML 1.1.
*
* @private
*
* @returns {number} The character read.
*/
getCode11() {
const { chunk, i } = this;
this.prevI = i;
// Using charCodeAt and handling the surrogates ourselves is faster
// than using codePointAt.
const code = chunk.charCodeAt(i);
// Yes, we do this instead of doing this.i++. Doing it this way, we do not
// read this.i again, which is a bit faster.
this.i = i + 1;
if (code < 0xD800) {
if ((code > 0x1F && code < 0x7F) || (code > 0x9F && code !== LS) ||
code === TAB) {
return code;
}
if (!isChar(code)) {
switch (code) {
case NL: // 0xA
this.newline(true);
return NL;
case CR: { // 0xD
// We may get NaN if we read past the end of the chunk, which is
// fine.
const next = chunk.charCodeAt(i + 1);
if (next === NL || next === NEL) {
// A CR NL or CR NEL sequence is converted to NL so we have to skip over
// the next character. We already know it has a size of 1.
this.i = i + 2;
}
// Otherwise, a CR is just converted to NL, no skip.
}
/* yes, fall through */
case NEL: // 0x85
case LS: // 0x2028
this.newline(false);
return NL;
default:
this.fail("disallowed character.");
return code;
}
}
this.i += skip;
if (code > 0xDBFF) {
// This is a specialized version of isCharAndNotRestricted that takes into
// account that in this context code > 0xDBFF and code <= 0xFFFF. So it
// does not test cases that don't need testing.
if (!(code >= 0xE000 && code <= 0xFFFD)) {
this.fail("disallowed character.");
}
return code;
return code;
}
// eslint-disable-next-line no-restricted-globals
if (isNaN(code)) {
return undefined;
}
const final = 0x10000 + ((code - 0xD800) * 0x400) +
(chunk.charCodeAt(i + 1) - 0xDC00);
this.i = i + 2;
// This is a specialized version of isCharAndNotRestricted that takes into
// account that in this context necessarily final >= 0x10000.
if (final > 0x10FFFF) {
this.fail("disallowed character.");
}
return final;
}

@@ -646,2 +809,14 @@

/**
* @private
*/
handleEOL(buffer, chunk, start) {
if (this.originalNL) {
return start;
}
this[buffer] += `${chunk.slice(start, this.prevI)}\n`;
return this.i;
}
/**
* Capture characters into a buffer until encountering one of a set of

@@ -661,16 +836,19 @@ * characters.

captureTo(chars, buffer) {
const { chunk, limit, i: start } = this;
while (this.i < limit) {
let { i: start } = this;
const { chunk } = this;
while (true) {
const c = this.getCode();
if (c === NL) {
start = this.handleEOL(buffer, chunk, start);
}
else if (c === undefined) {
this[buffer] += chunk.slice(start);
return undefined;
}
if (chars.includes(c)) {
// This is faster than adding codepoints one by one.
this[buffer] += chunk.substring(start,
this.i - (c <= 0xFFFF ? 1 : 2));
this[buffer] += chunk.slice(start, this.prevI);
return c;
}
}
// This is faster than adding codepoints one by one.
this[buffer] += chunk.substring(start);
return undefined;
}

@@ -691,16 +869,19 @@

captureToChar(char, buffer) {
const { chunk, limit, i: start } = this;
while (this.i < limit) {
let { i: start } = this;
const { chunk } = this;
while (true) {
const c = this.getCode();
if (c === NL) {
start = this.handleEOL(buffer, chunk, start);
}
else if (c === undefined) {
this[buffer] += chunk.slice(start);
return false;
}
if (c === char) {
// This is faster than adding codepoints one by one.
this[buffer] += chunk.substring(start,
this.i - (c <= 0xFFFF ? 1 : 2));
this[buffer] += chunk.slice(start, this.prevI);
return true;
}
}
// This is faster than adding codepoints one by one.
this[buffer] += chunk.substring(start);
return false;
}

@@ -718,16 +899,16 @@

captureNameChars() {
const { chunk, limit, i: start } = this;
while (this.i < limit) {
const { chunk, i: start } = this;
while (true) {
const c = this.getCode();
if (c === undefined) {
this.name += chunk.slice(start);
return undefined;
}
// NL is not a name char so we don't have to test specifically for it.
if (!isNameChar(c)) {
// This is faster than adding codepoints one by one.
this.name += chunk.substring(start,
this.i - (c <= 0xFFFF ? 1 : 2));
this.name += chunk.slice(start, this.prevI);
return c;
}
}
// This is faster than adding codepoints one by one.
this.name += chunk.substring(start);
return undefined;
}

@@ -747,16 +928,17 @@

captureWhileNameCheck(buffer) {
const { chunk, limit, i: start } = this;
while (this.i < limit) {
const { chunk, i: start } = this;
while (true) {
const c = this.getCode();
if (c === undefined) {
this[buffer] += chunk.slice(start);
return undefined;
}
// NL cannot satisfy this.nameCheck so we don't have to test
// specifically for it.
if (!this.nameCheck(c)) {
// This is faster than adding codepoints one by one.
this[buffer] += chunk.substring(start,
this.i - (c <= 0xFFFF ? 1 : 2));
this[buffer] += chunk.slice(start, this.prevI);
return c;
}
}
// This is faster than adding codepoints one by one.
this[buffer] += chunk.substring(start);
return undefined;
}

@@ -773,11 +955,24 @@

skipSpaces() {
const { limit } = this;
while (this.i < limit) {
while (true) {
const c = this.getCode();
if (!isS(c)) {
if (c === undefined || !isS(c)) {
return c;
}
}
}
return undefined;
/** @private */
setXMLVersion(version) {
if (version === "1.0") {
this.isChar = isChar10;
this.getCode = this.getCode10;
this.pushAttrib =
this.xmlnsOpt ? this.pushAttribNS10 : this.pushAttribPlain;
}
else {
this.isChar = isChar11;
this.getCode = this.getCode11;
this.pushAttrib =
this.xmlnsOpt ? this.pushAttribNS11 : this.pushAttribPlain;
}
}

@@ -797,10 +992,3 @@

this.i++;
this.column++;
}
else if (isS(c)) {
this.i++;
this.column++;
// An XML declaration cannot appear after initial spaces.
this.xmlDeclPossible = false;
}

@@ -812,7 +1000,26 @@ this.state = S_BEGIN_WHITESPACE;

sBeginWhitespace() {
const c = this.skipSpaces();
// This initial loop is a specialized version of skipSpaces. We need to know
// whether we've encountered spaces or not because as soon as we run into a
// space, an XML declaration is no longer possible. Rather than slow down
// skipSpaces even in places where we don't care whether it skipped anything
// or not, we use a specialized loop here.
let c;
let sawSpace = false;
while (true) {
c = this.getCode();
if (c === undefined || !isS(c)) {
break;
}
sawSpace = true;
}
if (sawSpace) {
this.xmlDeclPossible = false;
}
if (c === LESS) {
this.state = S_OPEN_WAKA;
}
else if (c) {
else if (c !== undefined) {
// have to process this as a text node.

@@ -824,3 +1031,3 @@ // weird, but happens.

}
this.text = String.fromCodePoint(c);
this.i = this.prevI;
this.state = S_TEXT;

@@ -864,13 +1071,11 @@ this.xmlDeclPossible = false;

//
const { chunk, limit, i: start } = this;
let { forbiddenState } = this;
let c;
let { i: start, forbiddenState } = this;
const { chunk } = this;
// eslint-disable-next-line no-labels, no-restricted-syntax
scanLoop:
while (this.i < limit) {
const code = this.getCode();
switch (code) {
while (true) {
switch (this.getCode()) {
case LESS:
this.state = S_OPEN_WAKA;
c = code;
this.text += chunk.slice(start, this.prevI);
forbiddenState = FORBIDDEN_START;

@@ -882,3 +1087,3 @@ // eslint-disable-next-line no-labels

this.entityReturnState = S_TEXT;
c = code;
this.text += chunk.slice(start, this.prevI);
forbiddenState = FORBIDDEN_START;

@@ -907,2 +1112,10 @@ // eslint-disable-next-line no-labels

break;
case NL:
start = this.handleEOL("text", chunk, start);
forbiddenState = FORBIDDEN_START;
break;
case undefined:
this.text += chunk.slice(start);
// eslint-disable-next-line no-labels
break scanLoop;
default:

@@ -913,7 +1126,2 @@ forbiddenState = FORBIDDEN_START;

this.forbiddenState = forbiddenState;
// This is faster than adding codepoints one by one.
this.text += chunk.substring(start,
c === undefined ? undefined :
(this.i - (c <= 0xFFFF ? 1 : 2)));
}

@@ -924,16 +1132,11 @@

// This is essentially a specialized version of captureTo which is optimized
// for performing the ]]> check. A previous version of this code, checked
// ``this.text`` for the presence of ]]>. It simplified the code but was
// very costly when character data contained a lot of entities to be parsed.
//
// Since we are using a specialized loop, we also keep track of the presence
// of non-space characters in the text since these are errors when appearing
// outside the document root element.
//
const { chunk, limit, i: start } = this;
// for a specialized task. We keep track of the presence of non-space
// characters in the text since these are errors when appearing outside the
// document root element.
let { i: start } = this;
const { chunk } = this;
let nonSpace = false;
let c;
// eslint-disable-next-line no-labels, no-restricted-syntax
outRootLoop:
while (this.i < limit) {
while (true) {
const code = this.getCode();

@@ -943,3 +1146,3 @@ switch (code) {

this.state = S_OPEN_WAKA;
c = code;
this.text += chunk.slice(start, this.prevI);
// eslint-disable-next-line no-labels

@@ -950,6 +1153,14 @@ break outRootLoop;

this.entityReturnState = S_TEXT;
c = code;
this.text += chunk.slice(start, this.prevI);
nonSpace = true;
// eslint-disable-next-line no-labels
break outRootLoop;
case NL:
start = this.handleEOL("text", chunk, start);
// eslint-disable-next-line no-labels
break;
case undefined:
this.text += chunk.slice(start);
// eslint-disable-next-line no-labels
break outRootLoop;
default:

@@ -962,7 +1173,2 @@ if (!isS(code)) {

// This is faster than adding codepoints one by one.
this.text += chunk.substring(start,
c === undefined ? undefined :
(this.i - (c <= 0xFFFF ? 1 : 2)));
if (!nonSpace) {

@@ -988,2 +1194,6 @@ return;

sOpenWaka() {
// Reminder: a state handler is called with at least one character
// available in the current chunk. So the first call to get code inside of
// a state handler cannot return ``undefined``. That's why we don't test
// for it.
const c = this.getCode();

@@ -993,3 +1203,3 @@ // either a /, ?, !, or text is coming next.

this.state = S_OPEN_TAG;
this.name = String.fromCodePoint(c);
this.i = this.prevI;
this.xmlDeclPossible = false;

@@ -1012,3 +1222,3 @@ }

default:
this.fail("disallowed character in tag name.");
this.fail("disallowed character in tag name");
this.state = S_TEXT;

@@ -1068,3 +1278,3 @@ this.xmlDeclPossible = false;

}
else if (c) {
else if (c !== undefined) {
this.doctype += String.fromCodePoint(c);

@@ -1094,3 +1304,3 @@ if (c === OPEN_BRACKET) {

const c = this.captureTo(DTD_TERMINATOR, "doctype");
if (!c) {
if (c === undefined) {
return;

@@ -1304,3 +1514,3 @@ }

}
else if (c) {
else if (c !== undefined) {
this.fail("disallowed character in processing instruction name.");

@@ -1411,11 +1621,18 @@ this.piTarget += String.fromCodePoint(c);

if (c) {
if (c !== undefined) {
switch (this.xmlDeclName) {
case "version":
if (!/^1\.[0-9]+$/.test(this.xmlDeclValue)) {
case "version": {
this.xmlDeclExpects = ["encoding", "standalone"];
const version = this.xmlDeclValue;
this.xmlDecl.version = version;
// This is the test specified by XML 1.0 but it is fine for XML 1.1.
if (!/^1\.[0-9]+$/.test(version)) {
this.fail("version number must match /^1\\.[0-9]+$/.");
}
this.xmlDeclExpects = ["encoding", "standalone"];
this.xmlDecl.version = this.xmlDeclValue;
// When forceXMLVersion is set, the XML declaration is ignored.
else if (!this.opt.forceXMLVersion) {
this.setXMLVersion(version);
}
break;
}
case "encoding":

@@ -1524,3 +1741,3 @@ if (!/^[A-Za-z][A-Za-z0-9._-]*$/.test(this.xmlDeclValue)) {

const c = this.captureNameChars();
if (!c) {
if (c === undefined) {
return;

@@ -1533,2 +1750,3 @@ }

};
this.name = "";

@@ -1578,7 +1796,7 @@ if (this.xmlnsOpt) {

const c = this.skipSpaces();
if (!c) {
if (c === undefined) {
return;
}
if (isNameStartChar(c)) {
this.name = String.fromCodePoint(c);
this.i = this.prevI;
this.state = S_ATTRIB_NAME;

@@ -1598,3 +1816,3 @@ }

/** @private */
pushAttribNS(name, value) {
pushAttribNS10(name, value) {
const { prefix, local } = this.qname(name);

@@ -1604,2 +1822,5 @@ this.attribList.push({ name, prefix, local, value, uri: undefined });

const trimmed = value.trim();
if (trimmed === "") {
this.fail("invalid attempt to undefine prefix in XML 1.0");
}
this.tag.ns[local] = trimmed;

@@ -1615,2 +1836,17 @@ nsPairCheck(this, local, trimmed);

pushAttribNS11(name, value) {
const { prefix, local } = this.qname(name);
this.attribList.push({ name, prefix, local, value, uri: undefined });
if (prefix === "xmlns") {
const trimmed = value.trim();
this.tag.ns[local] = trimmed;
nsPairCheck(this, local, trimmed);
}
else if (name === "xmlns") {
const trimmed = value.trim();
this.tag.ns[""] = trimmed;
nsPairCheck(this, "", trimmed);
}
}
/** @private */

@@ -1636,3 +1872,3 @@ pushAttribPlain(name, value) {

}
else if (c) {
else if (c !== undefined) {
this.fail("disallowed character in attribute name.");

@@ -1645,3 +1881,3 @@ }

const c = this.skipSpaces();
if (!c) {
if (c === undefined) {
return;

@@ -1662,3 +1898,3 @@ }

else if (isNameStartChar(c)) {
this.name = String.fromCodePoint(c);
this.i = this.prevI;
this.state = S_ATTRIB_NAME;

@@ -1683,3 +1919,3 @@ }

this.state = S_ATTRIB_VALUE_UNQUOTED;
this.text = String.fromCodePoint(c);
this.i = this.prevI;
}

@@ -1693,15 +1929,16 @@ }

const { q } = this;
const { chunk, limit, i: start } = this;
// eslint-disable-next-line no-constant-condition
let { i: start } = this;
const { chunk } = this;
while (true) {
if (this.i >= limit) {
// This is faster than adding codepoints one by one.
this.text += chunk.substring(start);
const code = this.getCode();
if (code === undefined) {
this.text += chunk.slice(start);
return;
}
const code = this.getCode();
if (code === q || code === AMP || code === LESS) {
// This is faster than adding codepoints one by one.
const slice = chunk.substring(start,
this.i - (code <= 0xFFFF ? 1 : 2));
if (code === NL) {
start = this.handleEOL("text", chunk, start);
}
else if (code === q || code === AMP || code === LESS) {
const slice = chunk.slice(start, this.prevI);
switch (code) {

@@ -1742,3 +1979,3 @@ case q:

this.fail("no whitespace between attributes.");
this.name = String.fromCodePoint(c);
this.i = this.prevI;
this.state = S_ATTRIB_NAME;

@@ -1761,3 +1998,3 @@ }

}
else if (c) {
else if (c !== undefined) {
if (this.text.includes("]]>")) {

@@ -1786,3 +2023,3 @@ this.fail("the string \"]]>\" is disallowed in char data.");

}
else if (c) {
else if (c !== undefined) {
this.fail("disallowed character in closing tag.");

@@ -1798,3 +2035,3 @@ }

}
else if (c) {
else if (c !== undefined) {
this.fail("disallowed character in closing tag.");

@@ -1901,2 +2138,3 @@ }

qname(name) {
// This is faster than using name.split(":").
const colon = name.indexOf(":");

@@ -1907,4 +2145,4 @@ if (colon === -1) {

const local = name.substring(colon + 1);
const prefix = name.substring(0, colon);
const local = name.slice(colon + 1);
const prefix = name.slice(0, colon);
if (prefix === "" || local === "" || local.includes(":")) {

@@ -1920,7 +2158,6 @@ this.fail(`malformed name: ${name}.`);

const { tag, attribList } = this;
const { name: tagName, attributes } = tag;
{
// add namespace info to tag
const { prefix, local } = this.qname(tagName);
const { prefix, local } = this.qname(tag.name);
tag.prefix = prefix;

@@ -1946,2 +2183,3 @@ tag.local = local;

const { attributes } = tag;
const seen = new Set();

@@ -1955,3 +2193,3 @@ // Note: do not apply default ns to attributes:

if (prefix === "") {
uri = (name === "xmlns") ? XMLNS_NAMESPACE : "";
uri = name === "xmlns" ? XMLNS_NAMESPACE : "";
eqname = name;

@@ -2114,3 +2352,3 @@ }

// The character reference is required to match the CHAR production.
if (!isChar(num)) {
if (!this.isChar(num)) {
this.fail("malformed character entity.");

@@ -2117,0 +2355,0 @@ return `&${entity};`;

@@ -5,3 +5,3 @@ {

"author": "Louis-Dominique Dubeau <ldd@lddubeau.com>",
"version": "3.1.11",
"version": "4.0.0-rc.1",
"main": "lib/saxes.js",

@@ -30,10 +30,10 @@ "types": "lib/saxes.d.ts",

"devDependencies": {
"@commitlint/cli": "^8.0.0",
"@commitlint/config-angular": "^8.0.0",
"@commitlint/cli": "^8.2.0",
"@commitlint/config-angular": "^8.2.0",
"chai": "^4.2.0",
"conventional-changelog-cli": "^2.0.21",
"eslint": "^5.16.0",
"eslint-config-lddubeau-base": "^3.0.5",
"husky": "^2.5.0",
"mocha": "^6.1.4",
"conventional-changelog-cli": "^2.0.23",
"eslint": "^6.5.1",
"eslint-config-lddubeau-base": "^4.0.2",
"husky": "^3.0.8",
"mocha": "^6.2.1",
"renovate-config-lddubeau": "^1.0.0",

@@ -43,3 +43,3 @@ "xml-conformance-suite": "^1.2.0"

"dependencies": {
"xmlchars": "^2.1.1"
"xmlchars": "^2.2.0"
},

@@ -46,0 +46,0 @@ "husky": {

@@ -19,7 +19,6 @@ # saxes

better compliance with well-formedness constraints cannot use sax as-is.
Saxes aims for conformance with [XML 1.0 fifth
edition](https://www.w3.org/TR/2008/REC-xml-20081126/) and [XML Namespaces 1.0
third edition](http://www.w3.org/TR/2009/REC-xml-names-20091208/).
Consequently, saxes does not support HTML, or pseudo-XML, or bad XML.
Consequently, saxes does not support HTML, or pseudo-XML, or bad XML. Saxes
will report well-formedness errors in all these cases but it won't try to
extract data from malformed documents like sax does.

@@ -49,25 +48,20 @@ * Saxes is much much faster than sax, mostly because of a substantial redesign

## Limitations
## Conformance
This is a non-validating parser so it only verifies whether the document is
well-formed. We do aim to raise errors for all malformed constructs encountered.
Saxes supports:
However, this parser does not parse the contents of DTDs. So malformedness
errors caused by errors in DTDs cannot be reported.
* [XML 1.0 fifth edition](https://www.w3.org/TR/2008/REC-xml-20081126/)
* [XML 1.1 second edition](https://www.w3.org/TR/2006/REC-xml11-20060816/)
* [Namespaces in XML 1.0 (Third Edition)](https://www.w3.org/TR/2009/REC-xml-names-20091208/)
* [Namespaces in XML 1.1 (Second Edition)](https://www.w3.org/TR/2006/REC-xml-names11-20060816/)
Also, the parser continues to parse even upon encountering errors, and does its
best to continue reporting errors. You should heed all errors
reported.
## Limitations
**HOWEVER, ONCE AN ERROR HAS BEEN ENCOUNTERED YOU CANNOT RELY ON THE DATA
PROVIDED THROUGH THE OTHER EVENT HANDLERS.**
This is a non-validating parser so it only verifies whether the document is
well-formed. We do aim to raise errors for all malformed constructs
encountered. However, this parser does not thoroughly parse the contents of
DTDs. So most malformedness errors caused by errors in DTDs cannot be reported.
After an error, saxes tries to make sense of your document, but it may interpret
it incorrectly. For instance ``<foo a=bc="d"/>`` is invalid XML. Did you mean to
have ``<foo a="bc=d"/>`` or ``<foo a="b" c="d"/>`` or some other variation?
Saxes takes an honest stab at figuring out your mangled XML. That's as good as
it gets.
## Regarding `<!DOCTYPE` and `<!ENTITY`
## Regarding `<!DOCTYPE`s and `<!ENTITY`s
The parser will handle the basic XML entities in text nodes and attribute

@@ -143,6 +137,28 @@ values: `&amp; &lt; &gt; &apos; &quot;`. It's possible to define additional

* `defaultXMLVersion` - The default version of the XML specification to use if
the document contains no XML declaration. If the document does contain an XML
declaration, then this setting is ignored. Must be `"1.0"` or `"1.1"`. The
default is `"1.0"`.
* `forceXMLVersion` - Boolean. A flag indicating whether to force the XML
version used for parsing to the value of ``defaultXMLVersion``. When this flag
is ``true``, ``defaultXMLVersion`` must be specified. If unspecified, the
default value of this flag is ``false``.
Example: suppose you are parsing a document that has an XML declaration
specifying XML version 1.1.
If you set ``defaultXMLVersion`` to ``"1.0"`` without setting
``forceXMLVersion`` then the XML declaration will override the value of
``defaultXMLVersion`` and the document will be parsed according to XML 1.1.
If you set ``defaultXMLVersion`` to ``"1.0"`` and set ``forceXMLVersion`` to
``true``, then the XML declaration will be ignored and the document will be
parsed according to XML 1.0.
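
The interaction between the two options can be summarized as a small decision function. This is only a sketch: `resolveXMLVersion` is a hypothetical name, not something saxes exports, and it assumes that the version found in the XML declaration (if any) is either `"1.0"` or `"1.1"`.

```javascript
// Hypothetical helper, NOT part of the saxes API. It restates, as a pure
// function, which XML version ends up governing the parse.
//
// options: an object with the optional defaultXMLVersion and forceXMLVersion
//          settings described above.
// declaredVersion: the version from the XML declaration, or undefined when
//          the document has no XML declaration (assumed "1.0" or "1.1").
function resolveXMLVersion(options, declaredVersion) {
  const { defaultXMLVersion, forceXMLVersion } = options;
  if (forceXMLVersion) {
    if (defaultXMLVersion === undefined) {
      // Same condition the parser rejects at construction time.
      throw new Error("forceXMLVersion set but defaultXMLVersion is not set");
    }
    // The XML declaration is ignored.
    return defaultXMLVersion;
  }
  if (declaredVersion !== undefined) {
    // The XML declaration overrides the default.
    return declaredVersion;
  }
  return defaultXMLVersion === undefined ? "1.0" : defaultXMLVersion;
}

console.log(resolveXMLVersion({ defaultXMLVersion: "1.0" }, "1.1"));
// prints "1.1"
console.log(resolveXMLVersion({ defaultXMLVersion: "1.0", forceXMLVersion: true }, "1.1"));
// prints "1.0"
console.log(resolveXMLVersion({}, undefined));
// prints "1.0"
```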
### Methods
`write` - Write bytes onto the stream. You don't have to do this all at
once. You can keep writing as much as you want.
`write` - Write bytes onto the stream. You don't have to pass the whole document
in one `write` call. You can read your source chunk by chunk and call `write`
with each chunk.

@@ -174,2 +190,23 @@ `close` - Close the stream. Once closed, no more data may be written until it is

### Error Handling
The parser continues to parse even upon encountering errors, and does its best
to continue reporting errors. You should heed all errors reported. After an
error, however, saxes may interpret your document incorrectly. For instance
``<foo a=bc="d"/>`` is invalid XML. Did you mean to have ``<foo a="bc=d"/>`` or
``<foo a="b" c="d"/>`` or some other variation? For the sake of continuing to
provide errors, saxes will continue parsing the document, but the structure it
reports may be incorrect. It is only after the errors are fixed in the document
that saxes can provide a reliable interpretation of the document.
That leaves you with two rules of thumb when using saxes:
* Pay attention to the errors that saxes reports. The default `onerror` handler
throws, so by default, you cannot miss errors.
* **ONCE AN ERROR HAS BEEN ENCOUNTERED, STOP RELYING ON THE EVENT HANDLERS OTHER
THAN `onerror`.** As explained above, when saxes runs into a well-formedness
problem, it makes a guess in order to continue reporting more errors. The guess
may be wrong.
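
One way to apply both rules is to collect errors yourself and discard everything else as soon as one has been seen. The following is a minimal sketch, not a recommended implementation. `parseTextOrErrors` is a hypothetical helper, and it only assumes that `parser` exposes assignable `onerror` and `ontext` handlers plus `write` and `close` methods.

```javascript
// Hypothetical wrapper, NOT part of saxes. `parser` is assumed to expose
// assignable `onerror` and `ontext` handlers and `write`/`close` methods.
function parseTextOrErrors(parser, chunks) {
  const errors = [];
  const text = [];

  // Rule 1: replace the throwing default so every error is recorded.
  parser.onerror = (err) => {
    errors.push(err);
  };

  // Rule 2: once an error has been seen, stop trusting other events.
  parser.ontext = (t) => {
    if (errors.length === 0) {
      text.push(t);
    }
  };

  for (const chunk of chunks) {
    parser.write(chunk);
  }
  parser.close();

  // On failure, return only the errors: data gathered before the first
  // error is dropped too, so callers cannot use a partial result by mistake.
  return errors.length > 0 ?
    { ok: false, errors } :
    { ok: true, text: text.join("") };
}
```

Dropping the partial result entirely is a deliberately conservative choice. An application that wants to show a best-effort preview could keep it, as long as it treats that data as unreliable.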
### Events

@@ -208,2 +245,23 @@

### Performance Tips
* saxes works faster on files that use newlines (``\u000A``) as end of line
markers than on files that use other end of line markers (like ``\r`` or
``\r\n``). The XML specification requires that conformant applications behave
as if all characters that are to be treated as end of line characters are
converted to ``\u000A`` prior to parsing. The optimal code path for saxes is a
file in which all end of line characters are already ``\u000A``.
* Don't split Unicode strings you feed to saxes across surrogates. When you
naively split a string in JavaScript, you run the risk of splitting a Unicode
character into two surrogates. For example, in the following code ``a`` and
``b`` each contain half of a single Unicode character: ``const a =
"\u{1F4A9}"[0]; const b = "\u{1F4A9}"[1];``. If you feed such split surrogates
to versions of saxes prior to 4, you get errors. Saxes version 4 and over are able to
detect when a chunk of data ends with a surrogate and carry over the surrogate
to the next chunk. However this operation entails slicing and concatenating
strings. If you can feed your data in a way that does not split surrogates,
you should do it. (Obviously, feeding all the data at once with a single write
is fastest.)
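
If you have to split a large string yourself, the second tip can be followed by never ending a chunk on a high surrogate. The generator below is a sketch under stated assumptions. `chunksOnCodePoints` is a hypothetical helper that is not part of saxes, `size` counts UTF-16 code units, and the input is assumed to be well-formed UTF-16 with no lone surrogates.

```javascript
// Hypothetical helper, NOT part of saxes: yield pieces of `text` that are at
// most `size` UTF-16 code units long and never end on a high surrogate.
function* chunksOnCodePoints(text, size) {
  if (size < 2) {
    // A surrogate pair needs two code units, so smaller chunks cannot work.
    throw new RangeError("size must be at least 2");
  }
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + size, text.length);
    const last = text.charCodeAt(end - 1);
    // 0xD800-0xDBFF is the high (leading) surrogate range. If the chunk would
    // end there, back up one unit so the pair stays together in the next chunk.
    if (end < text.length && last >= 0xD800 && last <= 0xDBFF) {
      end--;
    }
    yield text.slice(start, end);
    start = end;
  }
}

const parts = [...chunksOnCodePoints("ab\u{1F4A9}cd", 3)];
console.log(parts.length);
// prints 3: the chunks are "ab", "\u{1F4A9}c" and "d"
```

Each yielded chunk can then be passed to `write` in order. Since no chunk ends on a high surrogate, the parser never has to carry a surrogate over to the next chunk.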
## FAQ

@@ -210,0 +268,0 @@
