saxes - npm Package Compare versions

Comparing version 3.1.11 to 4.0.0-rc.1

CHANGELOG.md

		@@ -0,1 +1,48 @@
		<a name="4.0.0-rc.1"></a>
		# [4.0.0-rc.1](https://github.com/lddubeau/saxes/compare/v3.1.11...v4.0.0-rc.1) (2019-10-02)


		### Bug Fixes

		* don't serialize the fileName as undefined: when not present ([4ff2365](https://github.com/lddubeau/saxes/commit/4ff2365))
		* fix bug with initial eol characters ([7b3db75](https://github.com/lddubeau/saxes/commit/7b3db75))
		* handling of end of line characters ([f13247a](https://github.com/lddubeau/saxes/commit/f13247a))


		### Features

		* add forceXMLVersion ([1eedbf8](https://github.com/lddubeau/saxes/commit/1eedbf8))
		* saxes handles chunks that "break" unicode ([1272448](https://github.com/lddubeau/saxes/commit/1272448))
		* support for XML 1.1 ([36704fb](https://github.com/lddubeau/saxes/commit/36704fb))


		### Performance Improvements

		* don't depend on limit to know when we hit the end of buffer ([ad4ab53](https://github.com/lddubeau/saxes/commit/ad4ab53))
		* don't increment a column number ([490fc24](https://github.com/lddubeau/saxes/commit/490fc24))
		* don't repeatedly read this.i in the getCode methods ([d3f196c](https://github.com/lddubeau/saxes/commit/d3f196c))
		* improve performance of text handling ([9c13099](https://github.com/lddubeau/saxes/commit/9c13099))
		* make the most common path of getCode functions the shortest ([4d66bbb](https://github.com/lddubeau/saxes/commit/4d66bbb))
		* minimize concatenation by adding the capability to unget codes ([27fa8b9](https://github.com/lddubeau/saxes/commit/27fa8b9))
		* use isCharAndNotRestricted rather than call two functions ([f0b67a4](https://github.com/lddubeau/saxes/commit/f0b67a4))
		* use slice rather than substring ([c1fed89](https://github.com/lddubeau/saxes/commit/c1fed89))


		### BREAKING CHANGES

		* previous versions of saxes did not consistently convert end of
		line characters to NL (0xA) in the data reported by event handlers. This has
		been fixed. If your code relied on the old (incorrect) behavior then you'll have
		to update it.
		* previous versions of saxes would parse files with an XML
		declaration set to 1.1 as 1.0 documents. The support for 1.1 entails that if a
		document has an XML declaration that specifies version 1.1 it is parsed as a 1.1
		document.
		* when ``fileName`` is undefined in the parser options saxes does
		not show a file name in error messages. Previously it was showing the name
		``undefined``. To get the previous behavior, in all cases where you'd leave
		``fileName`` undefined, you must set it to the string ``"undefined"`` instead.
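
To make the first breaking change concrete, here is a minimal sketch of the end-of-line normalization that XML 1.0 and XML 1.1 call for. The helpers `normalizeEOL10` and `normalizeEOL11` are hypothetical names used only for illustration; they are not part of saxes, which performs this conversion incrementally while reading code points rather than with regular expressions.

```javascript
// Hypothetical helpers, not part of the saxes API.

// XML 1.0: CR LF and a lone CR both become a single NL (0xA).
function normalizeEOL10(text) {
  return text.replace(/\r\n?/g, "\n");
}

// XML 1.1: additionally, CR NEL, a lone NEL (0x85) and LS (0x2028) become NL.
function normalizeEOL11(text) {
  return text.replace(/\r[\n\u0085]?|[\u0085\u2028]/g, "\n");
}

// Handlers that compared text against "\r\n" need updating, because data
// reported under the fixed behavior only ever contains "\n".
console.log(JSON.stringify(normalizeEOL10("a\r\nb\rc")));
console.log(JSON.stringify(normalizeEOL11("a\u2028b\r\u0085c")));
```

Code that relied on seeing raw `\r` characters in event data would, under this assumption, see `\n` instead after upgrading.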



		<a name="3.1.11"></a>
		@@ -2,0 +49,0 @@ ## [3.1.11](https://github.com/lddubeau/saxes/compare/v3.1.10...v3.1.11) (2019-06-25)

lib/saxes.d.ts

		declare namespace saxes {
		export const EVENTS: ReadonlyArray<string>;

		export interface SaxesOptions {
		export interface CommonSaxesOptions {
		xmlns?: boolean;
		@@ -10,4 +10,16 @@ position?: boolean;
		additionalNamespaces?: Record<string, string>;
		defaultXMLVersion?: "1.0" | "1.1";
		}

		export interface NotForced extends CommonSaxesOptions {
		forceXMLVersion?: false;
		}

		export interface Forced extends CommonSaxesOptions {
		defaultXMLVersion: CommonSaxesOptions["defaultXMLVersion"];
		forceXMLVersion: true;
		}

		export type SaxesOptions = NotForced | Forced;

		export interface XMLDecl {
		@@ -14,0 +26,0 @@ version?: string;


lib/saxes.js

		"use strict";

		const { isS, isChar, isNameStartChar, isNameChar, S_LIST, NAME_RE } =
		require("xmlchars/xml/1.0/ed5");
		const { isNCNameStartChar, isNCNameChar, NC_NAME_RE } = require("xmlchars/xmlns/1.0/ed3");
		const {
		isS, isChar: isChar10, isNameStartChar, isNameChar, S_LIST, NAME_RE,
		} = require("xmlchars/xml/1.0/ed5");
		const { isChar: isChar11 } = require("xmlchars/xml/1.1/ed2");
		const { isNCNameStartChar, isNCNameChar, NC_NAME_RE } =
		require("xmlchars/xmlns/1.0/ed3");

		@@ -88,2 +91,3 @@ const XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace";

		const TAB = 9;
		const NL = 0xA;
		@@ -105,2 +109,4 @@ const CR = 0xD;
		const CLOSE_BRACKET = 0x5D;
		const NEL = 0x85;
		const LS = 0x2028; // Line Separator

		@@ -261,6 +267,14 @@ function isQuote(c) {
		*
		* @property {string} [fileName] A file name to use for error reporting. Leaving
		* this unset will report a file name of "undefined". "File name" is a loose
		* concept. You could use a URL to some resource, or any descriptive name you
		* like.
		* @property {string} [fileName] A file name to use for error reporting. "File
		* name" is a loose concept. You could use a URL to some resource, or any
		* descriptive name you like.
		*
		* @property {"1.0" | "1.1"} [defaultXMLVersion] The default XML version to
		* use. If unspecified, and there is no XML encoding declaration, the default
		* version is "1.0".
		*
		* @property {boolean} [forceXMLVersion] A flag indicating whether to force the
		* XML version used for parsing to the value of ``defaultXMLVersion``. When this
		* flag is ``true``, ``defaultXMLVersion`` must be specified. If unspecified,
		* the default value of this flag is ``false``.
		*/
		@@ -326,3 +340,14 @@
		this.i = 0;
		this.trailingCR = false;
		//
		// We use prevI to allow "ungetting" the previously read code point. Note
		// however, that it is not safe to unget everything and anything. In
		// particular ungetting EOL characters will screw positioning up.
		//
		// Practically, you must not unget a code which has any side effect beyond
		// updating ``this.i`` and ``this.prevI``. Only EOL codes have such side
		// effects.
		//
		this.prevI = 0;
		this.carriedFromPrevious = undefined;
		this.originalNL = true;
		this.forbiddenState = FORBIDDEN_START;
		@@ -368,5 +393,4 @@ /**
		this.processAttribs = this.processAttribsNS;
		this.pushAttrib = this.pushAttribNS;

		this.ns = Object.assign({ __proto__: null }, rootNS);
		this.ns = { __proto__: null, ...rootNS };
		const additional = this.opt.additionalNamespaces;
		@@ -383,5 +407,14 @@ if (additional) {
		this.processAttribs = this.processAttribsPlain;
		this.pushAttrib = this.pushAttribPlain;
		}

		let { defaultXMLVersion } = this.opt;
		const { forceXMLVersion } = this.opt;
		if (defaultXMLVersion === undefined) {
		if (forceXMLVersion) {
		throw new Error("forceXMLVersion set but defaultXMLVersion is not set");
		}
		defaultXMLVersion = "1.0";
		}
		this.setXMLVersion(defaultXMLVersion);

		this.trackPosition = this.opt.position !== false;
		@@ -392,3 +425,3 @@ /** The line number the parser is currently looking at. */
		/** The column the parser is currently looking at. */
		this.column = 0;
		this.positionAtNewLine = 0;

		@@ -404,2 +437,6 @@ this.fileName = this.opt.fileName;

		get column() {
		return this.position - this.positionAtNewLine;
		}
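
The getter above replaces a counter that was incremented on every character. A minimal sketch of that idea follows, assuming a hypothetical `PositionTracker` class (not a saxes class) that only records where the last newline ended and derives the column on demand.

```javascript
// Hypothetical class for illustration; saxes keeps these fields on the parser.
class PositionTracker {
  constructor() {
    this.position = 0;          // Offset from the start of the input.
    this.line = 1;              // Current line number.
    this.positionAtNewLine = 0; // Value of position just after the last NL.
  }

  // The column is derived, so the hot loop never has to increment it.
  get column() {
    return this.position - this.positionAtNewLine;
  }

  consume(text) {
    for (let i = 0; i < text.length; i++) {
      this.position++;
      if (text.charCodeAt(i) === 0xA) {
        this.line++;
        this.positionAtNewLine = this.position;
      }
    }
  }
}

const tracker = new PositionTracker();
tracker.consume("ab\ncd");
console.log(`${tracker.line}:${tracker.column}`);
```

The trade-off is one subtraction whenever the column is read, in exchange for no per-character bookkeeping.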

		/* eslint-disable class-methods-use-this */
		@@ -499,3 +536,3 @@ /**
		*
		* @param {Error} er The error to report.
		* @param {string} er The error to report.
		*
		@@ -505,5 +542,13 @@ * @returns this
		fail(er) {
		const message = (this.trackPosition) ?
		`${this.fileName}:${this.line}:${this.column}: ${er}` : er;

		let message = this.fileName || "";
		if (this.trackPosition) {
		if (message.length > 0) {
		message += ":";
		}
		message += `${this.line}:${this.column}`;
		}
		if (message.length > 0) {
		message += ": ";
		}
		message += er;
		this.onerror(new Error(message));
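
The prefix logic in the new `fail` can be read as a small pure function. The sketch below restates it under that assumption; `formatError` and its parameter object are hypothetical names introduced for illustration, not part of saxes.

```javascript
// Hypothetical restatement of the message-building logic in fail().
function formatError({ fileName, trackPosition, line, column }, er) {
  let message = fileName || "";
  if (trackPosition) {
    if (message.length > 0) {
      message += ":";
    }
    message += `${line}:${column}`;
  }
  if (message.length > 0) {
    message += ": ";
  }
  return message + er;
}

// With no file name, the prefix no longer starts with "undefined:".
console.log(formatError({ trackPosition: true, line: 1, column: 2 }, "boom"));
console.log(formatError({ fileName: "a.xml", trackPosition: true, line: 1, column: 2 }, "boom"));
```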
		@@ -537,21 +582,25 @@ return this;
		// of single complete characters (``Array.from(chunk)``) would be faster
		// than the current repeated calls to ``codePointAt``. As of August 2018, it
		// than the current repeated calls to ``charCodeAt``. As of August 2018, it
		// isn't. (There may be Node-specific code that would perform faster than
		// ``Array.from`` but don't want to be dependent on Node.)

		let limit = chunk.length;

		if (this.trailingCR) {
		// The previous chunk had a trailing cr. We need to handle it now.
		chunk = `\r${chunk}`;
		if (this.carriedFromPrevious !== undefined) {
		// The previous chunk had char we must carry over.
		chunk = `${this.carriedFromPrevious}${chunk}`;
		this.carriedFromPrevious = undefined;
		}

		if (!end && chunk[limit - 1] === CR) {
		// The chunk ends with a trailing CR. We cannot know how to handle it
		// until we get the next chunk or the end of the stream. So save it for
		// later.
		let limit = chunk.length;
		const lastCode = chunk.charCodeAt(limit - 1);
		if (!end &&
		// A trailing CR or surrogate must be carried over to the next
		// chunk.
		(lastCode === CR || (lastCode >= 0xD800 && lastCode <= 0xDBFF))) {
		// The chunk ends with a character that must be carried over. We cannot
		// know how to handle it until we get the next chunk or the end of the
		// stream. So save it for later.
		this.carriedFromPrevious = chunk[limit - 1];
		limit--;
		this.trailingCR = true;
		chunk = chunk.slice(0, limit);
		}
		this.limit = limit;
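
The carry-over in this hunk is what lets chunks "break" a surrogate pair or a CR LF sequence. A minimal standalone sketch of the same bookkeeping follows; `prepareChunk` is a hypothetical function written for illustration and returns the state that saxes stores on the parser instance.

```javascript
const CR = 0xD;

// Hypothetical helper: prepend what was held back from the previous chunk,
// then hold back a trailing CR or high surrogate unless this is the last chunk.
function prepareChunk(chunk, carried, end) {
  if (carried !== undefined) {
    chunk = `${carried}${chunk}`;
  }

  let nextCarried;
  // charCodeAt returns NaN on an empty chunk, which fails both tests below.
  const lastCode = chunk.charCodeAt(chunk.length - 1);
  if (!end &&
      (lastCode === CR || (lastCode >= 0xD800 && lastCode <= 0xDBFF))) {
    nextCarried = chunk[chunk.length - 1];
    chunk = chunk.slice(0, -1);
  }

  return { chunk, carried: nextCarried };
}

// U+1F600 split across two chunks is reassembled before scanning.
const first = prepareChunk("a\uD83D", undefined, false);
const second = prepareChunk("\uDE00b", first.carried, false);
console.log(first.chunk.length, second.chunk.length);
```

Note that the old line `chunk[limit - 1] === CR` compared a one-character string with a number, which appears never to match; the new code compares the result of `charCodeAt` instead.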

		@@ -578,2 +627,9 @@ this.chunk = chunk;

		/** @private */
		newline(originalNL) {
		this.originalNL = originalNL;
		this.line++;
		this.positionAtNewLine = this.position;
		}

		/**
		@@ -583,2 +639,4 @@ * Get a single code point out of the current chunk. This updates the current
		*
		* This is the algorithm to use for XML 1.0.
		*
		* @private
		@@ -588,45 +646,150 @@ *
		*/
		getCode() {
		getCode10() {
		const { chunk, i } = this;
		this.prevI = i;
		// Using charCodeAt and handling the surrogates ourselves is faster
		// than using codePointAt.
		let code = chunk.charCodeAt(i);
		const code = chunk.charCodeAt(i);

		let skip = 1;
		switch (code) {
		case CR:
		// We may get NaN if we read past the end of the chunk, which is
		// fine.
		if (chunk.charCodeAt(i + 1) === NL) {
		// A \r\n sequence is converted to \n so we have to skip over the next
		// character. We already know it has a size of 1 so ++ is fine here.
		skip++;
		// Yes, we do this instead of doing this.i++. Doing it this way, we do not
		// read this.i again, which is a bit faster.
		this.i = i + 1;
		if (code < 0xD800) {
		if (code >= SPACE || code === TAB) {
		return code;
		}
		// Otherwise, a \r is just converted to \n, so we don't have to skip
		// ahead.

		// In either case, \r becomes \n.
		code = NL;
		/* yes, fall through */
		case NL:
		this.line++;
		this.column = 0;
		break;
		default:
		this.column++;
		if (code >= 0xD800 && code <= 0xDBFF) {
		code = 0x10000 + ((code - 0xD800) * 0x400) +
		switch (code) {
		case NL:
		this.newline(true);
		return NL;
		case CR:
		// We may get NaN if we read past the end of the chunk, which is fine.
		if (chunk.charCodeAt(i + 1) === NL) {
		// A \r\n sequence is converted to \n so we have to skip over the next
		// character. We already know it has a size of 1 so ++ is fine here.
		this.i = i + 2;
		}
		// Otherwise, a \r is just converted to \n, so we don't have to skip
		// ahead.

		// In either case, \r becomes \n.
		this.newline(false);
		return NL;
		default:
		// If we get here, then code < SPACE and it is not NL CR or TAB.
		this.fail("disallowed character.");
		return code;
		}
		}

		if (code > 0xDBFF) {
		// This is a specialized version of isChar10 that takes into account
		// that in this context code > 0xDBFF and code <= 0xFFFF. So it does not
		// test cases that don't need testing.
		if (!(code >= 0xE000 && code <= 0xFFFD)) {
		this.fail("disallowed character.");
		}

		return code;
		}

		// eslint-disable-next-line no-restricted-globals
		if (isNaN(code)) {
		return undefined;
		}

		const final = 0x10000 + ((code - 0xD800) * 0x400) +
		(chunk.charCodeAt(i + 1) - 0xDC00);
		this.column++;
		skip++;
		this.i = i + 2;

		// This is a specialized version of isChar10 that takes into account that in
		// this context necessarily final >= 0x10000.
		if (final > 0x10FFFF) {
		this.fail("disallowed character.");
		}

		return final;
		}


		/**
		* Get a single code point out of the current chunk. This updates the current
		* position if we do position tracking.
		*
		* This is the algorithm to use for XML 1.1.
		*
		* @private
		*
		* @returns {number} The character read.
		*/
		getCode11() {
		const { chunk, i } = this;
		this.prevI = i;
		// Using charCodeAt and handling the surrogates ourselves is faster
		// than using codePointAt.
		const code = chunk.charCodeAt(i);

		// Yes, we do this instead of doing this.i++. Doing it this way, we do not
		// read this.i again, which is a bit faster.
		this.i = i + 1;
		if (code < 0xD800) {
		if ((code > 0x1F && code < 0x7F) || (code > 0x9F && code !== LS) ||
		code === TAB) {
		return code;
		}

		if (!isChar(code)) {
		switch (code) {
		case NL: // 0xA
		this.newline(true);
		return NL;
		case CR: { // 0xD
		// We may get NaN if we read past the end of the chunk, which is
		// fine.
		const next = chunk.charCodeAt(i + 1);
		if (next === NL || next === NEL) {
		// A CR NL or CR NEL sequence is converted to NL so we have to skip over
		// the next character. We already know it has a size of 1.
		this.i = i + 2;
		}
		// Otherwise, a CR is just converted to NL, no skip.
		}
		/* yes, fall through */
		case NEL: // 0x85
		case LS: // 0x2028
		this.newline(false);
		return NL;
		default:
		this.fail("disallowed character.");
		return code;
		}
		}

		this.i += skip;
		if (code > 0xDBFF) {
		// This is a specialized version of isCharAndNotRestricted that takes into
		// account that in this context code > 0xDBFF and code <= 0xFFFF. So it
		// does not test cases that don't need testing.
		if (!(code >= 0xE000 && code <= 0xFFFD)) {
		this.fail("disallowed character.");
		}

		return code;
		return code;
		}

		// eslint-disable-next-line no-restricted-globals
		if (isNaN(code)) {
		return undefined;
		}

		const final = 0x10000 + ((code - 0xD800) * 0x400) +
		(chunk.charCodeAt(i + 1) - 0xDC00);
		this.i = i + 2;

		// This is a specialized version of isCharAndNotRestricted that takes into
		// account that in this context necessarily final >= 0x10000.
		if (final > 0x10FFFF) {
		this.fail("disallowed character.");
		}

		return final;
		}
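
Both `getCode10` and `getCode11` end with the same surrogate arithmetic. The sketch below isolates it; `decodeCodePoint` is a hypothetical helper, and it assumes a high surrogate is always followed by a valid low surrogate (the parser arranges this by carrying a trailing high surrogate over to the next chunk).

```javascript
// Hypothetical helper isolating the charCodeAt-based decoding.
function decodeCodePoint(chunk, i) {
  const code = chunk.charCodeAt(i);
  // Not a high surrogate: a single UTF-16 unit (or NaN past the end).
  if (!(code >= 0xD800 && code <= 0xDBFF)) {
    return { code, next: i + 1 };
  }

  // Combine the high and low surrogates into one code point.
  const low = chunk.charCodeAt(i + 1);
  return {
    code: 0x10000 + ((code - 0xD800) * 0x400) + (low - 0xDC00),
    next: i + 2,
  };
}

const sample = "a\uD83D\uDE00";
const decoded = decodeCodePoint(sample, 1);
// For well-formed input the result agrees with String.prototype.codePointAt.
console.log(decoded.code === sample.codePointAt(1));
```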
		@@ -646,2 +809,14 @@
		/**
		* @private
		*/
		handleEOL(buffer, chunk, start) {
		if (this.originalNL) {
		return start;
		}

		this[buffer] += `${chunk.slice(start, this.prevI)}\n`;
		return this.i;
		}

		/**
		* Capture characters into a buffer until encountering one of a set of
		@@ -661,16 +836,19 @@ * characters.
		captureTo(chars, buffer) {
		const { chunk, limit, i: start } = this;
		while (this.i < limit) {
		let { i: start } = this;
		const { chunk } = this;
		while (true) {
		const c = this.getCode();
		if (c === NL) {
		start = this.handleEOL(buffer, chunk, start);
		}
		else if (c === undefined) {
		this[buffer] += chunk.slice(start);
		return undefined;
		}

		if (chars.includes(c)) {
		// This is faster than adding codepoints one by one.
		this[buffer] += chunk.substring(start,
		this.i - (c <= 0xFFFF ? 1 : 2));
		this[buffer] += chunk.slice(start, this.prevI);
		return c;
		}
		}

		// This is faster than adding codepoints one by one.
		this[buffer] += chunk.substring(start);
		return undefined;
		}
		@@ -691,16 +869,19 @@
		captureToChar(char, buffer) {
		const { chunk, limit, i: start } = this;
		while (this.i < limit) {
		let { i: start } = this;
		const { chunk } = this;
		while (true) {
		const c = this.getCode();
		if (c === NL) {
		start = this.handleEOL(buffer, chunk, start);
		}
		else if (c === undefined) {
		this[buffer] += chunk.slice(start);
		return false;
		}

		if (c === char) {
		// This is faster than adding codepoints one by one.
		this[buffer] += chunk.substring(start,
		this.i - (c <= 0xFFFF ? 1 : 2));
		this[buffer] += chunk.slice(start, this.prevI);
		return true;
		}
		}

		// This is faster than adding codepoints one by one.
		this[buffer] += chunk.substring(start);
		return false;
		}
		@@ -718,16 +899,16 @@
		captureNameChars() {
		const { chunk, limit, i: start } = this;
		while (this.i < limit) {
		const { chunk, i: start } = this;
		while (true) {
		const c = this.getCode();
		if (c === undefined) {
		this.name += chunk.slice(start);
		return undefined;
		}

		// NL is not a name char so we don't have to test specifically for it.
		if (!isNameChar(c)) {
		// This is faster than adding codepoints one by one.
		this.name += chunk.substring(start,
		this.i - (c <= 0xFFFF ? 1 : 2));
		this.name += chunk.slice(start, this.prevI);
		return c;
		}
		}

		// This is faster than adding codepoints one by one.
		this.name += chunk.substring(start);
		return undefined;
		}
		@@ -747,16 +928,17 @@
		captureWhileNameCheck(buffer) {
		const { chunk, limit, i: start } = this;
		while (this.i < limit) {
		const { chunk, i: start } = this;
		while (true) {
		const c = this.getCode();
		if (c === undefined) {
		this[buffer] += chunk.slice(start);
		return undefined;
		}

		// NL cannot satisfy this.nameCheck so we don't have to test
		// specifically for it.
		if (!this.nameCheck(c)) {
		// This is faster than adding codepoints one by one.
		this[buffer] += chunk.substring(start,
		this.i - (c <= 0xFFFF ? 1 : 2));
		this[buffer] += chunk.slice(start, this.prevI);
		return c;
		}
		}

		// This is faster than adding codepoints one by one.
		this[buffer] += chunk.substring(start);
		return undefined;
		}
		@@ -773,11 +955,24 @@
		skipSpaces() {
		const { limit } = this;
		while (this.i < limit) {
		while (true) {
		const c = this.getCode();
		if (!isS(c)) {
		if (c === undefined || !isS(c)) {
		return c;
		}
		}
		}

		return undefined;
		/** @private */
		setXMLVersion(version) {
		if (version === "1.0") {
		this.isChar = isChar10;
		this.getCode = this.getCode10;
		this.pushAttrib =
		this.xmlnsOpt ? this.pushAttribNS10 : this.pushAttribPlain;
		}
		else {
		this.isChar = isChar11;
		this.getCode = this.getCode11;
		this.pushAttrib =
		this.xmlnsOpt ? this.pushAttribNS11 : this.pushAttribPlain;
		}
		}
		@@ -797,10 +992,3 @@
		this.i++;
		this.column++;
		}
		else if (isS(c)) {
		this.i++;
		this.column++;
		// An XML declaration cannot appear after initial spaces.
		this.xmlDeclPossible = false;
		}

		@@ -812,7 +1000,26 @@ this.state = S_BEGIN_WHITESPACE;
		sBeginWhitespace() {
		const c = this.skipSpaces();
		// This initial loop is a specialized version of skipSpaces. We need to know
		// whether we've encountered spaces or not because as soon as we run into a
		// space, an XML declaration is no longer possible. Rather than slow down
		// skipSpaces even in places where we don't care whether it skipped anything
		// or not, we use a specialized loop here.
		let c;
		let sawSpace = false;
		while (true) {
		c = this.getCode();
		if (c === undefined || !isS(c)) {
		break;
		}

		sawSpace = true;
		}

		if (sawSpace) {
		this.xmlDeclPossible = false;
		}

		if (c === LESS) {
		this.state = S_OPEN_WAKA;
		}
		else if (c) {
		else if (c !== undefined) {
		// have to process this as a text node.
		@@ -824,3 +1031,3 @@ // weird, but happens.
		}
		this.text = String.fromCodePoint(c);
		this.i = this.prevI;
		this.state = S_TEXT;
		@@ -864,13 +1071,11 @@ this.xmlDeclPossible = false;
		//
		const { chunk, limit, i: start } = this;
		let { forbiddenState } = this;
		let c;
		let { i: start, forbiddenState } = this;
		const { chunk } = this;
		// eslint-disable-next-line no-labels, no-restricted-syntax
		scanLoop:
		while (this.i < limit) {
		const code = this.getCode();
		switch (code) {
		while (true) {
		switch (this.getCode()) {
		case LESS:
		this.state = S_OPEN_WAKA;
		c = code;
		this.text += chunk.slice(start, this.prevI);
		forbiddenState = FORBIDDEN_START;
		@@ -882,3 +1087,3 @@ // eslint-disable-next-line no-labels
		this.entityReturnState = S_TEXT;
		c = code;
		this.text += chunk.slice(start, this.prevI);
		forbiddenState = FORBIDDEN_START;
		@@ -907,2 +1112,10 @@ // eslint-disable-next-line no-labels
		break;
		case NL:
		start = this.handleEOL("text", chunk, start);
		forbiddenState = FORBIDDEN_START;
		break;
		case undefined:
		this.text += chunk.slice(start);
		// eslint-disable-next-line no-labels
		break scanLoop;
		default:
		@@ -913,7 +1126,2 @@ forbiddenState = FORBIDDEN_START;
		this.forbiddenState = forbiddenState;

		// This is faster than adding codepoints one by one.
		this.text += chunk.substring(start,
		c === undefined ? undefined :
		(this.i - (c <= 0xFFFF ? 1 : 2)));
		}
		@@ -924,16 +1132,11 @@
		// This is essentially a specialized version of captureTo which is optimized
		// for performing the ]]> check. A previous version of this code, checked
		// ``this.text`` for the presence of ]]>. It simplified the code but was
		// very costly when character data contained a lot of entities to be parsed.
		//
		// Since we are using a specialized loop, we also keep track of the presence
		// of non-space characters in the text since these are errors when appearing
		// outside the document root element.
		//
		const { chunk, limit, i: start } = this;
		// for a specialized task. We keep track of the presence of non-space
		// characters in the text since these are errors when appearing outside the
		// document root element.
		let { i: start } = this;
		const { chunk } = this;
		let nonSpace = false;
		let c;
		// eslint-disable-next-line no-labels, no-restricted-syntax
		outRootLoop:
		while (this.i < limit) {
		while (true) {
		const code = this.getCode();
		@@ -943,3 +1146,3 @@ switch (code) {
		this.state = S_OPEN_WAKA;
		c = code;
		this.text += chunk.slice(start, this.prevI);
		// eslint-disable-next-line no-labels
		@@ -950,6 +1153,14 @@ break outRootLoop;
		this.entityReturnState = S_TEXT;
		c = code;
		this.text += chunk.slice(start, this.prevI);
		nonSpace = true;
		// eslint-disable-next-line no-labels
		break outRootLoop;
		case NL:
		start = this.handleEOL("text", chunk, start);
		// eslint-disable-next-line no-labels
		break;
		case undefined:
		this.text += chunk.slice(start);
		// eslint-disable-next-line no-labels
		break outRootLoop;
		default:
		@@ -962,7 +1173,2 @@ if (!isS(code)) {

		// This is faster than adding codepoints one by one.
		this.text += chunk.substring(start,
		c === undefined ? undefined :
		(this.i - (c <= 0xFFFF ? 1 : 2)));

		if (!nonSpace) {
		@@ -988,2 +1194,6 @@ return;
		sOpenWaka() {
		// Reminder: a state handler is called with at least one character
		// available in the current chunk. So the first call to get code inside of
		// a state handler cannot return ``undefined``. That's why we don't test
		// for it.
		const c = this.getCode();
		@@ -993,3 +1203,3 @@ // either a /, ?, !, or text is coming next.
		this.state = S_OPEN_TAG;
		this.name = String.fromCodePoint(c);
		this.i = this.prevI;
		this.xmlDeclPossible = false;
		@@ -1012,3 +1222,3 @@ }
		default:
		this.fail("disallowed character in tag name.");
		this.fail("disallowed character in tag name");
		this.state = S_TEXT;
		@@ -1068,3 +1278,3 @@ this.xmlDeclPossible = false;
		}
		else if (c) {
		else if (c !== undefined) {
		this.doctype += String.fromCodePoint(c);
		@@ -1094,3 +1304,3 @@ if (c === OPEN_BRACKET) {
		const c = this.captureTo(DTD_TERMINATOR, "doctype");
		if (!c) {
		if (c === undefined) {
		return;
		@@ -1304,3 +1514,3 @@ }
		}
		else if (c) {
		else if (c !== undefined) {
		this.fail("disallowed character in processing instruction name.");
		@@ -1411,11 +1621,18 @@ this.piTarget += String.fromCodePoint(c);

		if (c) {
		if (c !== undefined) {
		switch (this.xmlDeclName) {
		case "version":
		if (!/^1\.[0-9]+$/.test(this.xmlDeclValue)) {
		case "version": {
		this.xmlDeclExpects = ["encoding", "standalone"];
		const version = this.xmlDeclValue;
		this.xmlDecl.version = version;
		// This is the test specified by XML 1.0 but it is fine for XML 1.1.
		if (!/^1\.[0-9]+$/.test(version)) {
		this.fail("version number must match /^1\\.[0-9]+$/.");
		}
		this.xmlDeclExpects = ["encoding", "standalone"];
		this.xmlDecl.version = this.xmlDeclValue;
		// When forceXMLVersion is set, the XML declaration is ignored.
		else if (!this.opt.forceXMLVersion) {
		this.setXMLVersion(version);
		}
		break;
		}
		case "encoding":
		@@ -1524,3 +1741,3 @@ if (!/^[A-Za-z][A-Za-z0-9._-]*$/.test(this.xmlDeclValue)) {
		const c = this.captureNameChars();
		if (!c) {
		if (c === undefined) {
		return;
		@@ -1533,2 +1750,3 @@ }
		};
		this.name = "";

		@@ -1578,7 +1796,7 @@ if (this.xmlnsOpt) {
		const c = this.skipSpaces();
		if (!c) {
		if (c === undefined) {
		return;
		}
		if (isNameStartChar(c)) {
		this.name = String.fromCodePoint(c);
		this.i = this.prevI;
		this.state = S_ATTRIB_NAME;
		@@ -1598,3 +1816,3 @@ }
		/** @private */
		pushAttribNS(name, value) {
		pushAttribNS10(name, value) {
		const { prefix, local } = this.qname(name);
		@@ -1604,2 +1822,5 @@ this.attribList.push({ name, prefix, local, value, uri: undefined });
		const trimmed = value.trim();
		if (trimmed === "") {
		this.fail("invalid attempt to undefine prefix in XML 1.0");
		}
		this.tag.ns[local] = trimmed;
		@@ -1615,2 +1836,17 @@ nsPairCheck(this, local, trimmed);

		pushAttribNS11(name, value) {
		const { prefix, local } = this.qname(name);
		this.attribList.push({ name, prefix, local, value, uri: undefined });
		if (prefix === "xmlns") {
		const trimmed = value.trim();
		this.tag.ns[local] = trimmed;
		nsPairCheck(this, local, trimmed);
		}
		else if (name === "xmlns") {
		const trimmed = value.trim();
		this.tag.ns[""] = trimmed;
		nsPairCheck(this, "", trimmed);
		}
		}
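
The visible difference between `pushAttribNS10` and `pushAttribNS11` is the empty-value check: Namespaces in XML 1.1 allows `xmlns:p=""` to undeclare a prefix, while Namespaces in XML 1.0 does not. A minimal sketch of that rule follows, using a hypothetical `declarePrefix` function that returns errors instead of calling `this.fail`.

```javascript
// Hypothetical helper; saxes reports the error through this.fail() instead.
function declarePrefix(ns, local, value, xmlVersion) {
  const errors = [];
  const trimmed = value.trim();
  if (xmlVersion === "1.0" && trimmed === "") {
    errors.push("invalid attempt to undefine prefix in XML 1.0");
  }
  // As in the parser, the mapping is recorded even when an error is reported.
  ns[local] = trimmed;
  return errors;
}

const scope = { __proto__: null };
console.log(declarePrefix(scope, "p", "", "1.0").length);
console.log(declarePrefix(scope, "p", "", "1.1").length);
```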

		/** @private */
		@@ -1636,3 +1872,3 @@ pushAttribPlain(name, value) {
		}
		else if (c) {
		else if (c !== undefined) {
		this.fail("disallowed character in attribute name.");
		@@ -1645,3 +1881,3 @@ }
		const c = this.skipSpaces();
		if (!c) {
		if (c === undefined) {
		return;
		@@ -1662,3 +1898,3 @@ }
		else if (isNameStartChar(c)) {
		this.name = String.fromCodePoint(c);
		this.i = this.prevI;
		this.state = S_ATTRIB_NAME;
		@@ -1683,3 +1919,3 @@ }
		this.state = S_ATTRIB_VALUE_UNQUOTED;
		this.text = String.fromCodePoint(c);
		this.i = this.prevI;
		}
		@@ -1693,15 +1929,16 @@ }
		const { q } = this;
		const { chunk, limit, i: start } = this;
		// eslint-disable-next-line no-constant-condition
		let { i: start } = this;
		const { chunk } = this;
		while (true) {
		if (this.i >= limit) {
		// This is faster than adding codepoints one by one.
		this.text += chunk.substring(start);
		const code = this.getCode();
		if (code === undefined) {
		this.text += chunk.slice(start);
		return;
		}
		const code = this.getCode();
		if (code === q || code === AMP || code === LESS) {
		// This is faster than adding codepoints one by one.
		const slice = chunk.substring(start,
		this.i - (code <= 0xFFFF ? 1 : 2));

		if (code === NL) {
		start = this.handleEOL("text", chunk, start);
		}
		else if (code === q || code === AMP || code === LESS) {
		const slice = chunk.slice(start, this.prevI);
		switch (code) {
		@@ -1742,3 +1979,3 @@ case q:
		this.fail("no whitespace between attributes.");
		this.name = String.fromCodePoint(c);
		this.i = this.prevI;
		this.state = S_ATTRIB_NAME;
		@@ -1761,3 +1998,3 @@ }
		}
		else if (c) {
		else if (c !== undefined) {
		if (this.text.includes("]]>")) {
		@@ -1786,3 +2023,3 @@ this.fail("the string \"]]>\" is disallowed in char data.");
		}
		else if (c) {
		else if (c !== undefined) {
		this.fail("disallowed character in closing tag.");
		@@ -1798,3 +2035,3 @@ }
		}
		else if (c) {
		else if (c !== undefined) {
		this.fail("disallowed character in closing tag.");
		@@ -1901,2 +2138,3 @@ }
		qname(name) {
		// This is faster than using name.split(":").
		const colon = name.indexOf(":");
		@@ -1907,4 +2145,4 @@ if (colon === -1) {

		const local = name.substring(colon + 1);
		const prefix = name.substring(0, colon);
		const local = name.slice(colon + 1);
		const prefix = name.slice(0, colon);
		if (prefix === "" || local === "" || local.includes(":")) {
		@@ -1920,7 +2158,6 @@ this.fail(`malformed name: ${name}.`);
		const { tag, attribList } = this;
		const { name: tagName, attributes } = tag;

		{
		// add namespace info to tag
		const { prefix, local } = this.qname(tagName);
		const { prefix, local } = this.qname(tag.name);
		tag.prefix = prefix;
		@@ -1946,2 +2183,3 @@ tag.local = local;

		const { attributes } = tag;
		const seen = new Set();
		@@ -1955,3 +2193,3 @@ // Note: do not apply default ns to attributes:
		if (prefix === "") {
		uri = (name === "xmlns") ? XMLNS_NAMESPACE : "";
		uri = name === "xmlns" ? XMLNS_NAMESPACE : "";
		eqname = name;
		@@ -2114,3 +2352,3 @@ }
		// The character reference is required to match the CHAR production.
		if (!isChar(num)) {
		if (!this.isChar(num)) {
		this.fail("malformed character entity.");
		@@ -2117,0 +2355,0 @@ return `&${entity};`;

package.json

		@@ -5,3 +5,3 @@ {
		"author": "Louis-Dominique Dubeau <ldd@lddubeau.com>",
		"version": "3.1.11",
		"version": "4.0.0-rc.1",
		"main": "lib/saxes.js",
		@@ -30,10 +30,10 @@ "types": "lib/saxes.d.ts",
		"devDependencies": {
		"@commitlint/cli": "^8.0.0",
		"@commitlint/config-angular": "^8.0.0",
		"@commitlint/cli": "^8.2.0",
		"@commitlint/config-angular": "^8.2.0",
		"chai": "^4.2.0",
		"conventional-changelog-cli": "^2.0.21",
		"eslint": "^5.16.0",
		"eslint-config-lddubeau-base": "^3.0.5",
		"husky": "^2.5.0",
		"mocha": "^6.1.4",
		"conventional-changelog-cli": "^2.0.23",
		"eslint": "^6.5.1",
		"eslint-config-lddubeau-base": "^4.0.2",
		"husky": "^3.0.8",
		"mocha": "^6.2.1",
		"renovate-config-lddubeau": "^1.0.0",
		@@ -43,3 +43,3 @@ "xml-conformance-suite": "^1.2.0"
		"dependencies": {
		"xmlchars": "^2.1.1"
		"xmlchars": "^2.2.0"
		},
		@@ -46,0 +46,0 @@ "husky": {


README.md

		@@ -19,7 +19,6 @@ # saxes
		better compliance with well-formedness constraints cannot use sax as-is.
		Saxes aims for conformance with [XML 1.0 fifth
		edition](https://www.w3.org/TR/2008/REC-xml-20081126/) and [XML Namespaces 1.0
		third edition](http://www.w3.org/TR/2009/REC-xml-names-20091208/).

		Consequently, saxes does not support HTML, or pseudo-XML, or bad XML.
		Consequently, saxes does not support HTML, or pseudo-XML, or bad XML. Saxes
		will report well-formedness errors in all these cases but it won't try to
		extract data from malformed documents like sax does.

		@@ -49,25 +48,20 @@ * Saxes is much much faster than sax, mostly because of a substantial redesign

		## Limitations
		## Conformance

		This is a non-validating parser so it only verifies whether the document is
		well-formed. We do aim to raise errors for all malformed constructs encountered.
		Saxes supports:

		However, this parser does not parse the contents of DTDs. So malformedness
		errors caused by errors in DTDs cannot be reported.
		* [XML 1.0 fifth edition](https://www.w3.org/TR/2008/REC-xml-20081126/)
		* [XML 1.1 second edition](https://www.w3.org/TR/2006/REC-xml11-20060816/)
		* [Namespaces in XML 1.0 (Third Edition)](https://www.w3.org/TR/2009/REC-xml-names-20091208/).
		* [Namespaces in XML 1.1 (Second Edition)](https://www.w3.org/TR/2006/REC-xml-names11-20060816/).

		Also, the parser continues to parse even upon encountering errors, and does its
		best to continue reporting errors. You should heed all errors
		reported.
		## Limitations

		**HOWEVER, ONCE AN ERROR HAS BEEN ENCOUNTERED YOU CANNOT RELY ON THE DATA
		PROVIDED THROUGH THE OTHER EVENT HANDLERS.**
		This is a non-validating parser so it only verifies whether the document is
		well-formed. We do aim to raise errors for all malformed constructs
		encountered. However, this parser does not thoroughly parse the contents of
		DTDs. So most malformedness errors caused by errors in DTDs cannot be reported.

		After an error, saxes tries to make sense of your document, but it may interpret
		it incorrectly. For instance ``<foo a=bc="d"/>`` is invalid XML. Did you mean to
		have ``<foo a="bc=d"/>`` or ``<foo a="b" c="d"/>`` or some other variation?
		Saxes takes an honest stab at figuring out your mangled XML. That's as good as
		it gets.
		## Regarding `<!DOCTYPE` and `<!ENTITY`

		## Regarding `<!DOCTYPE`s and `<!ENTITY`s

		The parser will handle the basic XML entities in text nodes and attribute
		@@ -143,6 +137,28 @@ values: `& < > ' "`. It's possible to define additional

		* `defaultXMLVersion` - The default version of the XML specification to use if
		the document contains no XML declaration. If the document does contain an XML
		declaration, then this setting is ignored. Must be `"1.0"` or `"1.1"`. The
		default is `"1.0"`.

		* `forceXMLVersion` - Boolean. A flag indicating whether to force the XML
		version used for parsing to the value of ``defaultXMLVersion``. When this flag
		is ``true``, ``defaultXMLVersion`` must be specified. If unspecified, the
		default value of this flag is ``false``.

		Example: suppose you are parsing a document that has an XML declaration
		specifying XML version 1.1.

		If you set ``defaultXMLVersion`` to ``"1.0"`` without setting
		``forceXMLVersion`` then the XML declaration will override the value of
		``defaultXMLVersion`` and the document will be parsed according to XML 1.1.

		If you set ``defaultXMLVersion`` to ``"1.0"`` and set ``forceXMLVersion`` to
		``true``, then the XML declaration will be ignored and the document will be
		parsed according to XML 1.0.
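
The interaction between the two options can be summarized as a small decision function. The sketch below is only a model of the rules described above, not code taken from saxes: `resolveXMLVersion` is a hypothetical helper, and `declaredVersion` stands for the version found in the XML declaration (`undefined` when the document has none).

```javascript
// Hypothetical helper, NOT part of the saxes API. It only models the rules
// documented for `defaultXMLVersion` and `forceXMLVersion`.
function resolveXMLVersion(options, declaredVersion) {
  const { defaultXMLVersion, forceXMLVersion = false } = options;

  if (forceXMLVersion) {
    // When forcing, the default must be given and the declaration is ignored.
    if (defaultXMLVersion === undefined) {
      throw new Error("forceXMLVersion requires defaultXMLVersion");
    }
    return defaultXMLVersion;
  }

  // Without forcing, an XML declaration overrides the default.
  if (declaredVersion !== undefined) {
    return declaredVersion;
  }

  // No declaration: fall back to the configured default, or "1.0".
  return defaultXMLVersion !== undefined ? defaultXMLVersion : "1.0";
}
```

With a document that declares version 1.1, `resolveXMLVersion({ defaultXMLVersion: "1.0" }, "1.1")` yields `"1.1"`, while adding `forceXMLVersion: true` yields `"1.0"`, matching the two cases in the example above.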

		### Methods

		`write` - Write bytes onto the stream. You don't have to do this all at
		once. You can keep writing as much as you want.
		`write` - Write bytes onto the stream. You don't have to pass the whole document
		in one `write` call. You can read your source chunk by chunk and call `write`
		with each chunk.

		@@ -174,2 +190,23 @@ `close` - Close the stream. Once closed, no more data may be written until it is

		### Error Handling

		The parser continues to parse even upon encountering errors, and does its best
		to continue reporting errors. You should heed all errors reported. After an
		error, however, saxes may interpret your document incorrectly. For instance
		``<foo a=bc="d"/>`` is invalid XML. Did you mean to have ``<foo a="bc=d"/>`` or
		``<foo a="b" c="d"/>`` or some other variation? For the sake of continuing to
		provide errors, saxes will continue parsing the document, but the structure it
		reports may be incorrect. It is only after the errors are fixed in the document
		that saxes can provide a reliable interpretation of the document.

		That leaves you with two rules of thumb when using saxes:

		* Pay attention to the errors that saxes reports. The default `onerror` handler
		throws, so by default, you cannot miss errors.

		* **ONCE AN ERROR HAS BEEN ENCOUNTERED, STOP RELYING ON THE EVENT HANDLERS OTHER
		THAN `onerror`.** As explained above, when saxes runs into a well-formedness
		problem, it makes a guess in order to continue reporting more errors. The guess
		may be wrong.
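
One way to apply the second rule is to stop recording data events as soon as the first error arrives. The following is a minimal sketch under stated assumptions: `collectUntilFirstError` is a hypothetical helper, not part of saxes, and it assumes a parser object whose handlers are plain properties. Only `onerror` is named in the text above; `onopentag`, `onclosetag` and `ontext` are assumed here for illustration.

```javascript
// Hypothetical helper, NOT part of the saxes API. `parser` is any object whose
// event handlers are assigned as properties (`onerror`, `onopentag`, ...).
function collectUntilFirstError(parser) {
  const result = { errors: [], events: [], trusted: true };

  // Replace the throwing default so that every error gets recorded.
  parser.onerror = (err) => {
    result.errors.push(err);
    result.trusted = false;
  };

  // Assumed handler names; adjust the list to the events you care about.
  for (const name of ["onopentag", "onclosetag", "ontext"]) {
    parser[name] = (data) => {
      // After the first error the parser is guessing, so ignore its data.
      if (result.trusted) {
        result.events.push({ type: name.slice(2), data });
      }
    };
  }

  return result;
}
```

After the document has been written and the parser closed, a caller would check `result.trusted` first and use `result.events` only when no error was recorded, reporting `result.errors` otherwise.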

		### Events
		@@ -208,2 +245,23 @@

		### Performance Tips

		* saxes works faster on files that use newlines (``\u000A``) as end of line
		markers than files that use other end of line markers (like ``\r`` or
		``\r\n``). The XML specification requires that conformant applications behave
		as if all characters that are to be treated as end of line characters are
		converted to ``\u000A`` prior to parsing. The optimal code path for saxes is a
		file in which all end of line characters are already ``\u000A``.

		* Don't split the Unicode strings you feed to saxes across surrogate pairs.
		When you naively split a string in JavaScript, you run the risk of splitting
		a Unicode character into its two surrogates. For example, after
		``const a = "\u{1F4A9}"[0]; const b = "\u{1F4A9}"[1]``, the variables ``a``
		and ``b`` each contain half of a single Unicode character. Versions of saxes
		prior to 4 report errors when fed such split surrogates. Saxes version 4 and
		later detect when a chunk of data ends with a surrogate and carry that
		surrogate over to the next chunk. However, this operation entails slicing and
		concatenating strings, so if you can feed your data in a way that does not
		split surrogates, you should do so. (Feeding all the data at once with a
		single `write` call is fastest.)
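
Both tips can be applied before the data reaches the parser. The sketch below uses two hypothetical helpers that are not part of saxes: `normalizeEol` rewrites `\r\n` and lone `\r` to `\n` (the XML 1.0 end of line markers only; XML 1.1 recognizes additional ones that this sketch does not handle), and `chunksKeepingSurrogatePairs` splits a string into chunks that never end on the first half of a surrogate pair. Whether pre-normalizing pays off for a given workload is an assumption to measure, not a guarantee.

```javascript
// Hypothetical helpers, NOT part of the saxes API.

// Convert "\r\n" and lone "\r" to "\n". Run this on the whole document, or
// make sure no chunk ends with "\r", so that a "\r\n" pair is never split.
function normalizeEol(text) {
  return text.replace(/\r\n?/g, "\n");
}

// Yield chunks of at most `size` UTF-16 code units without separating a high
// surrogate from the low surrogate that follows it.
function* chunksKeepingSurrogatePairs(text, size) {
  if (!Number.isInteger(size) || size < 2) {
    throw new RangeError("size must be an integer >= 2");
  }
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + size, text.length);
    if (end < text.length) {
      const last = text.charCodeAt(end - 1);
      // A high surrogate at the chunk boundary moves to the next chunk.
      if (last >= 0xd800 && last <= 0xdbff) {
        end -= 1;
      }
    }
    yield text.slice(start, end);
    start = end;
  }
}
```

Each chunk produced by `chunksKeepingSurrogatePairs(normalizeEol(source), size)` can then be passed to `write` in order, followed by a call to `close`.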

		## FAQ
		@@ -210,0 +268,0 @@
