6.0.1 (2023-06-05)
Bug Fixes
- "X" is not a valid hex prefix for char references (465038b)
- add fragment and additionalNamespaces to SaxesOption typing (02d8275)
- add namespace checks (9f94c4b)
- always run in strict mode (ed8b0b1)
- CDATA end in attributes must not cause an error (a7495ac)
- check that the characters we read are valid char data (7611a85)
- correct typo (97bc5da)
- detect unclosed tags in fragments (5642f36)
- disallow BOM characters at the beginning of subsequent chunks (66d07b6)
- disallow spaces after open waka (da7f76d)
- don't serialize the fileName as undefined: when not present (4ff2365)
- drop the lowercase option (987d4bf)
- emit CDATA on empty CDATA section too (95d192f)
- emit empty comment (b3db392)
- entities are always strict (0f6a30e)
- fail on colon at start of QName (507addd)
- fix a bug in EOL handling (bed38a8)
- fix bug with initial eol characters (7b3db75)
- fix corrupted attribute values when there is no text handler (e135f11), closes #38
- fix some typing mistakes (f2a1d5e)
- fixing linting errors for eslint 8 (cd4b5c9)
- generate an error on prefix with empty local name (89a3b86), closes #5
- handle column computation over characters in the astral plane (cefc8f7)
- handling of end of line characters (f13247a)
- harmonize error messages and initialize flags (9a20cad)
- implement attribute normalization (be51114), closes #24
- just one error for text before the root, and text after (101ea50)
- more namespace checks (a1add21)
- move eslint to devDependencies (d747538)
- move namespace checks to their proper place (4a1c99f)
- normalize \r\n and \r followed by something else to \n (d7b1abe), closes #2
- npm audit warning (a6c9ba8)
- only accept uppercase CDATA to mark the start of CDATA (e86534d)
- pay attention to comments and processing instructions in DTDs (52ffd90), closes #19
- prevent colons in pi and entity names when xmlns is true (4327eec)
- prevent empty entities (04e1593)
- raise an error if the document does not have a root (f2de520)
- raise an error on ]]> in character data (2964381)
- raise an error on < in attribute values (4fd67a1)
- raise an error on multiple root elements (45047ae)
- raise error on CDATA before or after root (604241f)
- raise error on character reference outside CHAR production (30fb540)
- remove broken or pointless examples (1a5b642)
- report an error on duplicate attributes (ee4e340)
- report an error on whitespace at the start of end tag (c13b122)
- report processing instructions that do not have a target (c007e39)
- resolve is now part of the public API (bb4bed5)
- treat ?? in processing instructions correctly (bc1e1d4)
- trim URIs (78cc6f3)
- typings: "selfClosing" => "isSelfClosing" (d96a2bd)
- use isNameChar for later chars in PI target (83d2b61)
- use the latest xmlchars (b30a714)
- use xmlchars for checking names (2c939fe)
- verify that character references match the CHAR production (369afde)
- we don't support node 10 anymore (f2aa1a8)
Code Refactoring
- adjust the names used for processing instructions (3b508e9)
- convert code to ES6 (fe81170)
- drop attribute event (c7c2e80)
- drop buffer size checks (9ce2f7a)
- drop normalize (9c6d84c)
- drop opencdata and closecdata (3287d2c)
- drop SGML declaration parsing (4aaf2d9)
- drop the parser function, rename SAXParser (0878a6c)
- drop trim (c03c7d0)
- pass the actual tag to onclosetag (7020e64)
- provide default no-op implementation for events (a94687f)
- remove the API based on Stream (ebb659a)
- simplify namespace processing (2d4ce0f)
Features
- add forceXMLVersion (1eedbf8)
- add makeError method (50fa39a)
- add support for parsing fragments (1ff2d6a)
- add the resolvePrefix option (90301fb)
- add xmldecl event (a2e677f)
- drop the resume() method; and have onerror() throw (ac601e5)
- formal method for setting event listeners (f346150)
- handle XML declarations (5258939)
- process the xmlns attribute the customary way (2c9672a)
- reinstating the attribute events (7c80f7b)
- revamped error messages (cf9c589)
- saxes handles chunks that "break" unicode (1272448)
- saxes is now implemented in TS (664ba69)
- stronger check on bad cdata closure (d416760)
- support for XML 1.1 (36704fb)
- the flush method returns its parser (68c2020)
Performance Improvements
- add emitNodes to skip checking text buffer more than needed (9d5e357)
- add topNS for faster namespace processing (1a33a57)
- capture names in the name field (c7dffd5)
- check the most common case first (40a34d5)
- concatenate openWakaBang just once (07345bf)
- don't check twice if this.textNode is set (00536cc)
- don't depend on limit to know when we hit the end of buffer (ad4ab53)
- don't increment a column number (490fc24)
- don't repeatedly read this.i in the getCode methods (d3f196c)
- drop the originalNL flag in favor of a NL_LIKE fake character (f690725)
- dump isNaN; it is very costly (7d97e1a)
- eliminate extra buffers (3412fcb)
- improve performance of text handling (9c13099)
- improve some more the speed of ]]> detection (a0216cd)
- improve text node checking speed (f270e8b)
- improve the check for ]]> in character data (21df9b5)
- inline closeText (07a3b51)
- introduce a specialized version of captureWhile (04855d6)
- introduce captureTo and captureToChar (76eb95a)
- make the most common path of getCode functions the shortest (4d66bbb)
- minimize concatenation by adding the capability to unget codes (27fa8b9)
- minor optimizations (c7e36bf)
- move more common/valid cases first (a65586e)
- reduce the frequency at which we clear attribValue (1570615)
- reduce the number of calls to closeText (3e68df5)
- remove an unnecessary variable (ac03a1c)
- remove handler check (fbe35ff)
- remove more extra buffers (b5ee774)
- remove skipWhitespace (c8b7ae2)
- remove some redundant buffer resets (5ded326)
- simplify captureWhile (bb2085c)
- simplify the skip functions (c7b8c3b)
- split sText into two specialized loops (732325e)
- the c field has been unused for a while: remove it (9ca0246)
- use -1 to mean EOC (end-of-chunk) (55c0b1b)
- use charCodeAt and handle surrogates ourselves (b8ec232)
- use isCharAndNotRestricted rather than call two functions (f0b67a4)
- use slice rather than substring (c1fed89)
- use specialized code for sAttribValueQuoted (6c484f3)
- use strings for the general states (3869908)
BREAKING CHANGES
- we don't support node 10.
- The individually named event handlers no longer exist. You now must use the
methods on and off to set handlers. Upcoming features require that saxes know
when handlers are added and removed, and it may be necessary in the future to
qualify how to add or remove a handler. Getters/setters are too restrictive, so
we bite the bullet now and move to actual methods.
- The fix to column number reporting changes the meaning of the column field. If
you need the old behavior of column, you can use the new columnIndex field, which
behaves like the old column and may be useful in some contexts. Ultimately you
should decide whether your application needs to know column numbers by Unicode
character count or by JavaScript index. (And you need to know the difference
between the two. You can see this page for a detailed discussion of the Unicode
problem in JavaScript.) Note that the numbers put in the error messages that
fail produces are still based on the column field and thus use the new meaning
of column. If you want error messages that use columnIndex, you may override the
fail method.
- previous versions of saxes did not consistently convert end of line characters
to NL (0xA) in the data reported by event handlers. This has been fixed. If your
code relied on the old (incorrect) behavior then you'll have to update it.
- previous versions of saxes would parse files with an XML
declaration set to 1.1 as 1.0 documents. The support for 1.1 entails that if a
document has an XML declaration that specifies version 1.1 it is parsed as a 1.1
document.
- when fileName is undefined in the parser options saxes does not show a file
name in error messages. Previously it was showing the name undefined. To get the
previous behavior, in all cases where you'd leave fileName undefined, you must
set it to the string "undefined" instead.
- In previous versions the attribute xmlns (as in <foo xmlns="some-uri">) would
be reported as having the prefix "xmlns" and the local name "". This behavior
was inherited from sax. There was some logic to it, but this behavior was
surprising to users of the library. The principle of least surprise favors
eliminating that surprising behavior in favor of something less surprising.
This commit makes it so that xmlns is now reported as having a prefix of "" and
a local name of "xmlns". This accords with how people interpret attribute names
like foo, bar, moo which all have no prefix and a local name.
Code that deals with namespace bindings or cares about xmlns probably needs to
be changed.
- Sax was only passing the tag name to onclosetag. We pass the whole tag object.
- The ns field is no longer using the prototype trick that sax used. The ns
field of a tag contains only those namespaces that the tag declares.
- We no longer have opennamespace and closenamespace events. The information
they provide can be obtained by examining the tags passed to tag events.
- attribute is not a particularly useful event for parsing XML. The only thing
it adds over looking at attributes on tag objects is that you get the order of
the attributes from the source, but attribute order in XML is irrelevant.
- The opencdata and closecdata events became redundant once we removed the buffer
size limitations. So we remove these events.
- The parser function is removed. Just create a new instance with new. SAXParser
is now SaxesParser. So new (require("saxes").SaxesParser)(...).
- The API based on Stream is gone. There were multiple issues with it. It was
Node-specific. It used an ancient Node API (the so-called "classic
streams"). Its behavior was idiosyncratic.
- Sax had no default error handler but if you wanted to continue calling write()
after an error you had to call resume(). We do away with resume() and instead
install a default onerror which throws. Replace it with a no-op handler if you
want to continue after errors.
- The "processinginstruction" event now produces a "target" field instead of a
"name" field. The nomenclature "target" is the one used in the XML literature.
- Parsers now have a default no-op implementation for each event they support.
This would break code that determines whether a custom handler was added by
checking whether there's any handler at all. This removes the necessity for the
parser implementation to check whether there is a handler before calling it.
In the process of making this change, we've removed support for the on...
properties on stream objects. Their existence was not warranted by any standard
API provided by Node. (EventEmitter does not have on... properties for events it
supports, nor does Stream.) Their existence was also undocumented. And their
functioning was awkward. For instance, with sax, this:
const s = sax.createStream();
const handler = () => console.log("moo");
s.on("cdata", handler);
console.log(s.oncdata === handler);
would print false. If you examine s.oncdata you see it is glue code instead of
the handler assigned. This is just bizarre, so we removed it.
- SGML declaration is not supported by XML. This is an XML parser. So we
remove support for SGML declarations. They now cause errors.
- We removed the code that checked buffer sizes and would raise errors if a
buffer was close to an arbitrary limit or emitted multiple text or cdata events
in order to avoid passing strings greater than an arbitrary size. So
MAX_BUFFER_LENGTH is gone.
The feature always seemed a bit awkward. Client code could limit the size of
buffers to 1024K, for instance, and not get a text event
with a text payload greater than 1024K... so far so good but if the
same document contained a comment with more than 1024K that would
result in an error. Hmm.... why? The distinction seems entirely
arbitrary.
The upshot is that client code needs to be ready to handle strings of
any length supported by the platform.
If there's a clear need to reintroduce it, we'll reassess.
- It is no longer possible to load the library as-is through a script element.
It needs building.
The library now assumes a modern runtime. It no longer contains any
code to polyfill what's missing. It is up to developers using this
code to deal with polyfills as needed.
- We drop the trim option. It is up to client code to trim text if it needs it.
- We no longer support the normalize option. It is up to client code to perform
whatever normalization it wants.
- The lowercase option makes no sense for XML. It is removed.
- Remove support for strictEntities. Entities are now always strict, as required
by the XML specification.
- The API no longer takes a strict argument anywhere. This also effectively
removes support for HTML processing, and for processing without errors anything
which is less than full XML. It also removes special processing of script
elements.
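Several of the changes above move work into client code (trimming and
whitespace normalization) or ask you to choose between column and columnIndex.
The sketch below is not part of saxes and does not call it: the helper names
are hypothetical, and it assumes the removed normalize option collapsed each
run of whitespace into a single space, as sax documented. It only illustrates
the difference between counting by Unicode character and by JavaScript index,
and one possible way to trim and normalize text yourself.

```javascript
"use strict";

// Hypothetical helper, not a saxes API: offset of `index` within `line`,
// counted in Unicode code points rather than UTF-16 code units.
function codePointOffset(line, index) {
  // The string iterator walks code points, so a surrogate pair counts once.
  return [...line.slice(0, index)].length;
}

// Hypothetical stand-in for the removed trim option.
// Note: String.prototype.trim strips more than the four XML whitespace
// characters (for example U+00A0), which may or may not be what you want.
function trimText(text) {
  return text.trim();
}

// Hypothetical stand-in for the removed normalize option, assuming it
// collapsed each run of whitespace into a single space.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ");
}

// U+1F600 is outside the BMP, so it takes two UTF-16 code units.
const line = "<a>\u{1F600}</a>";
const index = line.indexOf("</a>"); // JavaScript (UTF-16) index

console.log(index, codePointOffset(line, index)); // prints: 5 4
console.log(JSON.stringify(trimText("  some text\n"))); // prints: "some text"
console.log(JSON.stringify(collapseWhitespace("a \t\n b"))); // prints: "a b"
```

For text made only of characters in the Basic Multilingual Plane the two
offsets agree, so the distinction only matters once astral characters such as
emoji appear in a line.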