= xamplr-pp-gem
== INTRODUCTION
This is the Ruby version of the xampl pull parser, xamplr-pp. The
class name is Xampl_PP.
Both pull parsers and SAX parsers are stream based -- they parse
the XML file bit by bit, informing their clients of interesting events
as they are encountered. The whole XML document is not required to
be in memory. The significant difference between pull parsers and
SAX parsers is in where the 'main loop' is located: in the client
for pull parsers, in the parser for SAX parsers. Clients call a
method of the pull parser to get the next event. SAX parsers call
methods of the client to notify it of events (so these are 'push
parsers'). You can pass the pull parser around as an argument.
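The pull style can be sketched with a toy parser. This is only an illustration under stated assumptions: ToyPullParser, next_event, and the event symbols are hypothetical names and are not the xamplr-pp API. The toy handles only plain start tags, end tags, and text. The point is that the client owns the loop and can hand the parser to a helper method that consumes exactly one element.

```ruby
require 'strscan'

# Hypothetical toy pull parser for illustration only (NOT the xamplr-pp API).
# It understands plain start tags, end tags, and text: no attributes,
# comments, entities, or DOCTYPE.
class ToyPullParser
  def initialize(xml)
    @scanner = StringScanner.new(xml)
  end

  # Returns the next event as [type, value], or nil at end of input.
  def next_event
    return nil if @scanner.eos?

    if @scanner.scan(%r{</(\w+)>})
      [:end_element, @scanner[1]]
    elsif @scanner.scan(/<(\w+)>/)
      [:start_element, @scanner[1]]
    elsif (text = @scanner.scan(/[^<]+/))
      [:text, text]
    else
      raise "cannot parse at offset #{@scanner.pos}"
    end
  end
end

# The main loop lives in the client. The parser is passed in as an argument,
# so each nested element is read by a recursive call on the same parser.
# (This sketch does not check that end tag names match start tag names.)
def read_element(parser, name)
  children = []
  while (event = parser.next_event)
    type, value = event
    case type
    when :start_element then children << read_element(parser, value)
    when :text          then children << value
    when :end_element   then return [name, children]
    end
  end
  raise "unexpected end of input inside <#{name}>"
end

def read_document(parser)
  type, value = parser.next_event
  raise 'expected a start tag' unless type == :start_element

  read_element(parser, value)
end
```

Calling read_document(ToyPullParser.new("<a><b>hi</b>!</a>")) builds a nested array of element names and their content. With a push parser the same job would need the client to keep its own stack between callbacks.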
As a way of illustrating the use of xamplr-pp, a reasonably usable
SAX-like parser (SAXish) is implemented as well. The included saxdemo.rb
file reports statistics on the parsed XML file. It
can be run using 'make sax' (look in the Makefile to see how it is
used).
== STATUS
xamplr-pp has been in daily use in a SaaS content management system since about 2004. It is an integral component of xamplr. Unfortunately, documentation is sparse.
xamplr-pp has passed all unit tests using Ruby 1.6.7, 1.8.6, 1.8.7 on Linux and OS X. It has not even been tried on Ruby 1.9.1 at this point.
The unit tests require Lapidary (remember that?). Run these by typing 'make ut' on
the command line in the installation directory.
xamplr-pp is not a validating parser; in fact, it
doesn't enforce some well-formedness rules. DOCTYPE definitions are passed
intact to the client program. No attempt is made
to extract any kind of entity definitions from the DOCTYPE. Clever
use of the DOCTYPE event and the resolver method can relieve this
problem.
I have some serious reservations about the way that xamplr-pp handles
multi-byte characters, but it definitely works well with UTF-8.
Some attention has been paid to performance. It is reasonably quick, certainly on the fast side for pure Ruby parsers.
== DOCUMENTATION
There isn't much. Fortunately, the API to xamplr-pp is very small
and quite easy to understand.
It turns out to be quite easy to build a SAX-like parser from a
pull parser. It also turns out that doing this pretty much uses
the entire xamplr-pp API. So I've implemented a SAX-like parser
(SAXish) and annotated it. This is pretty much all the documentation
in this release.
SAXish is, in itself, a pretty usable SAX-like parser (of course
the limits to validation and well-formedness checking mentioned above
apply).
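The general shape of a SAX-like layer over a pull parser can be sketched as follows. This is not the SAXish source: ReplayPullSource, ToySaxDriver, ElementCounter, and the event and callback names are all hypothetical, and the pull source here simply replays a prepared list of events. The driver owns the main loop, pulls each event, and pushes it to a handler.

```ruby
# Hypothetical stand-in for a pull parser (NOT the xamplr-pp API):
# it replays a prepared list of [type, value] events, then returns nil.
class ReplayPullSource
  def initialize(events)
    @events = events.dup
  end

  def next_event
    @events.shift
  end
end

# SAX-like driver: pulls events in its own loop and pushes each one
# to the handler by calling the matching callback.
class ToySaxDriver
  def initialize(pull_source, handler)
    @pull_source = pull_source
    @handler = handler
  end

  def parse
    while (event = @pull_source.next_event)
      type, value = event
      case type
      when :start_element then @handler.start_element(value)
      when :end_element   then @handler.end_element(value)
      when :text          then @handler.text(value)
      end
    end
  end
end

# Example handler that gathers simple statistics: how often each
# element name starts.
class ElementCounter
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def start_element(name)
    @counts[name] += 1
  end

  def end_element(_name); end

  def text(_content); end
end
```

A client builds a handler, hands it to the driver together with a pull source, calls parse, and reads the handler's counts afterwards. The handler never sees the loop, which is the push style described in the introduction.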
== CONFORMANCE
Well, this is an interesting question. Right now, it doesn't do too
well, but it does this in a 'good' way, or at least a manageable
way.
On the OASIS conformance tests, xamplr-pp's results are: 435 tests pass
that should pass, 188 fail that should fail, 3 fail that should pass, and
1188 pass that should fail. Generally speaking, xamplr-pp is 'permissive'.
However, to do this I had to cheat. I defined a resolver that
returns a string rather than nil. This allows xamplr-pp to pretend
that it is dealing with entities defined in the DOCTYPE. In other words, for you to achieve this level of conformance you'll need to supply an entity resolver.
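The idea of a resolver that always answers with a string can be sketched in isolation. Everything here is an assumption made for illustration: resolve_entity, expand_entities, and KNOWN_ENTITIES are hypothetical names, and this is not how xamplr-pp's resolver hook is declared or called.

```ruby
# Hypothetical entity table; a real client might fill this from the
# DOCTYPE text it receives from the parser.
KNOWN_ENTITIES = { 'greeting' => 'hello' }.freeze

# Always returns a string: the known replacement, or a visible
# placeholder for an unknown entity instead of nil.
def resolve_entity(name)
  KNOWN_ENTITIES.fetch(name) { "[#{name}]" }
end

# Replaces every &name; reference in the text using the resolver.
def expand_entities(text)
  text.gsub(/&(\w+);/) { resolve_entity(Regexp.last_match(1)) }
end
```

Returning a placeholder keeps parsing going when an entity is unknown, which is the permissive behaviour described above. A stricter client would return nil or raise instead.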
The three that failed that should have passed all involve apparently
legal DOCTYPE declarations that xamplr-pp does not recognise as
legal. It is unlikely that I will do anything about these.
The 1188 that passed but should have failed appear to be due to the
non-enforcement of well-formedness rules. I have not looked at them
all, I assure you of that, but it seems that most involve
entity definitions. Every one that I looked at did; I can't say for the ones I didn't look at.
== LICENCE
xamplr-pp is licensed under the LGPL (see LICENSE).
== CONTACT INFORMATION
My email is hutch@recursive.ca. Feel free to contact me there, or
you can use the facilities provided by Sourceforge.
== COPYRIGHT
Copyright (c) 2009 Bob Hutchison. See LICENSE for details.