Helper module that parses a simple XML buffer and stores it as a (mostly)
read-only dictionary-like object (MyXml). This dictionary can hold other
dictionaries, node lists, or leaf nodes. Nodes are accessed as attributes.
>>> xml = parse("<Foo><Bar>Val</Bar></Foo>")
>>> xml.Foo.Bar == "Val"
True
>>> xml.Foo.Bar
Val
I prefer not to use the built-in Python DOM parsers for simple XML data. Note,
however, that this module is suitable only for simple XML: namespaces, CDATA
and other fancy features are not supported.
There are three factory functions: "parse", "parse_file" and "parse_object".
- parse and parse_file both take an optional list of tag names, from the
  beginning of the XML data, to ignore.
- parse_object takes a complex Python object (of dictionaries, sequences and
  scalars) and creates a MyXml object from it.
It is possible, but not convenient, to construct XML trees using this module.
Usage Examples:
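>>> xml = parse('''
... <?xml bla bla bla>
... <Main>
...     <Text>
...         One Two &amp; Three
...     </Text>
...     <List>
...         <Item aaa="bbb" />
...         <Item ccc="ab+c" />
...         <Item>Bla Bla Bla</Item>
...     </List>
...     <BoolNum num="3.5" bool="yes">No</BoolNum>
...     <Double><Double>Value</Double></Double>
... </Main>
... ''')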
- An XML node is an attribute of the MyXml object
>>> xml.Main.Text
One Two & Three
>>> xml.Main.Text == "One Two & Three"
True
>>> xml.Main.Text.value == "One Two & Three"
True
A node can also be accessed with an "nd_" prefix (so that nodes named after
Python reserved words are reachable). This form also returns EMPTY_NODE if the
node doesn't exist.
>>> xml.nd_Main.nd_Text
One Two & Three
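The two access styles above can be sketched with a small dictionary subclass. This is only an illustration of the idea, not this module's code: AttrDict, EmptyNode and EMPTY are hypothetical names, and raising AttributeError for a missing plain attribute is an assumption of the sketch.

```python
class EmptyNode(object):
    """Hypothetical stand-in for an empty node: falsy and without a value."""
    value = None

    def __bool__(self):
        return False
    __nonzero__ = __bool__  # Python 2 spelling of the same hook


EMPTY = EmptyNode()


class AttrDict(dict):
    """Dictionary whose keys can also be read as attributes (illustrative)."""

    def __getattr__(self, name):
        # __getattr__ runs only when the normal attribute lookup fails,
        # so real dict methods such as .get() keep working.
        if name.startswith("nd_"):
            # Lenient form: strip the prefix and fall back to the empty node.
            return self.get(name[3:], EMPTY)
        try:
            return self[name]
        except KeyError:
            # Assumption of this sketch: plain access to a missing key fails.
            raise AttributeError(name)


# "class" is a reserved word, so it can only be reached through the prefix.
doc = AttrDict(Main=AttrDict(Text="One Two & Three", **{"class": "reserved"}))
```

With this sketch, doc.Main.Text and doc.Main.nd_class both return the stored strings, while doc.nd_Missing returns the falsy EMPTY object instead of failing.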
- A node can be looked at as a list with one item
>>> xml.Main.Double.Double[0] is xml.Main.Double.Double
True
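One reason a node that is its own first item is handy: calling code can loop over "one node" and "a list of nodes" in the same way. A minimal sketch of that idea follows; Leaf and values are hypothetical names and this is not the module's implementation.

```python
class Leaf(object):
    """Hypothetical leaf node that also acts as a one-item sequence."""

    def __init__(self, value):
        self.value = value

    def __len__(self):
        return 1

    def __getitem__(self, index):
        # Only positions 0 and -1 exist, and both are the node itself.
        if index in (0, -1):
            return self
        raise IndexError(index)


def values(node_or_list):
    """Collect .value from a single node or from a list of nodes alike."""
    # Iteration over a Leaf falls back to __getitem__(0), __getitem__(1), ...
    return [node.value for node in node_or_list]


single = Leaf("Value")
several = [Leaf("a"), Leaf("b")]
```

Here values(single) and values(several) both work without the caller checking which kind of object it received.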
- Node lists are regular lists
>>> len(xml.Main.List.Item)
3
>>> unicode(xml.Main.List.Item[2])
u'Bla Bla Bla'
- A MyXml object is a dictionary
>>> xml["Main"]["Text"] == xml.Main["Text"]
True
>>> xml.Main.get("Text") == xml["Main"].Text
True
- There is also a very simple XPath-like method
>>> xml.xpath("Main/List/Item")[2]
Bla Bla Bla
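A slash-separated lookup like the one above can be sketched as a walk over nested dictionaries. This is an assumption about the general technique, not the module's code: simple_xpath and its default parameter are hypothetical, and plain dicts and lists stand in for MyXml nodes.

```python
def simple_xpath(tree, path, default=None):
    """Follow an "A/B/C" path through nested dictionaries (illustrative)."""
    node = tree
    for step in path.split("/"):
        # Stop as soon as a step cannot be followed.
        if not isinstance(node, dict) or step not in node:
            return default
        node = node[step]
    return node


tree = {"Main": {"List": {"Item": ["", "", "Bla Bla Bla"]}}}
items = simple_xpath(tree, "Main/List/Item")
missing = simple_xpath(tree, "Main/foo")
```

The walk returns whatever sits at the end of the path (here a list, so items[2] is the third item) and the default when any step is missing.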
- Attributes can be accessed with an "at_" prefix
>>> xml.Main.List.Item[1].at_ccc
u'ab+c'
- Access the attributes dictionary with "at_dict"
>>> xml.Main.List.Item[0].at_dict["aaa"]
u'bbb'
- Every value can be looked at as a number and a boolean
>>> xml.Main.BoolNum.boolean
False
- Attributes can also be looked at as booleans or numbers
>>> xml.Main.BoolNum.at_num.number * 2
7.0
>>> xml.xpath("Main/BoolNum").at_bool.boolean
True
- If the value is not a number or a boolean (yes, no, true, false, 1, 0), the
  return value is None
>>> xml.Main.List.Item[0].at_aaa.number
- "get" and "xpath" return an empty node by default, so we can still use the
  number/boolean attributes.
>>> bool(xml.get("foo").boolean)
False
>>> xml.xpath("Main/foo").number is None
True
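The number/boolean rules described above can be sketched as two small conversion helpers. The names as_boolean and as_number are hypothetical and this is not the module's code; it only follows the behaviour shown in the examples (recognised words, int or float results, None otherwise).

```python
TRUE_WORDS = ("yes", "true", "1")
FALSE_WORDS = ("no", "false", "0")


def as_boolean(text):
    """Return True/False for a recognised word, otherwise None."""
    if text is None:
        return None
    word = text.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    return None


def as_number(text):
    """Return an int or a float parsed from text, otherwise None."""
    if text is None:
        return None
    # Try int first so that "5" stays an integer; fall back to float.
    for convert in (int, float):
        try:
            return convert(text.strip())
        except ValueError:
            continue
    return None
```

Because an empty node has no text, passing None through both helpers gives None, which is why chaining .number or .boolean after a failed lookup stays safe in this sketch.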
- Printing MyXml objects keeps the original order and adds indentation.
  The indentation is not thread-safe, though.
>>> print xml.Main.List
Bla Bla Bla
- Constructing a MyXml object from a complex Python object:
>>> xml = parse_object({
... "foo1": "bar",
... "foo2": ["bar1", "bar2", "bar3"],
... "foo3": {"bar": "foo"},
... "foo4": 5
... }, "Main") # "Main" is the name of the topmost node
>>> xml.xpath("Main/foo4").number
5
- The nodes that hold a sequence's items are named after the type of the
  sequence (list, tuple, set, generator).
>>> xml.xpath("Main/foo2/list")[1] == "bar2"
True
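The naming rule for sequence items can be illustrated by rendering a Python object as XML text. This is a rough sketch under stated assumptions: object_to_xml is a hypothetical helper, it handles only dicts, lists, tuples, sets and scalars, and the exact layout produced by this module may differ.

```python
from xml.sax.saxutils import escape


def object_to_xml(obj, name, indent=0):
    """Render dicts, sequences and scalars as indented XML text (illustrative)."""
    pad = "    " * indent
    if isinstance(obj, dict):
        # Each key becomes a child node named after the key.
        children = [object_to_xml(v, k, indent + 1) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple, set)):
        # Items of a sequence are named after the sequence's type.
        item_name = type(obj).__name__
        children = [object_to_xml(v, item_name, indent + 1) for v in obj]
    else:
        # Scalars become leaf nodes; escape() protects &, < and >.
        return "%s<%s>%s</%s>" % (pad, name, escape(str(obj)), name)
    opening = "%s<%s>" % (pad, name)
    closing = "%s</%s>" % (pad, name)
    return "\n".join([opening] + children + [closing])


text = object_to_xml({"foo2": ["bar1", "bar2"], "foo4": 5}, "Main")
```

In the resulting text each item of foo2 appears as a "list" node under "foo2", which is why the path "Main/foo2/list" addresses those items.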
- Finally, although it is not very useful, you can modify a MyXml object
>>> add_returns_self = xml.add(MyNode("bar5", "foo5")) # MyNode(value, name)
>>> xml.foo5.at_dict["attr"] = "attr value"
>>> xml.xpath("Main/foo5").at_attr == "attr value"
True
One can also use the other built-in dictionary and list methods, but this is not
recommended.
>>> xml # Here the order is not preserved because of the Python dictionary
5
bar
bar1
bar2
bar3
foo
bar5
Please note that this module is not efficient at parsing large XML buffers,
because it relies heavily on string slicing.
Erez Bibi
Please send comments and questions to
erezbibi AT users DOT sourceforge DOT net