xml stream parser
xml-stream-parser is an XML parser for Go. It parses large XML documents efficiently in a streaming fashion.
Usage
Given the following input.xml:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book isbn="AAA">
    <title>20 Love poems and a song of despair</title>
    <price>12.95</price>
    <comments>
    </comments>
  </book>
  <book isbn="XXX">
    <title>The Iliad and The Odyssey</title>
    <price>12.95</price>
    <comments>
      <userComment rating="4">Best translation I've read.</userComment>
      <userComment rating="2">I like other versions better.</userComment>
    </comments>
  </book>
  <book isbn="YYY">
    <title>Anthology of World Literature</title>
    <price>24.95</price>
    <comments>
      <userComment rating="3">Needs more modern literature.</userComment>
      <userComment rating="4">Excellent overview of world literature.</userComment>
    </comments>
  </book>
</bookstore>
Stream over books
f, err := os.Open("input.xml")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
br := bufio.NewReaderSize(f, 65536)
parser := xmlparser.NewXMLParser(br, "book")
for xml := range parser.Stream() {
	fmt.Println(xml.Childs["title"][0].InnerText)
	// the first book has an empty <comments>, so check before indexing userComment
	comments := xml.Childs["comments"][0].Childs["userComment"]
	if len(comments) > 0 {
		fmt.Println(comments[0].Attrs["rating"])
		fmt.Println(comments[0].InnerText)
	}
}
Skip tags for speed
parser := xmlparser.NewXMLParser(br, "book").SkipElements([]string{"price", "comments"})
Error handling
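for xml := range parser.Stream() {
	if xml.Err != nil {
		// parsing failed: handle the error (here it is only logged)
		log.Println(xml.Err)
		continue
	}
	// use xml here
}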
Progress of parsing
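parser.TotalReadSize holds the number of bytes the parser has read so far; comparing it with the size of the input gives the parsing progress.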
Using the GetValue function of an XMLElement instance:
value = xml.GetValue("comments.userComment")
value = xml.GetValue("comments[1].userComment[1]")
To get the InnerText of the node itself, pass ".":
value = node.GetValue(".")
Never call GetValue("") with an empty path.
To get an attribute value, append @ and the attribute name to the path:
attValue = xml.GetValue("comments[1].userComment@rating")
attValue = xml.GetValue("comments.userComment[1]@rating")
attValue = xml.GetValue("@isbn")
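To make the path syntax concrete, here is a sketch of how such a string could be split into element steps and an optional @attribute. parsePath and Step are hypothetical, not the library's code. The sketch treats indexes as 1-based, an assumption drawn from the examples: each book has a single comments element, yet the examples address it as comments[1].

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Step is one path segment such as "comments[1]"; Index is 0 when the
// segment has no index.
type Step struct {
	Name  string
	Index int
}

// parsePath splits a path like "comments[1].userComment@rating" into element
// steps and an attribute name. A remaining "." or "" (as in "@isbn") means
// the current node; a completely empty path is rejected.
func parsePath(path string) ([]Step, string, error) {
	if path == "" {
		return nil, "", fmt.Errorf("empty path")
	}
	attr := ""
	if i := strings.LastIndex(path, "@"); i >= 0 {
		path, attr = path[:i], path[i+1:]
	}
	if path == "" || path == "." {
		return nil, attr, nil
	}
	var steps []Step
	for _, seg := range strings.Split(path, ".") {
		step := Step{Name: seg}
		if open := strings.Index(seg, "["); open >= 0 {
			if !strings.HasSuffix(seg, "]") {
				return nil, "", fmt.Errorf("missing ] in %q", seg)
			}
			n, err := strconv.Atoi(seg[open+1 : len(seg)-1])
			if err != nil || n < 1 { // 1-based, as assumed above
				return nil, "", fmt.Errorf("bad index in %q", seg)
			}
			step = Step{Name: seg[:open], Index: n}
		}
		if step.Name == "" {
			return nil, "", fmt.Errorf("empty element name in %q", path)
		}
		steps = append(steps, step)
	}
	return steps, attr, nil
}

func main() {
	for _, p := range []string{"comments.userComment", "comments[1].userComment[1]", "comments.userComment[1]@rating", "@isbn", "."} {
		steps, attr, err := parsePath(p)
		fmt.Println(p, steps, attr, err)
	}
}
```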
If a node's value is numeric, these functions return it already converted to a number:
intValue = xml.GetValueInt("comments[1].userComment@rating")
intValue = xml.GetValueInt("comments.userComment[1]@rating")
float64Value = xml.GetValueF64("price")
These values will be 0 if the value of the node is not numeric.
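The documented fall-back to 0 can be reproduced with strconv. toInt and toF64 are hypothetical helpers; trimming surrounding whitespace is a choice of this sketch, not documented library behaviour.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toInt returns s as an int, or 0 if s is not a valid integer.
// Whitespace trimming is an assumption of this sketch.
func toInt(s string) int {
	n, err := strconv.Atoi(strings.TrimSpace(s))
	if err != nil {
		return 0
	}
	return n
}

// toF64 returns s as a float64, or 0 if s is not a valid number.
func toF64(s string) float64 {
	f, err := strconv.ParseFloat(strings.TrimSpace(s), 64)
	if err != nil {
		return 0
	}
	return f
}

func main() {
	fmt.Println(toInt("4"), toInt("four"), toF64("12.95"), toF64(""))
}
```

The drawback of this design is that a genuine 0 and a non-numeric value look the same; when the difference matters, read the string with GetValue and parse it with strconv yourself so the error stays visible.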
GetValueDeep gets a value by diving into every node along the path until it reaches one where the complete path exists. For example:
comment = bookstore.GetValueDeep("book.comments.userComment")
returns "Best translation I've read." because that is the first element that has the complete path (the first book's comments element is empty). GetValueIntDeep and GetValueF64Deep are also available and work the same way for numeric values.
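The difference between a deep search and following only the first matching child at each step can be sketched on a trimmed copy of the sample tree. Node, el, findFirstOnly and findDeep are hypothetical; the sketch only illustrates the backtracking described above, not the library's implementation.

```go
package main

import "fmt"

// Node is a hypothetical minimal element tree used only for this sketch.
type Node struct {
	Name     string
	Text     string
	Children []*Node
}

// el is a small constructor that keeps the sample tree readable.
func el(name, text string, children ...*Node) *Node {
	return &Node{Name: name, Text: text, Children: children}
}

// findFirstOnly follows only the first child with the right name at each
// step and gives up if that branch does not contain the rest of the path.
func findFirstOnly(n *Node, path []string) *Node {
	for _, name := range path {
		var next *Node
		for _, c := range n.Children {
			if c.Name == name {
				next = c
				break
			}
		}
		if next == nil {
			return nil
		}
		n = next
	}
	return n
}

// findDeep tries every child with the right name, depth first, and returns
// the first node in document order at the end of the complete path.
func findDeep(n *Node, path []string) *Node {
	if len(path) == 0 {
		return n
	}
	for _, c := range n.Children {
		if c.Name == path[0] {
			if found := findDeep(c, path[1:]); found != nil {
				return found
			}
		}
	}
	return nil
}

// sampleStore mirrors the start of the sample: the first book's comments
// element is empty, the second has two userComment children.
func sampleStore() *Node {
	return el("bookstore", "",
		el("book", "", el("comments", "")),
		el("book", "",
			el("comments", "",
				el("userComment", "Best translation I've read."),
				el("userComment", "I like other versions better."),
			),
		),
	)
}

func main() {
	path := []string{"book", "comments", "userComment"}
	store := sampleStore()
	fmt.Println(findFirstOnly(store, path) == nil)
	if n := findDeep(store, path); n != nil {
		fmt.Println(n.Text)
	}
}
```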
Using the GetNode and GetNodes functions of an XMLElement instance. GetNode returns a single node:
singleNode = xml.GetNode("comments.userComment")
singleNode = xml.GetNode("comments[1].userComment[1]")
GetNodes returns every node that matches the path:
nodeArray = xml.GetNodes("comments.userComment")
nodeArray = xml.GetNodes("comments[1].userComment")
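A sketch of one plausible reading of GetNode versus GetNodes: intermediate steps follow their first (or indexed) match, the last step returns every match, and GetNode keeps only the first of those. Node, Step, childrenNamed, getNodes and getNode are hypothetical, and indexes are treated as 1-based, an assumption inferred from examples such as comments[1].

```go
package main

import "fmt"

// Node is a hypothetical minimal element tree used only for this sketch.
type Node struct {
	Name     string
	Text     string
	Children []*Node
}

// Step is one path segment; Index is 1-based and 0 means "no index given".
type Step struct {
	Name  string
	Index int
}

// childrenNamed returns the children of n called s.Name, narrowed to the
// s.Index-th one when an index is given.
func childrenNamed(n *Node, s Step) []*Node {
	var out []*Node
	for _, c := range n.Children {
		if c.Name == s.Name {
			out = append(out, c)
		}
	}
	if s.Index > 0 {
		if s.Index > len(out) {
			return nil
		}
		return out[s.Index-1 : s.Index]
	}
	return out
}

// getNodes follows the first match of every intermediate step and returns
// all matches of the last step.
func getNodes(n *Node, steps []Step) []*Node {
	if len(steps) == 0 {
		return []*Node{n}
	}
	cur := n
	for _, s := range steps[:len(steps)-1] {
		m := childrenNamed(cur, s)
		if len(m) == 0 {
			return nil
		}
		cur = m[0]
	}
	return childrenNamed(cur, steps[len(steps)-1])
}

// getNode returns the first node getNodes finds, or nil.
func getNode(n *Node, steps []Step) *Node {
	if m := getNodes(n, steps); len(m) > 0 {
		return m[0]
	}
	return nil
}

// sampleBook mirrors the second book of the sample input.
func sampleBook() *Node {
	return &Node{Name: "book", Children: []*Node{
		{Name: "title", Text: "The Iliad and The Odyssey"},
		{Name: "comments", Children: []*Node{
			{Name: "userComment", Text: "Best translation I've read."},
			{Name: "userComment", Text: "I like other versions better."},
		}},
	}}
}

func main() {
	book := sampleBook()
	fmt.Println(len(getNodes(book, []Step{{Name: "comments", Index: 1}, {Name: "userComment"}})))
	if n := getNode(book, []Step{{Name: "comments"}, {Name: "userComment", Index: 2}}); n != nil {
		fmt.Println(n.Text)
	}
}
```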
Using the GetAllNodes function of an XMLElement instance:
GetAllNodes follows every branch of the tree that matches the given path and collects all the nodes at the end of it. For example:
f, err := os.Open("input.xml")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
br := bufio.NewReaderSize(f, 65536)
parser := xmlparser.NewXMLParser(br, "bookstore")
// notice we are getting the root node, so the loop below runs only once
for xml := range parser.Stream() {
	nodes := xml.GetAllNodes("book.comments.userComment")
	fmt.Println(len(nodes))
}
This call returns 4 elements: the 2 userComment nodes from each of the two non-empty comments nodes.
If you are interested, also check out the json parser, which works in a similar way.