
pygixml
A high-performance XML parser for Python based on Cython and pugixml, providing fast XML parsing, manipulation, XPath queries, text extraction, and advanced XML processing capabilities.
In the project's parsing benchmark, pygixml parses faster than both lxml and ElementTree:
| Library | Parsing Time | Speedup vs ElementTree |
|---|---|---|
| pygixml | 0.00077s | 15.9x faster |
| lxml | 0.00407s | 3.0x faster |
| ElementTree | 0.01220s | 1.0x (baseline) |
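The speedup column can be recomputed from the timing column as a sanity check. The sketch below is only arithmetic on the numbers in the table; the names `parse_times` and `speedup` are made up for illustration. Because the published times are rounded, the recomputed pygixml factor comes out slightly below the 15.9x shown.

```python
# Parsing times in seconds, copied from the table above.
parse_times = {"pygixml": 0.00077, "lxml": 0.00407, "ElementTree": 0.01220}


def speedup(baseline_seconds, seconds):
    """How many times faster `seconds` is than `baseline_seconds`."""
    return baseline_seconds / seconds


baseline = parse_times["ElementTree"]
speedups = {name: speedup(baseline, t) for name, t in parse_times.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x vs ElementTree")

# The same ratio between the two fastest libraries.
pygixml_vs_lxml = speedup(parse_times["lxml"], parse_times["pygixml"])
print(f"pygixml vs lxml: {pygixml_vs_lxml:.1f}x")
```

By the same calculation, pygixml is roughly 5.3x faster than lxml on this benchmark.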
pip install pygixml
pip install git+https://github.com/MohammadRaziei/pygixml.git
- //book, /library/book, book[1]
- book[@id], book[@category='fiction']
- and, or, not()
- =, !=, <, >, <=, >=
- +, -, *, div, mod
- position(), last(), count(), sum(), string(), number()
- child::, attribute::, descendant::, ancestor::
- *, @*, node()

- parse_string(xml_string) - Parse XML from string
- parse_file(file_path) - Parse XML from file
- save_file(file_path) - Save XML to file
- append_child(name) - Add child node
- first_child() - Get first child node
- child(name) - Get child by name
- reset() - Clear document

- name - Get/set node name
- value - Get/set node value (for text nodes only)
- child_value(name) - Get text content of child node
- append_child(name) - Add child node
- first_child() - Get first child
- child(name) - Get child by name
- next_sibling - Get next sibling
- previous_sibling - Get previous sibling
- parent - Get parent node
- text(recursive, join) - Get text content
- to_string(indent) - Serialize to XML string
- xml - XML representation property
- xpath - Absolute XPath of node
- is_null() - Check if node is null
- mem_id - Memory identifier for debugging

- select_nodes(query) - Select multiple nodes using XPath
- select_node(query) - Select single node using XPath

- XPathQuery(query) - Create reusable XPath query object
- evaluate_node_set(context) - Evaluate query and return node set
- evaluate_node(context) - Evaluate query and return first node
- evaluate_boolean(context) - Evaluate query and return boolean
- evaluate_number(context) - Evaluate query and return number
- evaluate_string(context) - Evaluate query and return string

import pygixml
# Parse XML from string
xml_string = """
<library>
<book id="1">
<title>The Great Gatsby</title>
<author>F. Scott Fitzgerald</author>
<year>1925</year>
</book>
</library>
"""
doc = pygixml.parse_string(xml_string)
root = doc.first_child()
# Access elements
book = root.first_child()
title = book.child("title")
print(f"Title: {title.child_value()}") # Output: Title: The Great Gatsby
# Create new XML
doc = pygixml.XMLDocument()
root = doc.append_child("catalog")
product = root.append_child("product")
product.name = "product"
# To add text content to an element, append a text node
text_node = product.append_child("") # Empty name creates text node
text_node.value = "content"
import pygixml
xml_string = """
<root>
<simple>Hello World</simple>
<nested>
<child>Child Text</child>
More text
</nested>
<mixed>Text <b>with</b> mixed <i>content</i></mixed>
</root>
"""
doc = pygixml.parse_string(xml_string)
root = doc.first_child()
# Get direct text content
simple = root.child("simple")
print(simple.child_value()) # "Hello World"
# Get recursive text content
nested = root.child("nested")
print(nested.text(recursive=True)) # "Child Text\nMore text"
# Get direct text only (non-recursive)
mixed = root.child("mixed")
print(mixed.text(recursive=False)) # "Text "
# Custom join character
print(nested.text(recursive=True, join=" | ")) # "Child Text | More text"
import pygixml
doc = pygixml.XMLDocument()
root = doc.append_child("root")
child = root.append_child("item")
child.name = "product"
# Serialize to string
print(root.to_string()) # <root>\n <product/>\n</root>
print(root.to_string(" ")) # Custom indentation
# Convenience property
print(root.xml) # Same as to_string() with default indent
import pygixml
xml_string = """
<root>
<item>First</item>
<item>Second</item>
<item>Third</item>
</root>
"""
doc = pygixml.parse_string(xml_string)
# Iterate over document (depth-first)
for node in doc:
    print(f"Node: {node.name}, XPath: {node.xpath}")
# Iterate over children
root = doc.first_child()
for child in root:
    print(f"Child: {child.name}, Value: {child.child_value()}")
import pygixml
doc = pygixml.parse_string("<root><a/><b/></root>")
root = doc.first_child()
a = root.child("a")
b = root.child("b")
a2 = root.child("a")
print(a == a2) # True - same node
print(a == b) # False - different nodes
print(a.mem_id) # Memory address for debugging
pygixml provides full XPath 1.0 support through pugixml's powerful XPath engine:
import pygixml
xml_string = """
<library>
<book id="1" category="fiction">
<title>The Great Gatsby</title>
<author>F. Scott Fitzgerald</author>
<year>1925</year>
<price>12.99</price>
</book>
<book id="2" category="fiction">
<title>1984</title>
<author>George Orwell</author>
<year>1949</year>
<price>10.99</price>
</book>
</library>
"""
doc = pygixml.parse_string(xml_string)
root = doc.first_child()
# Select all books
books = root.select_nodes("book")
print(f"Found {len(books)} books")
# Select fiction books
fiction_books = root.select_nodes("book[@category='fiction']")
print(f"Found {len(fiction_books)} fiction books")
# Select specific book by ID
book_2 = root.select_node("book[@id='2']")
if book_2:
    title = book_2.node.child("title").child_value()
    print(f"Book ID 2: {title}")
# Use XPathQuery for repeated queries
query = pygixml.XPathQuery("book[year > 1930]")
recent_books = query.evaluate_node_set(root)
print(f"Found {len(recent_books)} books published after 1930")
# XPath boolean evaluation
has_orwell = pygixml.XPathQuery("book[author='George Orwell']").evaluate_boolean(root)
print(f"Has George Orwell books: {has_orwell}")
# XPath number evaluation
avg_price = pygixml.XPathQuery("sum(book/price) div count(book)").evaluate_number(root)
print(f"Average price: ${avg_price:.2f}")
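The expected answers to the queries above can be checked independently. The sketch below answers the same questions for the same sample document using only Python's standard `xml.etree.ElementTree`. ElementTree implements only a subset of XPath, so the numeric filter and the aggregate are done in plain Python instead of in the query. This is a cross-check, not a demonstration of pygixml's API.

```python
import xml.etree.ElementTree as ET

xml_string = """
<library>
  <book id="1" category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>12.99</price>
  </book>
  <book id="2" category="fiction">
    <title>1984</title>
    <author>George Orwell</author>
    <year>1949</year>
    <price>10.99</price>
  </book>
</library>
"""

root = ET.fromstring(xml_string)

# Simple paths and attribute predicates are supported by ElementTree directly.
books = root.findall("book")
fiction_books = root.findall("book[@category='fiction']")
book_2 = root.find("book[@id='2']")

# Equivalent of book[year > 1930], filtered in Python.
recent_books = [b for b in books if int(b.findtext("year")) > 1930]

# Equivalent of book[author='George Orwell'] evaluated as a boolean.
has_orwell = any(b.findtext("author") == "George Orwell" for b in books)

# Equivalent of sum(book/price) div count(book).
avg_price = sum(float(b.findtext("price")) for b in books) / len(books)

print(f"Found {len(books)} books")
print(f"Found {len(fiction_books)} fiction books")
print(f"Book ID 2: {book_2.findtext('title')}")
print(f"Found {len(recent_books)} books published after 1930")
print(f"Has George Orwell books: {has_orwell}")
print(f"Average price: ${avg_price:.2f}")
```

For this document the average price is (12.99 + 10.99) / 2 = 11.99, and only one book (1984, published in 1949) passes the year filter.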
In pugixml (and therefore pygixml), element nodes do not have values directly. Instead, they contain child text nodes that hold the text content.
# ❌ This will NOT work (element nodes don't have values):
element_node.value = "some text"
# ✅ Correct approach - use child_value() to get text content:
text_content = element_node.child_value()
# ✅ To set text content, you need to append a text node:
text_node = element_node.append_child("") # Empty name creates text node
text_node.value = "some text"
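Since setting text always takes two steps, the pattern can be wrapped in a small helper. The functions below are hypothetical and are not part of pygixml. They rely only on the two operations shown above (`append_child("")` and the `value` setter) and accept any node object that provides them.

```python
def set_text(element, text):
    """Append a text node holding `text` to `element` and return the new node.

    Hypothetical helper, not part of pygixml. Each call appends another
    text node; it does not replace text that is already there.
    """
    text_node = element.append_child("")  # empty name creates a text node
    text_node.value = text
    return text_node


def append_element_with_text(parent, name, text):
    """Append an element called `name` under `parent`, give it `text`, return it."""
    element = parent.append_child(name)
    set_text(element, text)
    return element
```

With a pygixml node, a call such as `append_element_with_text(root, "title", "1984")` would be expected to produce a `title` element whose `child_value()` is `"1984"`, assuming the behavior described above.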
Run performance comparisons:
# Run complete benchmark suite
python benchmarks/clean_visualization.py
# View results
cat benchmarks/results/benchmark_results.csv
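For a quick check without the repository's scripts, parse time can be measured with the standard `timeit` module. The following is a minimal sketch and not the project's benchmark: the sample document, the `time_parser` helper, and the repeat counts are all assumptions. ElementTree is always timed. pygixml is timed through `pygixml.parse_string` only if it is installed.

```python
import timeit
import xml.etree.ElementTree as ET

# Synthetic sample document (an assumption, not the project's benchmark data).
SAMPLE_XML = "<library>" + "".join(
    f'<book id="{i}"><title>Book {i}</title><year>{1900 + i % 100}</year></book>'
    for i in range(1000)
) + "</library>"


def time_parser(parse, xml_text, repeat=5, number=10):
    """Return the best observed time in seconds for one call of parse(xml_text)."""
    timings = timeit.repeat(lambda: parse(xml_text), repeat=repeat, number=number)
    return min(timings) / number


parsers = {"ElementTree": ET.fromstring}

try:
    import pygixml  # optional: only timed when installed

    parsers["pygixml"] = pygixml.parse_string
except ImportError:
    pass

results = {name: time_parser(parse, SAMPLE_XML) for name, parse in parsers.items()}
baseline = results["ElementTree"]

# Fastest first, with the speedup relative to ElementTree.
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:12s} {seconds:.5f}s  {baseline / seconds:.1f}x vs ElementTree")
```

Taking the minimum over several repeats reduces noise from other processes. Absolute numbers will differ from the table above, since they depend on the machine and on the document being parsed.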
The benchmark suite compares pygixml against lxml and ElementTree.
Benchmark Files:
- benchmarks/clean_visualization.py - Main benchmark runner
- benchmarks/benchmark_parsing.py - Core benchmark logic
- benchmarks/results/ - Generated CSV data and SVG charts

📖 Full documentation is available at: https://mohammadraziei.github.io/pygixml/
The documentation includes:
MIT License - see LICENSE file for details.
To use this library, you must star the project on GitHub!
This helps support the development and shows appreciation for the work. Please star the repository before using the library:
FAQs
Python wrapper for pugixml using Cython - Please star the project on GitHub to use!
We found that pygixml demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.