
xpath-kit
xpath-kit is a powerful Python library that provides a fluent, object-oriented, and Pythonic interface for building and executing XPath queries on top of lxml. It transforms complex, error-prone XPath string composition into a highly readable and maintainable chain of objects and methods.
Say goodbye to messy, hard-to-read XPath strings:
div[@id="main" and contains(@class, "content")]/ul/li[position()=1]
And say hello to a more intuitive and IDE-friendly way of writing queries:
E.div[(A.id == "main") & A.class_.contains("content")] / E.ul / E.li[1]
Key features:
- Use Python operators (/, //, [], &, |, ==, >) to build complex XPath expressions naturally, with familiar Python logic.
- Use the E (elements), A (attributes), and F (functions) builders for a highly readable syntax with excellent IDE autocompletion support.
- Match several values in one predicate with any(), all(), and none().
- Work with results that wrap lxml elements, allowing easy DOM traversal and manipulation (e.g., append, remove, parent, next_sibling).
- Parse documents through the simple html() and xml() entry points.

Install xpath-kit from PyPI using pip:
pip install xpath-kit
The library requires lxml as a dependency, which will be installed automatically.
Here's a simple example of how to use xpath-kit to parse a piece of HTML and extract information.
from xpathkit import html, E, A, F
html_content = """
<html>
<body>
<div id="main">
<h2>Article Title</h2>
<p>This is the first paragraph.</p>
<ul class="item-list">
<li class="item active">Item 1</li>
<li class="item">Item 2</li>
<li class="item disabled">Item 3</li>
</ul>
</div>
</body>
</html>
"""
# 1. Parse the HTML content
root = html(html_content)
# 2. Build a query to find the <li> element with both "item" and "active" classes
# XPath: .//ul[contains(@class, "item-list")]/li[contains(@class, "item") and contains(@class, "active")]
query = E.ul[A.class_.contains("item-list")] / E.li[A.class_.all("item", "active")]
# 3. Execute the query and get a single element
active_item = root.descendant(query)
# Print its content and attributes
print(f"Tag: {active_item.tag}")
print(f"Text: {active_item.string()}")
print(f"Class attribute: {active_item['class']}")
# --- Output ---
# Tag: li
# Text: Item 1
# Class attribute: item active
# 4. Build a more complex query: find all <li> elements whose class does NOT contain 'disabled'
# XPath: .//li[not(contains(@class, "disabled"))]
query_enabled = E.li[F.not_(A.class_.contains("disabled"))]
# 5. Execute the query and process the list of results
enabled_items = root.descendants(query_enabled)
item_texts = enabled_items.map(lambda item: item.string())
print(f"\nEnabled items: {item_texts}")
# --- Output ---
# Enabled items: ['Item 1', 'Item 2']
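The XPath comment in step 2 shows that A.class_.all("item", "active") expands into contains() tests joined by and. The sketch below is a hypothetical, string-only illustration of how all-, any-, and none-style helpers could render; the names all_of, any_of, and none_of are invented here, and the exact output of xpath-kit's any() and none() is an assumption. Keep in mind that contains() is a substring test, so "item" also matches a class such as "item-list".

```python
# Hypothetical helpers: they mirror the README's all()/any()/none() wording,
# but this rendering is an assumption, not xpath-kit's source code.


def _contains(attr: str, value: str) -> str:
    """Render one contains() test on an attribute."""
    return f'contains(@{attr}, "{value}")'


def all_of(attr: str, *values: str) -> str:
    """Every value must appear: join the tests with 'and'."""
    return " and ".join(_contains(attr, v) for v in values)


def any_of(attr: str, *values: str) -> str:
    """At least one value must appear: join the tests with 'or'."""
    return "(" + " or ".join(_contains(attr, v) for v in values) + ")"


def none_of(attr: str, *values: str) -> str:
    """No value may appear: negate the 'or' form."""
    return "not" + any_of(attr, *values)


print(all_of("class", "item", "active"))
print(any_of("class", "active", "disabled"))
print(none_of("class", "disabled"))
```

The all_of and none_of outputs match the XPath comments in steps 2 and 4 above.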
Use the html() or xml() function to parse a document. Each accepts a string, bytes, or a file path (e.g., path="data.xml").
from xpathkit import html, xml
# Parse an HTML string
root_html = html("<div><p>Hello</p></div>")
# Parse an XML file
root_xml = xml(path="data.xml")
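As a rough sketch of how a single entry point can accept markup text, raw bytes, or a file path, the hypothetical parse_any() below dispatches on its arguments. It uses the standard library's xml.etree.ElementTree in place of lxml (an assumption for portability), so unlike xpath-kit's html() it only handles well-formed XML.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union


def parse_any(
    source: Union[str, bytes, None] = None, *, path: Optional[str] = None
) -> ET.Element:
    """Hypothetical helper: return the root element from text, bytes, or a file."""
    if path is not None:
        return ET.parse(path).getroot()  # read and parse a file from disk
    if isinstance(source, (str, bytes)):
        return ET.fromstring(source)  # parse in-memory markup
    raise TypeError("pass markup as str/bytes, or a file via path=")


root_from_str = parse_any("<div><p>Hello</p></div>")
root_from_bytes = parse_any(b"<div><p>Hi</p></div>")
print(root_from_str.find("p").text, root_from_bytes.find("p").text)
```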
The E, A, and F builders are the heart of xpath-kit, making expression building effortless.
- E (Element): builds element nodes, e.g., E.div, E.a, or custom tags via E["my-tag"].
- A (Attribute): builds attribute nodes within predicates, e.g., A.id, A.href, or custom attributes via A["data-id"].
- F (Function): builds XPath functions, e.g., F.contains(), F.not_(), F.position(), or any custom function via F["name"](arg1, ...).

Note: Since class and for are reserved keywords in Python, use a trailing underscore: A.class_ and A.for_.
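The attribute-style syntax of E and A can be built on Python's __getattr__ and __getitem__ hooks. The following is a minimal, hypothetical sketch, not xpath-kit's implementation: the classes Node, _ElementBuilder, and _AttributeBuilder are invented, and stripping a trailing underscore is how it models A.class_ and A.for_.

```python
class Node:
    """A path fragment that renders itself as an XPath string (hypothetical)."""

    def __init__(self, xpath: str) -> None:
        self.xpath = xpath

    def __str__(self) -> str:
        return self.xpath


class _ElementBuilder:
    """E.div -> 'div'; E["my-tag"] -> 'my-tag'; a trailing '_' is stripped."""

    def __getattr__(self, name: str) -> Node:
        if name.startswith("__"):
            raise AttributeError(name)  # leave Python's special lookups alone
        return Node(name.rstrip("_"))

    def __getitem__(self, name: str) -> Node:
        return Node(name)


class _AttributeBuilder:
    """A.id -> '@id'; A.class_ -> '@class'; A["data-id"] -> '@data-id'."""

    def __getattr__(self, name: str) -> Node:
        if name.startswith("__"):
            raise AttributeError(name)
        return Node("@" + name.rstrip("_"))

    def __getitem__(self, name: str) -> Node:
        return Node("@" + name)


E = _ElementBuilder()
A = _AttributeBuilder()

print(E.div, E["my-tag"], A.class_, A.for_, A["data-id"])
```

Subscripting with a string is the escape hatch for names that are not valid Python identifiers, such as my-tag or data-id.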
Path selection (/ and //)
Use the division operators to define relationships between elements.
- /: selects a direct child.
- //: selects a descendant at any level.

# Selects a <p> that is a direct child of a <div>
# XPath: div/p
query_child = E.div / E.p
# Selects an <a> that is a descendant of the <body>
# XPath: body//a
query_descendant = E.body // E.a
You can also use a string directly after an element for simple cases:
# Equivalent to E.div / E.span
query = E.div / "span"
This is convenient for simple queries without predicates or attributes.
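Operator overloading is what lets / and // compose paths. This hypothetical Step class (not part of xpath-kit) overrides __truediv__ and __floordiv__ to join XPath strings, and accepts a bare string on the right-hand side as a tag name, mirroring E.div / "span".

```python
from typing import Union


class Step:
    """Hypothetical path node: '/' joins a child step, '//' a descendant step."""

    def __init__(self, xpath: str) -> None:
        self.xpath = xpath

    @staticmethod
    def _render(other: Union["Step", str]) -> str:
        # A bare string such as "span" is treated as a plain tag name.
        return other.xpath if isinstance(other, Step) else other

    def __truediv__(self, other: Union["Step", str]) -> "Step":
        return Step(f"{self.xpath}/{self._render(other)}")

    def __floordiv__(self, other: Union["Step", str]) -> "Step":
        return Step(f"{self.xpath}//{self._render(other)}")


div, p, body, a = Step("div"), Step("p"), Step("body"), Step("a")
print((div / p).xpath)            # div/p
print((body // a).xpath)          # body//a
print((div / "span").xpath)       # div/span
print((div / "ul" / "li").xpath)  # div/ul/li
```

Because / and // are left-associative in Python, longer chains build the path from left to right.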
Predicates ([])
Use square brackets [] on an element to add filtering conditions. This is where xpath-kit truly shines.
Attribute predicates with A
# Find a div with id="main"
# XPath: //div[@id="main"]
query = E.div[A.id == "main"]
# Find an <a> that has an href attribute
# XPath: //a[@href]
query_has_href = E.a[A.href]
# Find an <li> whose class contains "item" but NOT "disabled"
# XPath: //li[contains(@class,"item") and not(contains(@class,"disabled"))]
query = E.li[A.class_.contains("item") & F.not_(A.class_.contains("disabled"))]
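To see what the first two generated expressions select, they can be run with the standard library's xml.etree.ElementTree, which supports a small XPath subset that includes [@attr='value'] and [@attr]. This only checks XPath semantics under that assumption; ElementTree does not support contains() or not(), so the third query needs lxml or xpath-kit itself.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<body>"
    '<div id="main"><a href="/home">Home</a><a name="top">Top</a></div>'
    '<div id="sidebar"/>'
    "</body>"
)

# //div[@id="main"]: the attribute equals a value
main_divs = doc.findall(".//div[@id='main']")
# //a[@href]: the attribute merely exists
links = doc.findall(".//a[@href]")

print([d.get("id") for d in main_divs])  # ['main']
print([link.text for link in links])     # ['Home']
```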
To query against the string value of a node (.), import the dot class.
from xpathkit import dot
# Find an <h1> whose text is exactly "Welcome"
# XPath: //h1[.="Welcome"]
query = E.h1[dot() == "Welcome"]
# Find a <p> whose text contains the word "paragraph"
# XPath: //p[contains(., "paragraph")]
query_contains = E.p[dot().contains("paragraph")]
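ElementTree (Python 3.7+) also understands [.='text'], which compares an element's full string value, including text inside child elements. The sketch below uses it for the exact-match case and, because ElementTree has no contains(), emulates the second query with a plain Python substring check.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<body><h1>Welcome</h1><h1>Welcome <em>back</em></h1>"
    "<p>This is the first paragraph.</p></body>"
)

# [.='Welcome'] compares the whole string value, descendants included,
# so the second <h1> (string value "Welcome back") does not match.
exact = doc.findall(".//h1[.='Welcome']")
print(len(exact))  # 1

# Emulate contains(., "paragraph") in Python.
matching = [p for p in doc.iter("p") if "paragraph" in "".join(p.itertext())]
print(len(matching))  # 1
```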
Function predicates with F
Use F to call any standard XPath function inside a predicate.
# Select the first list item
# XPath: //li[position()=1]
query_first = E.li[F.position() == 1]
# Select the last list item
# XPath: //li[last()]
query_last = E.li[F.last()]
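Positional predicates can likewise be checked with ElementTree, which accepts the short forms [1] and [last()], plus [last()-N] offsets. The list below is a made-up sample document.

```python
import xml.etree.ElementTree as ET

ul = ET.fromstring("<ul><li>Item 1</li><li>Item 2</li><li>Item 3</li></ul>")

# li[position()=1] selects the same node as the short form li[1].
first = ul.find("li[1]")
# li[last()] selects the final <li> among its siblings.
last = ul.find("li[last()]")
# ElementTree also allows an offset from the end.
second_to_last = ul.find("li[last()-1]")

print(first.text, last.text, second_to_last.text)  # Item 1 Item 3 Item 2
```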
Logical operators (& and |)
- &: logical and
- |: logical or

# Find an <a> with href="/home" AND a target attribute
# XPath: //a[@href="/home" and @target]
query_and = E.a[(A.href == "/home") & A.target]
# Find a <div> with id="sidebar" OR a class containing "nav"
# XPath: //div[@id="sidebar" or contains(@class,"nav")]
query_or = E.div[(A.id == "sidebar") | A.class_.contains("nav")]
Important: In Python, & and | bind more tightly than comparison operators such as ==, so always wrap each comparison in parentheses () when combining conditions.
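The hypothetical Cond class below (invented for illustration, not part of xpath-kit) records how Python groups an expression, which makes the precedence problem visible: without parentheses, "/home" & target is evaluated before ==.

```python
class Cond:
    """Hypothetical condition that records how Python grouped an expression."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __repr__(self) -> str:
        return self.text

    def __eq__(self, other: object) -> "Cond":  # type: ignore[override]
        return Cond(f"({self.text} = {other!r})")

    def __and__(self, other: "Cond") -> "Cond":
        return Cond(f"({self.text} and {other.text})")

    def __rand__(self, other: object) -> "Cond":
        # Called when the left operand (here a plain str) has no '&' support.
        return Cond(f"({other!r} and {self.text})")


href, target = Cond("@href"), Cond("@target")

# Without parentheses, '&' binds first: href == ("/home" & target)
wrong = href == "/home" & target
# With parentheses, the comparison is grouped as intended.
right = (href == "/home") & target

print(wrong.text)  # (@href = ('/home' and @target))
print(right.text)  # ((@href = '/home') and @target)
```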
For positional selection, use integers directly: positive values are 1-based positions, and negative values count from the end.
# Select the second <li>
# XPath: //li[2]
query = E.li[2]
# Select the last <li> (equivalent to F.last())
# XPath: //li[last()]
query_last = E.li[-1]
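One plausible way to translate Python-style indices into XPath is sketched below. The README confirms only that -1 is equivalent to last(); mapping -2 to last()-1 and rejecting 0 are assumptions made for this hypothetical index_predicate() helper.

```python
def index_predicate(index: int) -> str:
    """Hypothetical mapping from a Python-style index to an XPath predicate.

    Positive integers are already 1-based XPath positions; negative integers
    are assumed to count from the end: -1 -> last(), -2 -> last()-1, ...
    """
    if index > 0:
        return f"[{index}]"
    if index == -1:
        return "[last()]"
    if index < -1:
        return f"[last(){index + 1}]"  # e.g. -2 renders as "[last()-1]"
    raise ValueError("XPath positions are 1-based; 0 has no meaning")


print([index_predicate(i) for i in (1, 2, -1, -2)])
# ['[1]', '[2]', '[last()]', '[last()-1]']
```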
.child()/.descendant() return a single XPathElement.
.children()/.descendants() return a Union[XPathElementList, str, float, bool, List[str]].

XPathElement (single result)
- .tag: the element's tag name (e.g., 'div').
- .attr: a dictionary of all attributes.
- element['name']: access an attribute directly.
- .string(): get the concatenated text of the element and all its descendants (string(.)).
- .text(): get a list of only the element's direct text nodes (./text()).
- .parent(): get the parent element.
- .next_sibling() / .prev_sibling(): get adjacent sibling elements.
- .xpath(query): execute a raw string or a constructed query within the context of this element.

XPathElementList (multiple results)
- .one(): ensures the list contains exactly one element and returns it; otherwise, raises an error.
- .first() / .last(): get the first or last element; raises an error if the list is empty.
- len(element_list): get the number of elements.
- .filter(func): filter the list based on a function.
- .map(func): apply a function to each element and return a list of the results.
- Iteration: for e in my_list: ...
- Indexing: my_list[0], my_list[-1]

DOM manipulation
Modify the document tree with ease.
from xpathkit import XPathElement, E, A
# Assuming 'root' is a parsed XPathElement
# Find the <ul> element
ul = root.descendant(E.ul)
# Create and append a new <li>
new_li = XPathElement.create("li", attr={"class": "new-item"}, text="Item 4")
ul.append(new_li)
# Remove an element
item_to_remove = ul.child(E.li[A.class_.contains("disabled")])
if item_to_remove:
    ul.remove(item_to_remove)
# Print the modified HTML
print(root.tostring())
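The same append-and-remove steps can be sketched with the standard library's xml.etree.ElementTree, which is handy for comparing behavior without xpath-kit installed. This is an analogue under stated assumptions, not xpath-kit's API: ET.SubElement stands in for XPathElement.create() plus append(), and because ElementTree lacks contains(), the "disabled" class is matched as a whitespace-separated token in Python.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<ul><li class="item">Item 1</li><li class="item disabled">Item 3</li></ul>'
)

# Create and append a new <li> in one step.
new_li = ET.SubElement(root, "li", {"class": "new-item"})
new_li.text = "Item 4"

# Remove every <li> whose class list includes the "disabled" token.
# Iterate over a copy so removal does not disturb the loop.
for li in list(root.findall("li")):
    if "disabled" in li.get("class", "").split():
        root.remove(li)

print(ET.tostring(root, encoding="unicode"))
# <ul><li class="item">Item 1</li><li class="new-item">Item 4</li></ul>
```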
This project is licensed under the MIT License. See the LICENSE file for details.