You're Invited:Meet the Socket Team at BlackHat and DEF CON in Las Vegas, Aug 4-6.RSVP →

Book a Demo Install Sign in

llm_xml_parser

Package Overview

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

llm_xml_parser

A minimal XML parser for structured output from LLM

0.1.3

PyPI

Maintainers: 1

XML Parser for Structured LLM Output

General Description

This parser implements a stack-based algorithm to analyze XML hierarchies, tracking parent-child relationships and structure depth. It offers flexible configurations for extracting specific tags (both single and lists) at defined depths, with a rigorous validation system. The implementation is optimized for small XML files (typically a few KB) with simple hierarchical structures (maximum 3-4 levels of depth).

Core Features

The parser handles:

Critical Errors: missing tags, unauthorized duplicates, empty lists
Non-Blocking Warnings: nested tags in unexpected structures, mixed XML/text content
Untagged Text Handling: preserves text not enclosed in XML tags
Case-Insensitive Tag Search: tag names are matched regardless of case.
Original Text Capitalization: output text retains the exact capitalization from the input.

Key Technical Features

Exclusive use of Python standard libraries (no external dependencies)
Case-insensitive only for tag names during search
Preservation of original text, including capitalization and formatting
Integrated logging system for warnings and errors

Use Cases and Behaviors

Important Note: The ParseResult structure is flat. All configured tags, including those defined as "children" in the configuration, are accessible as first-level attributes of the result object. There is no nested hierarchical structure.

✅ Correctly Handled Cases

Elements extracted without warnings or errors.

1. Single Element at Depth 0

XML:

<thinking>This is a test</thinking>

Config: {'thinking': 'single'} Output:

result.thinking → "This is a test"

2. List with 2+ Elements

XML:

<step>A</step>
<step>B</step>
<step>C</step>

Config: {'step': 'list'} Output:

result.step → ["A", "B", "C"]

3. Explicitly Configured Hierarchy

XML:

<exercise>
    <question>What is 2+2?</question>
    <answer>The answer is 4</answer>
</exercise>

Config:

config = {
    'exercise': {
        'type': 'single',
        'children': {
            'question': 'single',
            'answer': 'single'
        }
    }
}

Output:

result.exercise → "<question>What is 2+2?</question><answer>The answer is 4</answer>" 
result.question → "What is 2+2?" 
result.answer → "The answer is 4"

4. Correct Untagged Text Extraction

LLM Input:

<thinking>Analysis</thinking>
Answer: 42

After Pre-Processing:

<root>
    <thinking>Analysis</thinking>
    Answer: 42
</root>

Note: Preprocessing always adds an artificial root tag, since LLM output often does not include a valid root tag. The root tag is shown here only to illustrate the handling of unlabeled text.

Output:

result.thinking → "Analysis"
result.untagged → "Answer: 42"

⚠️ Warnings (Extraction with Notification)

Elements extracted with warnings in logs and accessible via ParseResult.warnings

1. Nested Tags in Single Element

XML:

<thinking>Use <formula>E=mc²</formula></thinking>

Config: {'thinking': 'single'} Output:

result.thinking → "Use <formula>E=mc²</formula>"
# Log and warnings: "Unconfigured tag <formula> found inside 'thinking'"

2. Unconfigured Tag in List

XML:

<step>
    A<detail>some info</detail>
</step>
<step>B</step>

Config: {'step': 'list'} Output:

result.step → ["A<detail>some info</detail>", "B"]
# Log and warnings: "Unconfigured tag <detail> found inside 'step'"

3. List with 1 Element

XML:

<step>Single step</step>

Config: {'step': 'list'} Output:

result.step → ["Single step"]
# Log and warnings: "List <step> contains only 1 element."

❌ Blocking Errors

Interrupt execution by raising specific exceptions.

1. Missing Single Element

XML: <answer>42</answer> Config: {'thinking': 'single'} Error:

XMLStructureError: Tag <thinking> not found (single required).

2. Too Many Single Elements

XML:

<answer>42</answer>
<answer>43</answer>

Config: {'answer': 'single'} Error:

XMLStructureError: Multiple <answer> found, but 'single' is required.

3. Empty List

XML: <answer>stuff</answer> Config: {'steps': 'list'} Error:

XMLStructureError: List <steps> is empty (1+ elements required).

4. Malformed XML

XML:

<thinking>Test<thinking>

<a><b></a></b>

Error:

XMLFormatError: Unclosed tags remain. Malformed XML structure.

XMLFormatError: Mismatched tags: opened <b>, but closed </a>

Unhandled Cases

The parser is designed to be minimalist and efficiently handle simple XML output from LLMs. The following cases are not supported:

Namespaces
- ❌ Invalid: <ns:thinking>Test</ns:thinking>
Tags with Special Characters
- ✅ Valid: <distro_linux>Test</distro_linux> (underscore supported)
- ❌ Invalid: <foo-bar>Test</foo-bar>
- ❌ Invalid: <foo bar>Test</foo bar>
CDATA and Complex Content
- CDATA sections not interpreted
- XML entities not processed
- ❌ Not handled: <thinking><![CDATA[<test>]]></thinking>
XML Attributes
- Tag attributes ignored
- ❌ Not handled: <tag attr="val">Test</tag>

Implementation Details

Execution Modes

Normal Mode (Default, strict_mode=False)
- Warnings logged but non-blocking
- Only errors generate exceptions
- Output preserved even with warnings
Strict Mode (strict_mode=True)
- All warnings become blocking errors
- Maximum validation rigor

Input Handling

Size: Optimized for small XML files (10-20 KB)
Depth: Efficient support up to 3-4 levels
Case Sensitivity:
- Tags: case-insensitive during search
- Content: preserved exactly as in input

Pre-Processing

Automatic removal of XML comments
Automatic root tag addition
Basic input normalization

Technical Requirements

Python: Version 3.7 or higher
Dependencies: Python standard library only
Encoding: UTF-8 for input/output

FAQs

What is llm_xml_parser?

Is llm_xml_parser well maintained?

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

llm_xml_parser

XML Parser for Structured LLM Output

General Description

Core Features

Key Technical Features

Use Cases and Behaviors

✅ Correctly Handled Cases

1. Single Element at Depth 0

2. List with 2+ Elements

3. Explicitly Configured Hierarchy

4. Correct Untagged Text Extraction

⚠️ Warnings (Extraction with Notification)

1. Nested Tags in Single Element

2. Unconfigured Tag in List

3. List with 1 Element

❌ Blocking Errors

1. Missing Single Element

2. Too Many Single Elements

3. Empty List

4. Malformed XML

Unhandled Cases

Implementation Details

Execution Modes

Input Handling

Pre-Processing

Technical Requirements

Related posts

Introducing Scala and Kotlin Support in Socket

AI + a16z Podcast: Vibe Coding, Security Risks, and the Path to Progress