
A simple reader for big XML files and strings. It streams the data through iterators and can optionally convert each record to a dictionary.
xml_stream comprises two helper functions:
read_xml_file: When given a path to a file and the name of the tag that holds the relevant data, it returns an iterator of the data as xml.etree.ElementTree.Element objects by default, or as dicts when the to_dict argument is True.
read_xml_string: When given an XML string and the name of the tag that holds the relevant data, it returns an iterator of the data as xml.etree.ElementTree.Element objects by default, or as dicts when the to_dict argument is True.
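To illustrate the general idea of streaming records by tag name, here is a minimal sketch built only on the standard library's xml.etree.ElementTree.iterparse. It is not xml_stream's actual implementation; the helper name iter_records and the sample data are hypothetical and exist only for this example.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, records_tag):
    """Yield each element named `records_tag` once its end tag has been parsed.

    Hypothetical helper for illustration; `source` is a file path or a
    binary file-like object, as accepted by ElementTree.iterparse.
    """
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == records_tag:
            yield element
            # Drop the record's children and text after the caller has used it,
            # so a large document does not have to stay fully in memory.
            element.clear()


xml_bytes = (
    b"<company>"
    b"<staff><team>Marketing</team></staff>"
    b"<staff><team>Sales</team></staff>"
    b"</company>"
)

# Each record is consumed before the generator resumes and clears it.
teams = [record.findtext("team") for record in iter_records(io.BytesIO(xml_bytes), "staff")]
print(teams)  # ['Marketing', 'Sales']
```

Because each record is cleared once the loop moves on, anything needed from a record should be copied out inside the loop body rather than kept as a reference to the element.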
Install the package
pip install xml_stream
Import the read_xml_file and read_xml_string functions and use them as needed
from xml_stream import read_xml_file, read_xml_string
xml_string = """
<company>
    <staff>
        <operations_department>
            <employees>
                <team>Marketing</team>
                <location name="head office" address="Kampala, Uganda" />
                <bio first_name="John" last_name="Doe">John Doe</bio>
                <bio first_name="Jane" last_name="Doe">Jane Doe</bio>
                <bio first_name="Peter" last_name="Doe">Peter Doe</bio>
            </employees>
            <employees>
                <team>Customer Service</team>
                <location name="Kampala branch" address="Kampala, Uganda" />
                <bio first_name="Mary" last_name="Doe">Mary Doe</bio>
                <bio first_name="Harry" last_name="Doe">Harry Doe</bio>
                <bio first_name="Paul" last_name="Doe">Paul Doe</bio>
            </employees>
        </operations_department>
    </staff>
</company>
"""
file_path = '...' # path to your XML file
# For XML strings, use read_xml_string, which returns an iterator
for element in read_xml_string(xml_string, records_tag='staff'):
    # each item is an xml.etree.ElementTree.Element by default
    # ...do something with the element
    print(element)

# Note: if a tag is namespaced, say prefix:tag, and the namespace is declared
# as xmlns:prefix="https://example", the records_tag for that tag
# will be '{https://example}tag'
for element_as_dict in read_xml_string(xml_string, records_tag='staff', to_dict=True):
    # each item is a dictionary
    # ...do something with the element dictionary
    print(element_as_dict)
# will print
"""
{
    'operations_department': {
        'employees': [
            [
                {
                    'team': 'Marketing',
                    'location': {
                        'name': 'head office',
                        'address': 'Kampala, Uganda'
                    },
                    'first_name': 'John',
                    'last_name': 'Doe',
                    '_value': 'John Doe'
                },
                {
                    'team': 'Marketing',
                    'location': {
                        'name': 'head office',
                        'address': 'Kampala, Uganda'
                    },
                    'first_name': 'Jane',
                    'last_name': 'Doe',
                    '_value': 'Jane Doe'
                },
                {
                    'team': 'Marketing',
                    'location': {
                        'name': 'head office',
                        'address': 'Kampala, Uganda'
                    },
                    'first_name': 'Peter',
                    'last_name': 'Doe',
                    '_value': 'Peter Doe'
                },
            ],
            [
                {
                    'team': 'Customer Service',
                    'location': {
                        'name': 'Kampala branch',
                        'address': 'Kampala, Uganda'
                    },
                    'first_name': 'Mary',
                    'last_name': 'Doe',
                    '_value': 'Mary Doe'
                },
                {
                    'team': 'Customer Service',
                    'location': {
                        'name': 'Kampala branch',
                        'address': 'Kampala, Uganda'
                    },
                    'first_name': 'Harry',
                    'last_name': 'Doe',
                    '_value': 'Harry Doe'
                },
                {
                    'team': 'Customer Service',
                    'location': {
                        'name': 'Kampala branch',
                        'address': 'Kampala, Uganda'
                    },
                    'first_name': 'Paul',
                    'last_name': 'Doe',
                    '_value': 'Paul Doe'
                }
            ],
        ]
    }
}
"""
# For XML files (even really large ones), use read_xml_file, which also returns an iterator
for element in read_xml_file(file_path, records_tag='staff'):
    # each item is an xml.etree.ElementTree.Element by default
    # ...do something with the element
    print(element)

for element_as_dict in read_xml_file(file_path, records_tag='staff', to_dict=True):
    # each item is a dictionary
    # ...do something with the element dictionary
    print(element_as_dict)
    # see the print output for read_xml_string
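The note on namespaced tags in the example above follows from how the standard library's ElementTree names elements: a prefixed tag is expanded to '{namespace-uri}tag'. The short sketch below shows this with the standard library alone, using a made-up document and the illustrative namespace https://example; it does not call xml_stream itself.

```python
import xml.etree.ElementTree as ET

# Made-up document: the prefix "prefix" is bound to the namespace "https://example"
namespaced_xml = (
    '<root xmlns:prefix="https://example">'
    '<prefix:tag>value</prefix:tag>'
    '</root>'
)

root = ET.fromstring(namespaced_xml)
child = root[0]

# ElementTree replaces the prefix with the namespace URI in braces
print(child.tag)  # {https://example}tag

# So the name to look for is the expanded form, not 'prefix:tag'
records_tag = "{https://example}tag"
matches = root.findall(records_tag)
print(len(matches))  # 1
```

Printing the tag of a sample element this way is a quick check of the exact string to pass as records_tag for a namespaced document.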
Clone the repo and enter its root folder
git clone https://github.com/sopherapps/xml_stream.git && cd xml_stream
Create a virtual environment and activate it
virtualenv -p /usr/bin/python3.6 env && source env/bin/activate
Install the dependencies
pip install -r requirements.txt
Download a huge XML file for test purposes and save it in the test folder as huge_mock.xml
wget http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/data/SwissProt/SwissProt.xml && mv SwissProt.xml test/huge_mock.xml
Run the test command
python -m unittest
Copyright (c) 2020 Martin Ahindura. Licensed under the MIT License.