xml-stream 0.0.8 (PyPI)

xml_stream

A simple XML file and string reader that reads large XML files and strings as streams (iterators), with an option to convert each record to a dictionary.

Description

xml_stream comprises two helper functions:

read_xml_file

When given a path to a file and the name of the tag that holds the relevant data (passed as records_tag), it returns an iterator over the matching elements as xml.etree.ElementTree.Element objects by default, or as dicts when the to_dict argument is True.
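xml_stream's internals aren't shown here, so the following is only a sketch, assuming the common standard-library approach to streaming a large file: xml.etree.ElementTree.iterparse. The function name iter_records is hypothetical and is not part of the package. It yields each element whose tag equals records_tag as soon as that element has been fully parsed, then clears it before moving on.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Union


def iter_records(source: Union[str, BinaryIO], records_tag: str) -> Iterator[ET.Element]:
    """Yield each fully parsed element whose tag equals records_tag.

    Hypothetical helper for illustration; not part of xml_stream's API.
    `source` is a file path or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        # An "end" event fires once the element and all of its children are parsed.
        if elem.tag == records_tag:
            yield elem
            # Runs when the caller asks for the next record: drop this record's
            # children, text and attributes so parsed records don't accumulate
            # in memory (an empty shell stays attached to its parent).
            elem.clear()
```

Because each record is cleared once the loop advances, read what you need inside the loop body; collecting the yielded elements into a list leaves you with emptied elements.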

read_xml_string

When given an XML string and the name of the tag that holds the relevant data (passed as records_tag), it returns an iterator over the matching elements as xml.etree.ElementTree.Element objects by default, or as dicts when the to_dict argument is True.
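When the XML is already in memory as a string, one standard-library option (a sketch, not necessarily how read_xml_string works) is xml.etree.ElementTree.XMLPullParser, which accepts text in pieces and reports each element once its end tag has been seen. iter_records_from_string and its chunk_size parameter are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Iterator


def iter_records_from_string(
    xml_string: str, records_tag: str, chunk_size: int = 64 * 1024
) -> Iterator[ET.Element]:
    """Yield each element whose tag equals records_tag from an XML string.

    Hypothetical helper for illustration; not part of xml_stream's API.
    """
    parser = ET.XMLPullParser(events=("end",))
    # Feed str slices; slicing a str never splits a character.
    for start in range(0, len(xml_string), chunk_size):
        parser.feed(xml_string[start:start + chunk_size])
        for _event, elem in parser.read_events():
            if elem.tag == records_tag:
                yield elem
    # close() signals end of input; read_events() re-raises any parse error.
    parser.close()
    for _event, elem in parser.read_events():
        if elem.tag == records_tag:
            yield elem
```

This sketch does not clear elements, so yielded records stay intact (for example, inside list(...)); memory use grows with the parsed tree, which matters less here because the whole string is already in memory.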

Main Dependencies

  • Python 3.6+

Getting Started

  • Install the package

    pip install xml_stream
    
  • Import the read_xml_file and read_xml_string functions and use them as follows

    from xml_stream import read_xml_file, read_xml_string
    
    xml_string = """
    <company>
          <staff>
              <operations_department>
                  <employees>
                      <team>Marketing</team>
                      <location name="head office" address="Kampala, Uganda" />
                      <bio first_name="John" last_name="Doe">John Doe</bio>
                      <bio first_name="Jane" last_name="Doe">Jane Doe</bio>
                      <bio first_name="Peter" last_name="Doe">Peter Doe</bio>
                  </employees>
                  <employees>
                      <team>Customer Service</team>
                      <location name="Kampala branch" address="Kampala, Uganda" />
                      <bio first_name="Mary" last_name="Doe">Mary Doe</bio>
                      <bio first_name="Harry" last_name="Doe">Harry Doe</bio>
                      <bio first_name="Paul" last_name="Doe">Paul Doe</bio>
                  </employees>
              </operations_department>
          </staff>
    </company>
    """
    
    file_path = '...' # path to your XML file
    
    # For XML strings, use read_xml_string, which returns an iterator
    for element in read_xml_string(xml_string, records_tag='staff'):
        # returns the element as xml.etree.ElementTree.Element by default
        # ...do something with the element
        print(element)
    
    # Note: if a tag is namespaced, e.g. prefix:tag with xmlns:prefix="https://example",
    # its records_tag is '{https://example}tag'
    for element_as_dict in read_xml_string(xml_string, records_tag='staff', to_dict=True):
        # returns the element as dictionary
        # ...do something with the element dictionary
        print(element_as_dict)
        # will print
        """
        {
              'operations_department': {
                  'employees': [
                      [
                          {
                              'team': 'Marketing',
                              'location': {
                                  'name': 'head office',
                                  'address': 'Kampala, Uganda'
                              },
                              'first_name': 'John',
                              'last_name': 'Doe',
                              '_value': 'John Doe'
    
                          },
                          {
                              'team': 'Marketing',
                              'location': {
                                  'name': 'head office',
                                  'address': 'Kampala, Uganda'
                              },
                              'first_name': 'Jane',
                              'last_name': 'Doe',
                              '_value': 'Jane Doe'
    
                          },
                          {
                              'team': 'Marketing',
                              'location': {
                                  'name': 'head office',
                                  'address': 'Kampala, Uganda'
                              },
                              'first_name': 'Peter',
                              'last_name': 'Doe',
                              '_value': 'Peter Doe'
    
                          }, ],
                      [
                          {
                              'team': 'Customer Service',
                              'location': {
                                  'name': 'Kampala branch',
                                  'address': 'Kampala, Uganda'
                              },
                              'first_name': 'Mary',
                              'last_name': 'Doe',
                              '_value': 'Mary Doe'
    
                          },
                          {
                              'team': 'Customer Service',
                              'location': {
                                  'name': 'Kampala branch',
                                  'address': 'Kampala, Uganda'
                              },
                              'first_name': 'Harry',
                              'last_name': 'Doe',
                              '_value': 'Harry Doe'
    
                          },
                          {
                              'team': 'Customer Service',
                              'location': {
                                  'name': 'Kampala branch',
                                  'address': 'Kampala, Uganda'
                              },
                              'first_name': 'Paul',
                              'last_name': 'Doe',
                              '_value': 'Paul Doe'
    
                          }
                      ],
                  ]
              }
        }
        """
    
    # For XML files (even very large ones), use read_xml_file, which also returns an iterator
    for element in read_xml_file(file_path, records_tag='staff'):
        # returns the element as xml.etree.ElementTree.Element by default
        # ...do something with the element
        print(element)
    
    for element_as_dict in read_xml_file(file_path, records_tag='staff', to_dict=True):
        # returns the element as dictionary
        # ...do something with the element dictionary
        print(element_as_dict)
        # see the print output for read_xml_string
    
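The namespace note in the example above follows from how xml.etree.ElementTree names elements: a prefixed tag is stored as '{namespace-uri}local-name'. A quick standard-library check, independent of xml_stream and using an invented sample document:

```python
import xml.etree.ElementTree as ET

# Invented sample document with a namespace prefix "ex".
xml_string = """
<root xmlns:ex="https://example">
    <ex:record id="1"/>
    <ex:record id="2"/>
</root>
"""

root = ET.fromstring(xml_string)
# ElementTree replaces the prefix with the full namespace URI in braces,
# so the prefix "ex" never appears in element.tag.
tags = [child.tag for child in root]
print(tags)  # ['{https://example}record', '{https://example}record']
```

By the rule in that note, the records_tag for these elements would be '{https://example}record'.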

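The exact dictionary layout produced by to_dict=True is the package's own; for instance, the output above copies sibling values such as team and location into every bio entry. The sketch below is a simpler, hypothetical converter (element_to_dict is not part of xml_stream) that only illustrates the general idea: attributes become keys, text goes under '_value', and repeated child tags become lists.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_dict(elem: ET.Element) -> Dict[str, Any]:
    """Convert an Element into a plain dict (hypothetical, simplified layout).

    - attributes become keys
    - non-blank text is stored under '_value'
    - child tags become keys; a tag that repeats becomes a list
    - a child with only text (no attributes or children) collapses to that text
    """
    result: Dict[str, Any] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        result["_value"] = text
    for child in elem:
        child_dict = element_to_dict(child)
        if set(child_dict) == {"_value"}:
            value: Any = child_dict["_value"]
        else:
            value = child_dict
        if child.tag in result:
            existing = result[child.tag]
            # Second occurrence of a tag: switch from a single value to a list.
            if not isinstance(existing, list):
                existing = [existing]
                result[child.tag] = existing
            existing.append(value)
        else:
            result[child.tag] = value
    return result
```

Applied to one employees element from the example string, this yields team as a string, location as a dict of its attributes, and bio as a list of dicts.
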
How to test

  • Clone the repo and enter its root folder

    git clone https://github.com/sopherapps/xml_stream.git && cd xml_stream
    
  • Create a virtual environment and activate it

    virtualenv -p /usr/bin/python3.6 env && source env/bin/activate
    
  • Install the dependencies

    pip install -r requirements.txt
    
  • Download a large XML file for testing and save it in the test/ folder as huge_mock.xml

    wget http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/data/SwissProt/SwissProt.xml && mv SwissProt.xml test/huge_mock.xml
    
  • Run the test command

    python -m unittest
    

Acknowledgements

License

Copyright (c) 2020 Martin Ahindura. Licensed under the MIT License.
