embulk-parser-xpath2

Version 0.2.1 (RubyGems)

XML parser plugin for Embulk

Embulk parser plugin for parsing XML data with XPath.

Features

  • namespace awareness
  • nullable columns
  • complex JSON array columns (with restrictions)

Overview

  • Plugin type: parser
  • Guess supported: no

Configuration

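  • type: must be "xpath2" to select this plugin (string, required)
  • root: XPath of the element fetched as one entry (record); the column paths in schema are evaluated relative to it (string, required)
  • schema: list of columns; each column has an XPath path (relative to root), a name, a type, and, for timestamp columns, a format (required)
  • namespaces: mapping from namespace prefix to namespace URI, used to resolve the prefixes in root and in the column paths (required)
  • stop_on_invalid_record: stop the bulk load transaction if an invalid record is found (boolean, default is false)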
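Two of these behaviors, nullable columns and stop_on_invalid_record, are easier to see in code. The README does not say what happens to an invalid record when stop_on_invalid_record is false; the Java sketch below assumes such a record is skipped. It converts each extracted text value to its column type, keeps a missing element as null, and either aborts the load or drops a record whose value cannot be converted. RecordConversionSketch, convert and convertAll are hypothetical names, not the plugin's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: per-column type conversion and stop_on_invalid_record handling. */
final class RecordConversionSketch {

    /** Converts one extracted text value to the column type; an absent element (null) stays null. */
    static Object convert(String raw, String type) {
        if (raw == null) {
            return null; // nullable column: no matching element -> null value
        }
        String s = raw.trim();
        switch (type) {
            case "string":
                return raw;
            case "long":
                return Long.parseLong(s); // throws NumberFormatException on bad input
            case "double":
                return Double.parseDouble(s);
            case "boolean":
                if (s.equals("true")) return Boolean.TRUE;
                if (s.equals("false")) return Boolean.FALSE;
                throw new IllegalArgumentException("not a boolean: " + raw);
            default:
                throw new IllegalArgumentException("unsupported type: " + type);
        }
    }

    /**
     * Converts all records. A record that fails conversion aborts the whole
     * load when stopOnInvalidRecord is true; otherwise (assumed default) it is skipped.
     */
    static List<List<Object>> convertAll(List<String[]> records, String[] types, boolean stopOnInvalidRecord) {
        List<List<Object>> rows = new ArrayList<>();
        for (String[] record : records) {
            try {
                List<Object> row = new ArrayList<>();
                for (int i = 0; i < types.length; i++) {
                    row.add(convert(record[i], types[i]));
                }
                rows.add(row);
            } catch (IllegalArgumentException e) { // also catches NumberFormatException
                if (stopOnInvalidRecord) {
                    throw new IllegalStateException("invalid record: " + Arrays.toString(record), e);
                }
                // stop_on_invalid_record: false -> drop this record and continue
            }
        }
        return rows;
    }
}
```

For example, with the column types long and string, the records {"1", "Hello!"}, {"x", "bad"} and {"3", null} yield two rows when the flag is false (the second record is dropped and the missing title of the third becomes null), while the same input with the flag set to true stops at the second record with an exception.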

Example

parser:
  type: xpath2
  root: '/ns1:root/ns2:entry'
  schema:
    - { path: 'ns2:id', name: id, type: long }
    - { path: 'ns2:title', name: title, type: string }
    - { path: 'ns2:meta/ns2:author', name: author, type: string }
    - { path: 'ns2:date', name: date, type: timestamp, format: '%Y%m%d' }
    - { path: 'ns2:ratings/ns2:rating[@by="subscribers"]', name: ratings, type: json }
  namespaces: {ns1: 'http://example.com/ns1/', ns2: 'http://example.com/ns2/'}

With this configuration, you can fetch entries from the following XML:

<?xml version="1.0"?>
<ns1:root
  xmlns:ns1="http://example.com/ns1/"
  xmlns:ns2="http://example.com/ns2/">
  <ns2:entry>
    <ns2:id>1</ns2:id>
    <ns2:title>Hello!</ns2:title>
    <ns2:meta>
      <ns2:author>maji-KY</ns2:author>
    </ns2:meta>
    <ns2:date>20010101</ns2:date>
    <ns2:ratings>
      <ns2:rating by="subscribers">1</ns2:rating>
      <ns2:rating by="subscribers">2</ns2:rating>
      <ns2:rating>3</ns2:rating>
    </ns2:ratings>
  </ns2:entry>
</ns1:root>
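The extraction model behind this example is that root selects one node per entry and every column path is evaluated relative to that node, with the prefixes from namespaces bound to their URIs. The plugin's own code is not shown in this README; the Java sketch below reproduces the idea with the JDK's standard javax.xml.xpath API. XPathEntrySketch, its methods, and the rule of returning a list when a path matches several nodes are assumptions for illustration; in particular, how the plugin encodes the ratings json column is not documented here. For the date column, the sketch uses the java.time pattern "yyyyMMdd" as the counterpart of the strftime-style '%Y%m%d'.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of namespace-aware, root-relative XPath extraction. */
final class XPathEntrySketch {

    /** Resolves the prefixes declared under `namespaces:` in the config. */
    static NamespaceContext namespaces(Map<String, String> prefixes) {
        return new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return prefixes.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
            }
            @Override public String getPrefix(String uri) { return null; }
            @Override public Iterator<String> getPrefixes(String uri) { return Collections.emptyIterator(); }
        };
    }

    /** One map per node matched by `root`; each column path (name -> path) is evaluated relative to that node. */
    static List<Map<String, Object>> extract(String xml, String root,
                                             Map<String, String> columns, Map<String, String> prefixes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, ns1:/ns2: steps never match
            Node doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(namespaces(prefixes));
            NodeList entries = (NodeList) xpath.evaluate(root, doc, XPathConstants.NODESET);
            List<Map<String, Object>> rows = new ArrayList<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Map<String, Object> row = new LinkedHashMap<>();
                for (Map.Entry<String, String> col : columns.entrySet()) {
                    NodeList hits = (NodeList) xpath.evaluate(col.getValue(), entries.item(i), XPathConstants.NODESET);
                    List<String> texts = new ArrayList<>();
                    for (int j = 0; j < hits.getLength(); j++) {
                        texts.add(hits.item(j).getTextContent());
                    }
                    // no match -> null; one match -> its text; several -> all texts (this sketch's own choice)
                    row.put(col.getKey(), texts.isEmpty() ? null : texts.size() == 1 ? texts.get(0) : texts);
                }
                rows.add(row);
            }
            return rows;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath expression", e);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    /** '%Y%m%d' in the config corresponds to the java.time pattern "yyyyMMdd". */
    static LocalDate parseDate(String text) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern("yyyyMMdd"));
    }
}
```

Applied to the XML above with root '/ns1:root/ns2:entry', the path 'ns2:meta/ns2:author' yields maji-KY, the predicate in 'ns2:ratings/ns2:rating[@by="subscribers"]' matches only the two rating elements carrying by="subscribers" (texts 1 and 2), and a path with no match yields null.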

Complex JSON array column

Usage

parser:
  type: xpath2
  root: '/ns1:root/ns2:entry'
  schema:
    - { path: 'ns2:id', name: id, type: long }
    - path: 'ns2:list'
      name: list
      type: json
      structure: # adding the structure key enables the complex JSON array column
        - path: 'ns2:list'
          name: list
          type: array
        - path: 'ns2:list/ns2:elements'
          name: elements
          type: array
        - path: 'ns2:list/ns2:elements/ns2:name'
          name: elementName
          type: string
        - path: 'ns2:list/ns2:elements/ns2:value'
          name: elementValue
          type: long
        - path: 'ns2:list/ns2:elements/ns2:active'
          name: elementActive
          type: boolean
  namespaces: {ns1: 'http://example.com/ns1/', ns2: 'http://example.com/ns2/'}

Structure configuration

  • path: XPath of the element, starting with the column's own path (for example 'ns2:list/ns2:elements' under the column path 'ns2:list') (string, required)
  • name: JSON key name (string)
  • type: JSON data type, one of array, string, long, or boolean (string, required)

With this configuration, you can fetch entries from the following XML:

<?xml version="1.0"?>
<ns1:root
        xmlns:ns1="http://example.com/ns1/"
        xmlns:ns2="http://example.com/ns2/">
    <ns2:entry>
        <ns2:id>1</ns2:id>
        <ns2:list>
            <ns2:elements>
                <ns2:name>foo1</ns2:name>
                <ns2:value>1</ns2:value>
                <ns2:active>true</ns2:active>
            </ns2:elements>
            <ns2:elements>
                <ns2:name>foo2</ns2:name>
                <ns2:value>2</ns2:value>
                <ns2:active>false</ns2:active>
            </ns2:elements>
        </ns2:list>
        <ns2:list>
            <ns2:elements>
                <ns2:name>bar1</ns2:name>
                <ns2:value>3</ns2:value>
                <ns2:active>true</ns2:active>
            </ns2:elements>
        </ns2:list>
    </ns2:entry>
</ns1:root>

Result of the list column:

{
  "list": [
    {
      "elements": [
        {
          "elementActive": true,
          "elementName": "foo1",
          "elementValue": 1
        },
        {
          "elementActive": false,
          "elementName": "foo2",
          "elementValue": 2
        }
      ]
    },
    {
      "elements": [
        {
          "elementActive": true,
          "elementName": "bar1",
          "elementValue": 3
        }
      ]
    }
  ]
}
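The README does not spell out how the structure entries combine, but the example suggests a simple rule: each entry's path extends its parent's path by one step, every array entry selects a list of nodes, and the direct children of an array become the keys of one JSON object per selected node. The Java sketch below implements that reading with javax.xml.xpath. StructureSketch, Field, columnJson and the tiny JSON writer are hypothetical names; keys are kept in a sorted TreeMap only so that they print in the order of the result above. It is not the plugin's implementation, and it assumes no path contains a '/' inside a predicate.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: turning a `structure:` list into nested JSON. */
final class StructureSketch {

    /** One `structure:` entry (type is array, string, long or boolean). */
    static final class Field {
        final String path, name, type;
        Field(String path, String name, String type) { this.path = path; this.name = name; this.type = type; }
    }

    /** JSON text of the structured column for every node matched by `root`. */
    static List<String> columnJson(String xml, String root, List<Field> fields, Map<String, String> ns) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Node doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String p) { return ns.getOrDefault(p, XMLConstants.NULL_NS_URI); }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) { return Collections.emptyIterator(); }
            });
            NodeList entries = (NodeList) xpath.evaluate(root, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Map<String, Object> column = new TreeMap<>();
                for (Field f : fields) {
                    if (isTopLevel(f, fields)) column.put(f.name, value(entries.item(i), f, f.path, fields, xpath));
                }
                out.add(toJson(column));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot build structured column", e);
        }
    }

    /** A field is top level unless another field's path plus "/" is a prefix of its path. */
    static boolean isTopLevel(Field f, List<Field> fields) {
        for (Field other : fields) {
            if (f.path.startsWith(other.path + "/")) return false;
        }
        return true;
    }

    /** Evaluates `rel` against `ctx`: arrays recurse into their direct children, leaves are typed. */
    static Object value(Node ctx, Field f, String rel, List<Field> fields, XPath xpath) throws XPathExpressionException {
        if (f.type.equals("array")) {
            String prefix = f.path + "/";
            NodeList nodes = (NodeList) xpath.evaluate(rel, ctx, XPathConstants.NODESET);
            List<Object> items = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Map<String, Object> obj = new TreeMap<>(); // sorted keys, as in the result above
                for (Field c : fields) {
                    String rest = c.path.startsWith(prefix) ? c.path.substring(prefix.length()) : null;
                    if (rest != null && !rest.contains("/")) obj.put(c.name, value(nodes.item(i), c, rest, fields, xpath));
                }
                items.add(obj);
            }
            return items;
        }
        Node leaf = (Node) xpath.evaluate(rel, ctx, XPathConstants.NODE);
        if (leaf == null) return null;
        String text = leaf.getTextContent().trim();
        switch (f.type) {
            case "long": return Long.parseLong(text);
            case "boolean": return Boolean.parseBoolean(text); // anything but "true" becomes false
            default: return text; // "string"
        }
    }

    /** Minimal JSON writer (escapes only backslash and double quote). */
    static String toJson(Object v) {
        if (v == null) return "null";
        if (v instanceof String) return "\"" + ((String) v).replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        if (v instanceof Map) {
            StringJoiner sj = new StringJoiner(",", "{", "}");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) sj.add(toJson(e.getKey()) + ":" + toJson(e.getValue()));
            return sj.toString();
        }
        if (v instanceof List) {
            StringJoiner sj = new StringJoiner(",", "[", "]");
            for (Object o : (List<?>) v) sj.add(toJson(o));
            return sj.toString();
        }
        return String.valueOf(v); // Long or Boolean
    }
}
```

Run on the XML above with root '/ns1:root/ns2:entry' and the five structure entries from the configuration, columnJson returns one string, which is the compact form of the list-column result shown above.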

Build

$ ./gradlew gem

Benchmark

$ sbt benchmark/jmh:run

Package last updated on 12 Jan 2019
