Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

tinsel

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

tinsel

PySpark schema generator

  • 0.3.0
  • PyPI
  • Socket score

Maintainers
1

tinsel

Your data IS your schema

.. image:: https://img.shields.io/pypi/pyversions/tinsel.svg :target: https://pypi.python.org/pypi/tinsel .. image:: https://img.shields.io/pypi/v/tinsel.svg :target: https://pypi.python.org/pypi/tinsel .. image:: https://coveralls.io/repos/github/Orhideous/tinsel/badge.svg?branch=master :target: https://coveralls.io/github/Orhideous/tinsel?branch=master .. image:: https://img.shields.io/travis/Orhideous/tinsel.svg :target: https://travis-ci.org/Orhideous/tinsel .. image:: https://pyup.io/repos/github/Orhideous/tinsel/shield.svg :target: https://pyup.io/repos/github/Orhideous/tinsel/

This tiny library helps to overcome excessive complexity in hand-written pyspark dataframe schemas.

How?

Shape your data as NamedTuple or dataclasses - they can freely mix::

from dataclasses import dataclass
from tinsel import struct, transform
from typing import NamedTuple, Optional, Dict, List

@struct
@dataclass
class UserInfo:
    hobby: List[str]
    last_seen: Optional[int]
    pet_ages: Dict[str, int]


@struct
class User(NamedTuple):
    login: str
    age: int
    active: bool
    info: Optional[UserInfo]

Transform root node (User in our case) into schema::

schema = transform(User)

Create some data, if necessary::

data = [
    User(
        login="Ben",
        age=18,
        active=False,
        info=None
    ),
    User(
        login="Tom",
        age=32,
        active=True,
        info=UserInfo(
            hobby=["pets", "flowers"],
            last_seen=16,
            pet_ages={"Jack": 2, "Sunshine": 6}
        )
    )
]

And… voilà!::

from pyspark.sql import SparkSession

sc = SparkSession.builder.master('local').getOrCreate()

df = sc.createDataFrame(data=data, schema=schema)
df.printSchema()
df.show(truncate=False)

This will output::

root
 |-- login: string (nullable = false)
 |-- age: integer (nullable = false)
 |-- active: boolean (nullable = false)
 |-- info: struct (nullable = true)
 |    |-- hobby: array (nullable = false)
 |    |    |-- element: string (containsNull = false)
 |    |-- last_seen: integer (nullable = true)
 |    |-- pet_ages: map (nullable = false)
 |    |    |-- key: string
 |    |    |-- value: integer (valueContainsNull = false)


+-----+---+------+----------------------------------------------+
|login|age|active|info                                          |
+-----+---+------+----------------------------------------------+
|Ben  |18 |false |null                                          |
|Tom  |32 |true  |[[pets, flowers],, [Jack -> 2, Sunshine -> 6]]|
+-----+---+------+----------------------------------------------+

Features

  • use native python types; no extra DSL, no cryptic API — just plain Python;
  • small and fast;
  • provide type shims for some types absent in Python, like long or short;
  • nullable fields naturally fits into schema definition;

Credits

This package was created with Cookiecutter_ and the audreyr/cookiecutter-pypackage_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter .. _audreyr/cookiecutter-pypackage: https://github.com/audreyr/cookiecutter-pypackage

======= History

0.2.0 (2018-08-28)

  • Added dataclasses support

0.1.0 (2018-08-28)

  • First release on PyPI.

Keywords

FAQs


Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc