databricks-utils

databricks-utils is a Python package that provides several utility classes and functions to improve ease of use in Databricks notebooks.

Installation

pip install databricks-utils

Features

  • S3Bucket class to easily interact with an S3 bucket via DBFS and Databricks Spark.

  • vega_embed to render charts from Vega and Vega-Lite specifications.

Documentation

API documentation can be found at https://e2fyi.github.io/databricks-utils/.

Quick start

S3Bucket

import json
from databricks_utils.aws import S3Bucket

# need to attach notebook's dbutils
# before S3Bucket can be used
S3Bucket.attach_dbutils(dbutils)

# create an instance of the s3 bucket
bucket = (S3Bucket("somebucketname", "SOMEACCESSKEY", "SOMESECRETKEY")
          .allow_spark(sc) # local spark context
          .mount("somebucketname")) # mount location name (resolves as `/mnt/somebucketname`)

# show list of files/folders in the bucket "resource" folder
bucket.ls("resource/")

# read in a json file from the bucket
# (the file mode belongs to open(), not to bucket.local())
with open(bucket.local("resource/somefile.json"), "r") as f:
    data = json.load(f)

# read from parquet via spark
dataframe = spark.read.parquet(bucket.s3("resource/somedf.parquet"))

# unmount the bucket when done
bucket.umount()
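
The paths used above may be easier to follow with a small sketch. On Databricks, a DBFS mount at `/mnt/<name>` is also exposed to local file APIs under `/dbfs/mnt/<name>`. The helper names below (`dbfs_mount_path`, `local_file_path`) are hypothetical and not part of databricks-utils; they only illustrate the path layout that `bucket.local(...)` is assumed to resolve to.

```python
def dbfs_mount_path(mount_name: str, key: str = "") -> str:
    """Return the DBFS path of `key` under the mount point `/mnt/<mount_name>`.

    Hypothetical helper for illustration; not part of databricks-utils.
    """
    parts = ["/mnt", mount_name.strip("/")]
    key = key.strip("/")
    if key:
        parts.append(key)
    return "/".join(parts)


def local_file_path(mount_name: str, key: str = "") -> str:
    """Return the driver-local path (through the `/dbfs` mount) of the same object.

    Assumption: this mirrors what `bucket.local(...)` returns.
    """
    return "/dbfs" + dbfs_mount_path(mount_name, key)
```

With the mount name from the example, `local_file_path("somebucketname", "resource/somefile.json")` would give a path that plain Python `open()` can read on the driver node.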

Vega
Vega and Vega-Lite are high-level grammars of interactive graphics. They provide concise JSON syntax for rapidly generating visualizations to support analysis.

from databricks_utils.vega import vega_embed

# vega-lite spec for a bar chart
spec = {
  "data": {
    "values": [
      {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
      {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
      {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "ordinal"},
    "y": {"field": "b", "type": "quantitative"}
  }
}

# plot out the vega chart in databricks notebook
displayHTML(vega_embed(spec=spec))
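
For readers curious how a spec can end up as HTML that `displayHTML` accepts, here is a minimal sketch. It is not the implementation of `vega_embed` in databricks-utils; the function name `render_vega_html`, the `div_id` parameter, and the pinned CDN major versions are assumptions chosen for illustration. It serializes the spec to JSON and hands it to the vega-embed JavaScript library.

```python
import json


def render_vega_html(spec: dict, div_id: str = "vis") -> str:
    """Return an HTML snippet that renders `spec` with the vega-embed library.

    Hypothetical helper for illustration; not the library's own code.
    """
    # Escape "</" so a string inside the spec cannot close the <script> tag early.
    spec_json = json.dumps(spec).replace("</", "<\\/")
    return f"""
<div id="{div_id}"></div>
<script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
<script>vegaEmbed("#{div_id}", {spec_json});</script>
"""
```

In a notebook, the returned string could be passed to `displayHTML(...)` in the same way as the output of `vega_embed`, provided the cluster can reach the CDN.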

Developer

# tag a version in git and publish to PyPI
. add_tag.sh <VERSION>
