Databricks Gem
Overview
This gem provides access to the Databricks (DBX) APIs (Jobs and SQL) from Ruby applications.
Installation
Add the following to your Gemfile to install:
gem 'dbx-api', '~>0.2.0'
Usage
Set up your .env file (optional)
DBX_HOST=DBX_CONNECTION_URL
DBX_TOKEN=YOUR_TOKEN_HERE
DBX_WAREHOUSE_ID=WAREHOUSE_ID_HERE
Running SQL from a Ruby script
require 'dbx'
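# Option 1: rely on the values set in your .env file
sql_runner = DatabricksGateway.new
# Option 2: pass the connection details explicitly
sql_runner = DatabricksGateway.new(host: 'DBX_CONNECTION_STRING', token: 'DBX_ACCESS_TOKEN', warehouse: 'DBX_SQL_WAREHOUSE_ID')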
response = sql_runner.run_sql("SELECT 1")
response.results
response = sql_runner.run_sql("SELECT * FROM samples.nyctaxi.trips LIMIT 1")
response.results
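If you prefer to pass the connection details explicitly, one option is to collect them from the environment yourself. The sketch below is only an illustration: `dbx_settings` is a hypothetical helper that is not part of the gem, and it assumes the variable names from the .env example and the keyword names from the constructor call above.

```ruby
# Hypothetical helper (not part of the gem): collect connection settings
# from an environment-like object. ENV and a plain Hash both support #fetch.
def dbx_settings(env = ENV)
  {
    host: env.fetch('DBX_HOST'),           # raises KeyError if missing
    token: env.fetch('DBX_TOKEN'),
    warehouse: env.fetch('DBX_WAREHOUSE_ID')
  }
end

# Placeholder values standing in for a real environment.
example_env = {
  'DBX_HOST' => 'example-host',
  'DBX_TOKEN' => 'example-token',
  'DBX_WAREHOUSE_ID' => 'example-warehouse'
}
settings = dbx_settings(example_env)

# With the gem loaded, the hash can be splatted into the constructor:
# sql_runner = DatabricksGateway.new(**settings)
```

Using `fetch` rather than `[]` makes a missing variable fail immediately with a `KeyError` instead of passing `nil` along.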
run_sql returns an object of type DatabricksSQLResponse.
The response object has a few useful methods. For a complete list, see the class definition: lib/dbx/databricks/sql_response.rb
response = sql_runner.run_sql("SELECT 1")
response.status
response.failed?
response.success?
response.results
response.raw_response
response.body
response.error_message
This gem does not prescribe how errors should be handled. response.results always returns an array, even if the query fails (it returns [] if response.failed?). You may want to check the status of the response before accessing the results. For example:
require 'dbx'
sql_runner = DatabricksGateway.new
res = sql_runner.run_sql("SELECT 1")
return res.results if res.success?
puts "query failed: #{res.error_message}"
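Because the gem leaves error handling to the caller, some applications may prefer to turn a failed response into an exception. The following is a minimal sketch of that idea, not something the gem provides: `QueryFailed` and `results_or_raise` are hypothetical names, and `FakeResponse` is a stand-in that mimics only the `success?`, `results`, and `error_message` methods so the example runs without a Databricks connection.

```ruby
# Stand-in for DatabricksSQLResponse, exposing only the methods used here.
FakeResponse = Struct.new(:ok, :rows, :error_message) do
  def success?
    ok
  end

  def results
    rows
  end
end

# Hypothetical error class and helper; neither is part of the gem.
class QueryFailed < StandardError; end

def results_or_raise(response)
  raise QueryFailed, response.error_message unless response.success?

  response.results
end

ok_response = FakeResponse.new(true, [{ 'id' => '1' }], nil)
failed_response = FakeResponse.new(false, [], 'syntax error')

rows = results_or_raise(ok_response) # returns the rows unchanged

begin
  results_or_raise(failed_response)
rescue QueryFailed => e
  puts "query failed: #{e.message}"
end
```

With the real gem, the same helper could be called as `results_or_raise(sql_runner.run_sql(my_query))`.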
The result of DatabricksSQLResponse.results is always an array of hashes with string keys. You can map over this array using whatever Ruby code you like. For example:
sql_response = sql_runner.run_sql(my_query)
sql_response.results.length
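Since the results are a plain array of hashes, the usual Enumerable methods apply. The sketch below uses hand-written rows in place of a real response; the column names and the assumption that values arrive as strings are illustrative only and may not match what your warehouse returns.

```ruby
# Hypothetical rows shaped like an array of hashes with string keys.
# Column names and string-typed values are assumptions for illustration.
rows = [
  { 'trip_distance' => '1.5', 'fare_amount' => '8.0' },
  { 'trip_distance' => '3.2', 'fare_amount' => '14.5' }
]

# Extract one column and convert it to floats.
fares = rows.map { |row| row['fare_amount'].to_f }

# Keep only rows matching a condition.
long_trips = rows.select { |row| row['trip_distance'].to_f > 2.0 }

# Aggregate the extracted column.
total_fare = fares.sum
```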
If you would prefer hashes with symbol keys, you can use the symbolize_keys option when calling results:
sql_response.results(symbolize_keys: true)
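To illustrate the difference between the two shapes, the sketch below converts string keys to symbols by hand using Ruby's `Hash#transform_keys` (Ruby 2.5+). It only demonstrates the expected result and is not necessarily how the gem implements the option; the row data is made up.

```ruby
# Made-up rows with string keys, as returned by results without options.
string_rows = [{ 'id' => '1', 'name' => 'a' }, { 'id' => '2', 'name' => 'b' }]

# Convert every key in every row to a symbol.
symbol_rows = string_rows.map { |row| row.transform_keys(&:to_sym) }

first_name = symbol_rows.first[:name]
```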
Since run_sql returns an instance of DatabricksSQLResponse, you can also chain methods together:
sql_runner.run_sql("SELECT 1").results
The run_sql method handles slow queries by polling Databricks at a set time interval (default 5 seconds) until the query either fails or succeeds. The default sleep_timer can be set to any desired interval at initialization:
sql_runner = DatabricksGateway.new(sleep_timer: 1)
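The polling behavior described above can be pictured with a small loop. This is a rough sketch of the general technique, not the gem's actual implementation: `poll_until_done`, the status symbols, and the `max_attempts` cap are all hypothetical, and the status source is simulated so the example runs without a Databricks connection.

```ruby
# Hypothetical polling helper; the gem's own implementation may differ.
# check_status is any callable returning :pending, :succeeded, or :failed.
def poll_until_done(check_status, sleep_timer: 5, max_attempts: 100)
  max_attempts.times do
    status = check_status.call
    # Stop as soon as the query reaches a final state.
    return status unless status == :pending

    sleep(sleep_timer)
  end
  # Give up after max_attempts checks (an addition for this sketch).
  :timed_out
end

# Simulated status source: pending twice, then succeeded.
statuses = %i[pending pending succeeded]
checker = -> { statuses.shift }

final_status = poll_until_done(checker, sleep_timer: 0.01)
```

A shorter interval gives faster feedback for quick queries at the cost of more status requests; a longer one does the opposite.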
Development
- After checking out the repo, run bin/setup to install dependencies.
- Set up your .env file as described above.
- Run rake spec to run the RSpec tests.
Build
- Run gem build dbx.gemspec to build the gem.
- Run gem push dbx-api-0.2.0.gem to push the gem to rubygems.org.
  - Requires logging in to rubygems.org first via gem login.