# InstDataShipper

This gem facilitates easy upload of LTI datasets to Instructure Hosted Data.
## Installation

Add this line to your application's Gemfile:

```ruby
gem 'inst_data_shipper'
```

Then run the migrations:

```bash
bundle exec rake db:migrate
```
## Usage

### Dumper
The main tool provided by this Gem is the `InstDataShipper::Dumper` class. It is used to define a "Dump", which is a combination of tasks and schema.

It is assumed that a `Dumper` class definition is the source of truth for all tables that it manages, and that no other processes affect the tables' data or schema. You can break this assumption, but you should understand how the incremental logic works and what will and will not trigger a full table upload.

Dumpers have an `export_genre` method that determines which Dumps to look at when calculating incrementals.

- At a high level, the Hosted Data backend will look for a past dump of the same genre. If none is found, a full upload of all tables is triggered. If one is found, each table's schema is compared; any table with a mismatched schema (determined by hashing) will do a full upload.
- Note that `Proc`s in the schema are not included in the hash calculation. If you change a `Proc`'s implementation and need to trigger a full upload of the table, you'll need to change something else too (like the `version`).
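To illustrate why changing only a `Proc` does not trigger a full upload, here is a small standalone sketch of a schema hash that ignores `Proc` values. This is not the gem's actual hashing code, and the column layout is illustrative:

```ruby
require 'digest'

# Hypothetical sketch: hash a table schema while skipping Proc values,
# mirroring the incremental-detection behavior described above.
def schema_hash(columns)
  hashable = columns.map { |col| col.reject { |_key, value| value.is_a?(Proc) } }
  Digest::SHA256.hexdigest(hashable.inspect)
end

# Two versions of the same column, differing only in the Proc body:
v1 = [{ name: :sis_type, type: "varchar(32)", from: ->(rec) { rec.sis_source_type } }]
v2 = [{ name: :sis_type, type: "varchar(32)", from: ->(rec) { rec.sis_source_type&.upcase } }]

schema_hash(v1) == schema_hash(v2) # => true: the Proc change alone is invisible to the hash
```

Changing the column type (or, in a real schema, the `version`) would change the hash and force a full upload.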
Here is an example `Dumper` implementation, wrapped in an ActiveJob job:
```ruby
class HostedDataPushJob < ApplicationJob
  SCHEMA = InstDataShipper::SchemaBuilder.build do
    # The table builder can be extended with custom helper methods:
    extend_table_builder do
      def custom_column(*args, from: nil, **kwargs, &blk)
        from ||= args[0].to_s
        from = ->(row) { row.data[from] } if from.is_a?(String)
        column(*args, **kwargs, from: from, &blk)
      end

      # extend_table_builder may also include shared concerns:
      include SomeConcern
    end

    table(ALocalModel, "<TABLE DESCRIPTION>") do
      # Enable incremental updates for this table:
      incremental "updated_at", on: [:id], if: ->() {}

      # Declare the data source(s) that may feed this table:
      source :local_table
      source ->(table_def) { import_local_table(table_def[:model] || table_def[:warehouse_name]) }

      # Bump the version to force a full upload (e.g. after changing a Proc):
      version "1.0.0"

      column :name_in_destinations, :maybe_optional_sql_type, "Optional description of column"

      # `from:` defaults to the column name:
      column :name, :"varchar(128)"

      # `from:` may be a Symbol (a method on the record)...
      column :sis_type, :"varchar(32)", from: :some_model_method
      # ...a String...
      column :sis_type, :"varchar(32)", from: "sis_source_type"
      # ...or a Proc:
      column :sis_type, :"varchar(32)", from: ->(rec) { ... }
      column :sis_type, :"varchar(32)"
    end

    # The warehouse table name may be given explicitly, with the model as an option:
    table("my_table", model: ALocalModel) do
    end

    # Tables backed by a Canvas report map columns from the report's headers:
    table("proserv_student_submissions_csv") do
      column :canvas_id, :bigint, from: "canvas user id"
      column :sis_id, :"varchar(64)", from: "sis user id"
      column :name, :"varchar(64)", from: "user name"
      column :submission_id, :bigint, from: "submission id"
    end
  end

  Dumper = InstDataShipper::Dumper.define(schema: SCHEMA, include: [
    InstDataShipper::DataSources::LocalTables,
    InstDataShipper::DataSources::CanvasReports,
  ]) do
    import_local_table(ALocalModel)
    import_canvas_report_by_terms("proserv_student_submissions_csv", terms: Term.all.pluck(:canvas_id))

    # Tasks may target a differently named schema table with `schema_name:`:
    import_local_table(SomeModel, schema_name: "my_table")
    import_canvas_report_by_terms("some_report", terms: Term.all.pluck(:canvas_id), schema_name: "my_table")

    # Or enqueue tasks automatically from the schema's `source` declarations:
    auto_enqueue_from_schema
  end

  def perform
    Dumper.perform_dump([
      "hosted-data://<JWT>@<HOSTED DATA SERVER>?table_prefix=example",
      "s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<path>",
    ])
  end
end
```
`Dumper`s may also be defined as a normal Ruby subclass:
```ruby
class HostedDataPushJob < ApplicationJob
  SCHEMA = InstDataShipper::SchemaBuilder.build do
    # ...
  end

  class Dumper < InstDataShipper::Dumper
    include InstDataShipper::DataSources::LocalTables
    include InstDataShipper::DataSources::CanvasReports

    def enqueue_tasks
      import_local_table(ALocalModel)
      import_canvas_report_by_terms("proserv_student_submissions_csv", terms: Term.all.pluck(:canvas_id))
    end

    def table_schemas
      SCHEMA
    end
  end

  def perform
    Dumper.perform_dump([
      "hosted-data://<JWT>@<HOSTED DATA SERVER>?table_prefix=example",
      "s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<path>",
    ])
  end
end
```
## Destinations

This Gem is mainly designed for use with Hosted Data, but it abstracts that a little to allow for other destinations/backends. Out of the box, support for Hosted Data and S3 is included.

Destinations are passed as URI-formatted strings. Passing Hashes is also supported, but the format/keys are destination-specific.

Destinations blindly accept URI Fragments (the `#` chunk at the end of the URI). These options are not used internally but are made available as `dest.user_config`. Ideally these are in the same format as query parameters (`x=1&y=2`, which it will try to parse into a Hash), but they can be any string.
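As a sketch of that convention (using only Ruby's standard library, not the gem's own parsing), a fragment written in query-parameter form decodes like this; the URI below is hypothetical:

```ruby
require 'uri'

# Hypothetical destination URI with a fragment carrying user config:
dest = URI.parse("hosted-data://JWT@hosted.example.com?table_prefix=example#x=1&y=2")

dest.fragment                           # => "x=1&y=2"
URI.decode_www_form(dest.fragment).to_h # => {"x"=>"1", "y"=>"2"}
```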
### Hosted Data

```
hosted-data://<JWT>@<HOSTED DATA SERVER>
```

Optional Parameters:

- `table_prefix`: An optional string to prefix onto each table name in the schema when declaring the schema in Hosted Data
### S3

```
s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<optional path>
```

Optional Parameters:

- None
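One general caveat about URI-formatted credentials (an assumption about URI syntax, not documented gem behavior): secret keys can contain reserved characters such as `/` or `+`, so percent-encoding the secret keeps the destination string parseable. The credentials below are made up:

```ruby
require 'uri'

# Hypothetical credentials; real AWS secret keys can contain "/" and "+".
access_key_id = "AKIAEXAMPLE"
secret        = "abc/def+ghi"

dest = "s3://#{access_key_id}:#{URI.encode_www_form_component(secret)}@us-east-1/my-bucket/dumps"

uri = URI.parse(dest)
URI.decode_www_form_component(uri.password) # => "abc/def+ghi"
```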
## Development

When adding to or updating this gem, make sure you do the following:

- Update the yardoc comments where necessary, and confirm the changes by running `yardoc --server`
- Write specs
- If you modify the model or migration templates, run `bundle exec rake update_test_schema` to update them in the Rails Dummy application (and commit those changes)
## Docs

Docs can be generated using yard. To view the docs:

- Clone this gem's repository
- Run `bundle install`
- Run `yard server --reload`

The yard server will give you a URL you can visit to view the docs.