# Data Access Platform CLI

CLI for the Data Access Platform service

## Installing

```sh
npm install data-access-platform-cli
```
## Setup

You will need to either access or create a developer key, and create a
config file.

### Developer Key
Go to `https://<account>.instructure.com/accounts/self/developer_keys` and see
if there is already a developer key provisioned for Canvas Data 2 with the
following settings:

- The key enforces scopes
- Only the `cd2` peer service is checked
- The client credentials audience is set to Peer Service
If there is not a key already created for Canvas Data 2, create a new API key
by following these steps:

Under the Account tab of the Developer Keys page, press the
**+ Developer Key** button and choose the **+ API Key** option.
This will show a view for configuring a new key.

- Turn ON the **Enforce Scopes** toggle in this view.
- Check the `cd2` peer service. You can navigate to Peer Services in
  the main section of this view, or you can quickly filter the list by
  searching for "Peer Services" in the Search endpoints feature near
  the top.
- Set the **Client Credentials Audience** to Peer Service.
- Give the new API key a name. We suggest "Canvas Data 2".
- Save the key.
This will return you to the Account tab of the Developer Keys page.
Enable the key by selecting ON under State.

Under the Details of the developer key, you'll need the key ID (the number
of about 11 digits that is already showing) and the key secret, which you
can see when you press the **Show Key** button.

You'll put these two pieces of information into the config file after you
install the DAP CLI.

Beware: anyone who has these credentials can use this CLI to download
information from your Canvas instance.
### Config file

Fill in your details in `config.json`, following the example provided by
`config.example.json`. While you can specify a different path to the config
with the `--config` or `-c` flags, by default the CLI reads from `config.json`.

You will need to change at least three values in your `config.json` file:

- Change the account subdomain (in the `url` key, in the value
  `https://<account>.instructure.com/login/oauth2/token`)
- Change the developer key ID (`client_id`)
- Change the developer key secret (`client_secret`)

You may also need to change your `baseURL`.
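As a rough sketch, a minimal `config.json` might look like the following. Only the three documented keys plus `baseURL` are shown; all values are placeholders, and `config.example.json` in the repository is the authoritative reference for the full shape:

```json
{
  "url": "https://<account>.instructure.com/login/oauth2/token",
  "client_id": "<your ~11-digit developer key ID>",
  "client_secret": "<your developer key secret>",
  "baseURL": "<service base URL, if different from the default>"
}
```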
## Usage

To use:

```sh
dap <command> [options]
```

### Shared Options

#### version

```sh
dap --version
```

#### config

```sh
dap --config myconfig.json
```

#### help

```sh
dap --help
```

or `dap -h`.

To get help for a given command (this will list the command-specific options):

```sh
dap --help <command>
```

#### logging

There are different levels of logging:

```sh
dap <command> [options]
dap <command> [options] -v
dap <command> [options] -vv
```
## Available Commands

### Snapshot

This command will get the most recent version of all the requested tables. For
example, to grab `users`, `accounts`, and `courses`:

```sh
dap snapshot users accounts courses
```

The default output location is the `snapshots/` directory, but that can be
overridden with the `--output` option.

The number of tables fetched concurrently can be controlled with the
`--concurrency` flag. Example:

```sh
dap snapshot users accounts courses submissions --concurrency 2
```

The downloaded file has the following name format:

```
<table name>_<current date>.<file format>
```

The current date is the local time when the user runs the `snapshot` command.

```sh
dap snapshot users
```

Specific options:

| Name | Description | Default value |
|---|---|---|
| concurrency | The number of tables to fetch concurrently | 10 |
| format | The output format of the data. Supported formats: csv, json | csv |
| output | The directory to download to | snapshots |
| filter | SQL query utilized by S3 Select | |
### Updates

This command will get changes to the specified tables since the provided time.
This time can be provided in two different ways: 1) an amount of time relative
to now, or 2) an absolute time.

```sh
dap updates users --last 4h
dap updates users --since '2020-02-21T09:03:00Z'
```

The relative time accepts many formats, including a number followed by:

- `m` for minutes
- `h` for hours
- `d` for days

By default, it will grab updates from the last 24 hours.

The default output location is the `updates/` directory, but that can be
overridden with the `--output` option.

The number of diffs fetched concurrently can be controlled with the
`--concurrency` flag.

The downloaded file has the following name format:

```
<table name>_<since date>_<current date>.<file format>
```

Both the since and current dates are local times.

```sh
dap updates accounts --last 20d
```

Specific options:

| Name | Description | Default value |
|---|---|---|
| concurrency | The number of tables to fetch concurrently | 10 |
| last | The relative age for the oldest change in the query | 24h |
| format | The output format of the data. Supported formats: csv, json | csv |
| output | The directory to download to | updates |
| since | The ISO 8601 date for the oldest change in the query (overrides --last) | |
| until | The ISO 8601 date for the newest change in the query | |
| filter | SQL query utilized by S3 Select | |
| new_only | Fetch updates not yet received | boolean: not set |
| collapse | Collapse multiple changes into one | boolean: not set |

Options of boolean type should be provided without a value, or skipped
entirely (not set).
### Schema

This command will get information about the schema.

```sh
dap schema --list
dap schema users courses accounts
dap schema
```

Specific options:

| Name | Description | Default value |
|---|---|---|
| list | List all the tables rather than dumping the schema | boolean: not set |
### Latency

This command will get the latency state of all the requested tables. For
example, to grab `users`, `accounts`, and `courses`:

```sh
dap latency users accounts courses
```

Specific options:

| Name | Description | Default value |
|---|---|---|
| concurrency | The number of tables to fetch concurrently | 10 |
## S3 Select

For some operations you might need to filter the output. To filter, provide a
SQL statement to the `snapshot`/`updates` CLI command:

```sh
./bin/dap updates wikis -f '<your SQL statement>'
```

### Query structure

All rules from S3 Select SQL apply and should be considered.

Important precautions specific to CD2:

- From the S3 SQL perspective you can select just the fields you need, but
  for the CLI to work correctly, always include `cdcMetadata` in the output
  and make subselections only from `row` (while also selecting `row` itself
  as an identifier). This is because the output is consumed by the CLI.
  Ideally use `SELECT *`.
- Use the JSON file SQL reference.
- Use filtering only when needed, e.g.
  `./bin/dap updates users -f "SELECT * FROM S3Object"` is much slower than
  the same command without a filter (`./bin/dap updates users`).
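Putting the precautions together, a filter that narrows on a `row` field while keeping `cdcMetadata` in the output might look like this (the query is taken from the examples further below; `id = '3'` is illustrative):

```sql
-- SELECT * keeps the full record, so cdcMetadata stays available to the CLI,
-- while the WHERE clause filters on a field under row.
SELECT * FROM S3Object[*] s WHERE s.row.id = '3'
```

Pass the statement to the command via the `-f` flag, e.g. `./bin/dap updates users -f "SELECT * FROM S3Object[*] s WHERE s.row.id = '3'"`.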
### S3 file structure

To make use of S3 Select you should know the file schema. At the top level
there are two separate fields: `cdcMetadata` and `row`. The structure in the
output files differs slightly, but for queries you need to consider this
format.

#### cdcMetadata

This is a service field which has an identical structure across all tables and
stores CDC data. In the output files this field is named `metadata` and has a
slightly different shape. The structure is:

```
table:string - name of the table
orderid:string - incremental counter, unique per event and monotonically increasing each time a new update is submitted
key: {
  id:bigint - primary key value of the row associated with the event
  __dbz__physicaltableidentifier:string - internal id for Debezium
}
deletion:boolean - indicator of row deletion
ts_ms:bigint - timestamp when the change occurred
root_account_uuid:string - account uuid, the same across the whole file
shard_id:int - database shard where the event occurred
```

#### row

The row structure is the same as in the output. You can get a row's schema
using:

```sh
./bin/dap schema <tableName>
```

where `tableName` is the name of the table you want a schema for.
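To make the query format concrete, here is an illustrative sketch of a single record as seen by a query, assembled from the field list above. All values are placeholders, and the actual `row` fields depend on the table (see `./bin/dap schema <tableName>`):

```json
{
  "cdcMetadata": {
    "table": "users",
    "orderid": "<incremental counter>",
    "key": {
      "id": 3,
      "__dbz__physicaltableidentifier": "<internal Debezium id>"
    },
    "deletion": false,
    "ts_ms": 1582275780000,
    "root_account_uuid": "<account uuid>",
    "shard_id": 1
  },
  "row": {
    "id": 3
  }
}
```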
### Examples

Some valid examples of queries:

```sql
SELECT * FROM S3Object
SELECT * FROM S3Object[*] s WHERE s.row.id = '3'
```

Examples of CLI usage:

```sh
./bin/dap updates wikis -f 'SELECT * FROM S3Object[*]'
```
## Developing

### Adding a new command

The CLI uses yargs `commandDir` to make it easy to add new commands. Add a
new command by creating a new module in `lib/commands/`. This module should
contain:

- `exports.command`: string or array of strings that contains the command
- `exports.describe`: string with the description of the command
- `exports.builder`: object containing the command options, or a function
  accepting and returning a yargs instance
- `exports.handler`: function using the parsed argv

This structure assumes all modules in the `commands` directory are command
modules. Any supporting files need to be in a different directory. See
`snapshot.ts` for an example command and `snapshot.test.ts` for example tests.
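As a sketch of this module shape, here is a hypothetical `lib/commands/hello.ts`. The command name, option, and handler body are invented for illustration; `snapshot.ts` shows the real pattern:

```typescript
// Hypothetical lib/commands/hello.ts - illustrative only.

// The command signature, with a required positional argument.
export const command = 'hello <name>';

// Shown in `dap --help` output.
export const describe = 'Print a greeting';

// A plain options object; a function accepting and returning
// a yargs instance also works.
export const builder = {
  shout: {
    type: 'boolean',
    default: false,
    describe: 'Upper-case the greeting',
  },
};

// Receives the argv parsed by yargs.
export const handler = (argv: { name: string; shout?: boolean }): void => {
  const greeting = `Hello, ${argv.name}!`;
  console.log(argv.shout ? greeting.toUpperCase() : greeting);
};
```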
See the Providing a Command Module docs for more details on these exports,
and the `.commandDir(directory, [opts])` docs for more details about using a
command module directory and more advanced options.
### Running tests

Using a Docker container you can run:

```sh
./build.sh
```

or without Docker you can run:

```sh
npm run test
npm run lint
npm run lint:md
```