# Data Access Platform CLI

CLI for the Data Access Platform service

## Installing

```sh
npm install data-access-platform-cli
```
## Setup

You will need to either access or create a developer key, and create a
config file.

### Developer Key
Go to `https://<account>.instructure.com/accounts/self/developer_keys` and see
if there is already a developer key provisioned for Canvas Data 2 with the
following settings:

- The key enforces scopes
- Only the `cd2` peer service is checked
- The client credentials audience is set to Peer Service
If there is not a key already created for Canvas Data 2, create a new API key
by following these steps:

Under the Account tab of the Developer Keys page, press the
**+ Developer Key** button and choose the **+ API Key** option.
This will show a view for configuring a new key.

- Turn ON the **Enforce Scopes** toggle in this view.
- Check the `cd2` peer service. You can navigate to Peer Services in
  the main section of this view, or you can quickly filter the list by
  searching for "Peer Services" in the Search endpoints feature near
  the top.
- Set the **Client Credentials Audience** to Peer Service.
- Give the new API key a name. We suggest "Canvas Data 2".
- Save the key.
This will return you to the Account tab of the Developer Keys page.
Enable the key by selecting ON under State.

Under the Details of the developer key, you'll need the key ID (the number
of about 11 digits that is already showing) and the key secret, which you
can see when you press the **Show Key** button.

You'll put these two pieces of information into the config file after you
install the DAP CLI.

Beware: anyone who has these credentials can use this CLI to download
information from your Canvas instance.
### Config file

Fill in your details in `config.json`, following the example provided by
`config.example.json`. While you can specify a different path to the config
with the `--config` or `-c` flags, by default the CLI reads from `config.json`.

You will need to change at least three values in your `config.json` file:

- Change the account subdomain (in the `url` key, in the value
  `https://<account>.instructure.com/login/oauth2/token`)
- Change the developer key ID (`client_id`)
- Change the developer key secret (`client_secret`)

You may also need to change your `baseURL`.
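As a rough sketch, a minimal `config.json` might look like the following. Only the three documented keys plus `baseURL` are shown; all values are placeholders, and `config.example.json` in the repository is the authoritative reference for the full shape:

```json
{
  "url": "https://<account>.instructure.com/login/oauth2/token",
  "client_id": "<your ~11-digit developer key ID>",
  "client_secret": "<your developer key secret>",
  "baseURL": "<service base URL, if different from the default>"
}
```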
## Usage

To use:

```sh
dap <command> [options]
```

### Shared Options

#### version

```sh
dap --version
```

#### config

```sh
dap --config myconfig.json
```

#### help

```sh
dap --help
```

or `dap -h`.

To get help for a given command (this will list the command-specific options):

```sh
dap --help <command>
```

#### logging

There are different levels of logging:

```sh
dap <command> [options]
dap <command> [options] -v
dap <command> [options] -vv
```
## Available Commands

### Snapshot

This command will get the most recent version of all the requested tables. For
example, to grab `users`, `accounts`, and `courses`:

```sh
dap snapshot users accounts courses
```

The default output location is the `snapshots/` directory, but that can be
overridden with the `--output` option.

The number of tables fetched concurrently can be controlled with the
`--concurrency` flag. Example:

```sh
dap snapshot users accounts courses submissions --concurrency 2
```

The downloaded file has the following name format:

```
<table name>_<current date>.<file format>
```

The current date is the local time when the user runs the `snapshot` command.

```sh
dap snapshot users
```

Specific options:

| Name | Description | Default value |
|---|---|---|
| concurrency | The number of tables to fetch concurrently | 10 |
| format | The output format of the data. Supported formats: csv, json | csv |
| output | The directory to download to | snapshots |
| filter | SQL query utilized by S3 Select | |
### Updates

This command will get changes to the specified tables since the provided time.
This time can be provided in two different ways: 1) an amount of time relative
to now, or 2) an absolute time.

```sh
dap updates users --last 4h
dap updates users --since '2020-02-21T09:03:00Z'
```

The relative time accepts many formats, including a number followed by:

- `m` for minutes
- `h` for hours
- `d` for days

By default, it will grab updates from the last 24 hours.

The default output location is the `updates/` directory, but that can be
overridden with the `--output` option.

The number of diffs fetched concurrently can be controlled with the
`--concurrency` flag.

The downloaded file has the following name format:

```
<table name>_<since date>_<current date>.<file format>
```

Both the since and current dates are local times.

```sh
dap updates accounts --last 20d
```

Specific options:

| Name | Description | Default value |
|---|---|---|
| concurrency | The number of tables to fetch concurrently | 10 |
| last | The relative age for the oldest change in the query | 24h |
| format | The output format of the data. Supported formats: csv, json | csv |
| output | The directory to download to | updates |
| since | The ISO 8601 date for the oldest change in the query (overrides --last) | |
| until | The ISO 8601 date for the newest change in the query | |
| filter | SQL query utilized by S3 Select | |
| new_only | Fetch updates not yet received | boolean: not set |
| collapse | Collapse multiple changes into one | boolean: not set |

Options of boolean type should be provided without a value, or skipped
entirely (not set).
### Schema

This command will get information about the schema.

```sh
dap schema --list
dap schema users courses accounts
dap schema
```

Specific options:

| Name | Description | Default value |
|---|---|---|
| list | List all the tables rather than dumping the schema | boolean: not set |
### Latency

This command will get the latency state of all the requested tables. For
example, to grab `users`, `accounts`, and `courses`:

```sh
dap latency users accounts courses
```

Specific options:

| Name | Description | Default value |
|---|---|---|
| concurrency | The number of tables to fetch concurrently | 10 |
## S3 Select

For some operations you might need to filter the output. To filter, provide a
SQL statement to the `snapshot`/`updates` CLI command:

```sh
./bin/dap updates wikis -f '<your SQL statement>'
```

### Query structure

All rules from S3 Select SQL apply and should be considered.

Important precautions specific to CD2:

- From the S3 SQL perspective you can select just the fields you need, but
  for the CLI to work correctly, always include `cdcMetadata` in the output
  and make subselections only from `row` (while also selecting `row` itself
  as an identifier). This is because the output is consumed by the CLI.
  Ideally use `SELECT *`.
- Use the JSON file SQL reference.
- Use filtering only when needed, e.g.
  `./bin/dap updates users -f "SELECT * FROM S3Object"` is much slower than
  the same command without a filter (`./bin/dap updates users`).
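Putting the precautions together, a filter that narrows on a `row` field while keeping `cdcMetadata` in the output might look like this (the query is taken from the examples further below; `id = '3'` is illustrative):

```sql
-- SELECT * keeps the full record, so cdcMetadata stays available to the CLI,
-- while the WHERE clause filters on a field under row.
SELECT * FROM S3Object[*] s WHERE s.row.id = '3'
```

Pass the statement to the command via the `-f` flag, e.g. `./bin/dap updates users -f "SELECT * FROM S3Object[*] s WHERE s.row.id = '3'"`.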
### S3 file structure

To make use of S3 Select you should know the file schema. At the top level
there are two separate fields: `cdcMetadata` and `row`. The structure in the
output files differs slightly, but for queries you need to consider this
format.

#### cdcMetadata

This is a service field which has an identical structure across all tables and
stores CDC data. In the output files this field is named `metadata` and has a
slightly different shape. The structure is:

```
table:string - name of the table
orderid:string - incremental counter, unique per event and monotonically increasing each time a new update is submitted
key: {
  id:bigint - primary key value of the row associated with the event
  __dbz__physicaltableidentifier:string - internal id for Debezium
}
deletion:boolean - indicator of row deletion
ts_ms:bigint - timestamp when the change occurred
root_account_uuid:string - account uuid, the same across the whole file
shard_id:int - database shard where the event occurred
```

#### row

The row structure is the same as in the output. You can get a row's schema
using:

```sh
./bin/dap schema <tableName>
```

where `tableName` is the name of the table you want a schema for.
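To make the query format concrete, here is an illustrative sketch of a single record as seen by a query, assembled from the field list above. All values are placeholders, and the actual `row` fields depend on the table (see `./bin/dap schema <tableName>`):

```json
{
  "cdcMetadata": {
    "table": "users",
    "orderid": "<incremental counter>",
    "key": {
      "id": 3,
      "__dbz__physicaltableidentifier": "<internal Debezium id>"
    },
    "deletion": false,
    "ts_ms": 1582275780000,
    "root_account_uuid": "<account uuid>",
    "shard_id": 1
  },
  "row": {
    "id": 3
  }
}
```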
### Examples

Some valid examples of queries:

```sql
SELECT * FROM S3Object
SELECT * FROM S3Object[*] s WHERE s.row.id = '3'
```

Examples of CLI usage:

```sh
./bin/dap updates wikis -f 'SELECT * FROM S3Object[*]'
```
## Developing

### Adding a new command

The CLI uses yargs `commandDir` to make it easy to add new commands. Add a
new command by creating a new module in `lib/commands/`. This module should
contain:

- `exports.command`: string or array of strings that contains the command
- `exports.describe`: string with the description of the command
- `exports.builder`: object containing the command options, or a function
  accepting and returning a yargs instance
- `exports.handler`: function using the parsed argv

This structure assumes all modules in the `commands` directory are command
modules. Any supporting files need to be in a different directory. See
`snapshot.ts` for an example command and `snapshot.test.ts` for example tests.
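As a sketch of this module shape, here is a hypothetical `lib/commands/hello.ts`. The command name, option, and handler body are invented for illustration; `snapshot.ts` shows the real pattern:

```typescript
// Hypothetical lib/commands/hello.ts - illustrative only.

// The command signature, with a required positional argument.
export const command = 'hello <name>';

// Shown in `dap --help` output.
export const describe = 'Print a greeting';

// A plain options object; a function accepting and returning
// a yargs instance also works.
export const builder = {
  shout: {
    type: 'boolean',
    default: false,
    describe: 'Upper-case the greeting',
  },
};

// Receives the argv parsed by yargs.
export const handler = (argv: { name: string; shout?: boolean }): void => {
  const greeting = `Hello, ${argv.name}!`;
  console.log(argv.shout ? greeting.toUpperCase() : greeting);
};
```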
See the Providing a Command Module docs for more details on these exports,
and the `.commandDir(directory, [opts])` docs for more details about using a
command module directory and more advanced options.
### Running tests

Using a Docker container you can run:

```sh
./build.sh
```

or without Docker you can run:

```sh
npm run test
npm run lint
npm run lint:md
```