Lorien: A Hyper-Automated Tuning System for Tensor Operators
Lorien is a system built on top of TVM to massively explore and benchmark the best schedule configs
of TOPI schedules.
Motivation
Although TVM already has TOPI (TVM Operator Inventory), which implements the algorithms
and schedules for commonly used operators such as conv2d and dense, there are challenges that make
TOPI hard to improve efficiently.
The best schedule configs of TOPI are stored in
TopHub,
which is a JSON file hosted on GitHub. However, this approach has the following problems.
- Storing all schedules in a single text file limits accessibility and scalability.
  AutoTVM has to load the entire JSON file every time just to find a single
  schedule config for a workload.
- The coverage of workloads and platforms is insufficient in the current version.
  For example, the latest TopHub covers only 690 workloads for the CUDA backend,
  including conv2d and depthwise conv2d, across 5 GPU models.
- Compared to TVM, which receives several commits every day, TopHub is not frequently
  updated. As a result, some schedule configs are out of date and can no longer achieve
  good performance.
Since it is impractical to use the TVM CI to benchmark the performance of every pull request, we need
a separate system to regularly benchmark and update the stored schedule configs.
Command-Line Interface and Example Usages
The system has a complete CLI with hierarchical commands. All command options can also be
specified in a YAML config file and expanded with the "@" prefix.
See the following examples for CLI usage, and configs/samples
for example configurations.
Note that the complete description of each command can be retrieved with the help flag:
python3 -m lorien <commands> -h
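As an illustration of the "@" expansion, the flags of a subcommand can be collected in a YAML file
and the file passed with an "@" prefix. The file name and the assumption that YAML keys mirror the
flag names below are illustrative only, following the tune.yaml example later in this section.
# extract_gcv.yaml (hypothetical)
model: alexnet
target: llvm
python3 -m lorien generate extract gcv @extract_gcv.yaml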
- Extract workloads from a Gluon CV model.
python3 -m lorien generate extract gcv --model alexnet --target llvm
- Extract workloads from a TF model.
python3 -m lorien generate extract tf --model ./mobilenet.pb --target llvm
- Extract workloads from a Gluon CV model and mutate them to generate new workloads.
python3 -m lorien generate mutate modelzoo rules.yaml --model alexnet --target llvm
- Tune workloads with RPC servers.
# tune.yaml
rpc:
  llvm -mcpu=skylake-avx512:
    - localhost:18871
db:
  endpoint_url: http://localhost:10020
log-s3-bucket: saved-tuning-logs
ntrial: 3000
python3 -m lorien tune @tune.yaml @gcv_workloads_llvm.yaml
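The endpoints listed under rpc in tune.yaml must point to running TVM RPC servers. A minimal
sketch of launching one locally on the port used above, assuming a standard TVM installation
(the exact module path and flags may vary across TVM versions):
python3 -m tvm.exec.rpc_server --host 0.0.0.0 --port 18871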
System Requirements
- Python 3.6+
- Amazon DynamoDB (local or AWS): DynamoDB is used for storing and maintaining the tuned schedules.
  You can either 1) launch a local instance
  and specify its endpoint URL (e.g., --db "endpoint_url: http://<your IP>:8000"
  ), or 2) use the AWS service, configure
  the AWS CLI on your machine, and specify the region name (e.g., --db "region_name: us-west-1"
  ) when invoking the tuning (see the connectivity sketch after this list).
- AWS S3 (optional): S3 is used to store the full tuning logs (the JSON files generated by AutoTVM).
  This is optional: if you do not specify --log-s3-bucket bucket_name
  , the full tuning logs are not uploaded; only the best
  schedule config is submitted to DynamoDB.
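As a quick sanity check of these two services, the following sketch uses boto3 (an assumption for
illustration, not a stated Lorien dependency) to list the tables behind a local DynamoDB endpoint
and the uploaded tuning logs in the S3 bucket from the tune.yaml example. Table and object names
depend on Lorien's actual schema, which is not described above.
import boto3

# Local DynamoDB instance; DynamoDB local accepts arbitrary credentials and any region.
dynamodb = boto3.client(
    "dynamodb",
    endpoint_url="http://localhost:8000",
    region_name="us-west-1",
    aws_access_key_id="dummy",
    aws_secret_access_key="dummy",
)
print(dynamodb.list_tables()["TableNames"])

# Full tuning logs uploaded to the optional S3 bucket (requires valid AWS credentials).
s3 = boto3.client("s3")
for obj in s3.list_objects_v2(Bucket="saved-tuning-logs").get("Contents", []):
    print(obj["Key"])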
Documentation
TBA