# prefigure

Run-configuration management utils: combines `configparser`, `argparse`, and `wandb.API`.

Capabilities for archiving run settings and pulling configurations from previous runs. With just 3 lines of code 😎: the import, the arg setup, & the wandb push.

WandB logging is done via `pytorch_lightning`'s `WandbLogger`.
## Install

```bash
pip install prefigure
```
## Instructions

All your usual command line args (with the exception of `--name` and `--training-dir`) are now to be specified in a `defaults.ini` file -- see `examples/` for an example.

A different `.ini` file can be specified via `--config-file`.
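For example, assuming your training script is named `train.py` (a hypothetical name):

```bash
python train.py                                # uses defaults.ini
python train.py --config-file my_settings.ini  # uses a different settings file
```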
Versions 0.0.9 and later: a `.gin` file can instead be used for `--config-file`, in which case the system only runs gin and nothing else.
The option `--wandb-config <url>` pulls a previous run's config off WandB to override those defaults, where `<url>` is the URL of any one of your runs, e.g.

`--wandb-config='https://wandb.ai/drscotthawley/delete-me/runs/1m2gh3o1?workspace=user-drscotthawley'`

(i.e., whatever URL you grab from your browser window when looking at an individual run.)

NOTE: `--wandb-config` can only pull from WandB runs that used prefigure, i.e. runs that have logged a "wandb config push".

Any command line args you specify will override any settings from WandB and/or the `.ini` file. The order of precedence is: command line args override WandB, which overrides the `.ini` file.
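For instance, here's a hypothetical example of how that precedence plays out (the script name, flag spelling, and values are illustrative): suppose `batch_size = 32` in `defaults.ini` and the WandB run you point at logged `batch_size` as 64. Then:

```bash
# the explicit command line flag wins over both sources,
# so this run proceeds with batch_size = 128
python train.py \
    --wandb-config='https://wandb.ai/drscotthawley/delete-me/runs/1m2gh3o1?workspace=user-drscotthawley' \
    --batch-size 128
```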
### 1st line to add

In your run/training code, add this near the top:

```python
from prefigure import get_all_args, push_wandb_config
```
### 2nd line to add

Near the top of your `main()`, add this:

```python
args = get_all_args()
```
Further down in your code, comment out (or delete) all your existing command-line argument definitions (e.g. `argparse` calls). If you want different command-line arguments, add or change them in `defaults.ini`. The 'help' string for each variable is provided as a comment on the line preceding it. See `examples/defaults.ini` for examples.
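For instance, a minimal `defaults.ini` could look like this (the variable names and values here are illustrative, not the shipped example):

```ini
[DEFAULTS]
# random seed for reproducibility
seed = 42

# the learning rate
learning_rate = 4e-4

# batch size for training
batch_size = 32
```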
### 3rd line to add

Then, right after you define the wandb logger, run:

```python
push_wandb_config(wandb_logger, args)
```
### (Optional) 4th & 5th lines to add: OFC

Starting with prefigure v0.0.8, there is an On-the-Fly Control (OFC, pronounced like what you say when you realize you forgot to set a variable properly). This tracks any changes to arguments listed as "steerable" by logging them to a separate file (by default `ofc.ini`) and updates those args dynamically when changes to that file are made. It can also (optionally) log those changes to WandB (and when they occur); see sample usage below.
```python
from prefigure import OFC
...
ofc = OFC(args, steerables=vars(args).keys())
```
Or, fancier: with the Gradio GUI, allowing OFC steering for only certain variables (the default is that all are steerable), and launching only one GUI for a DDP PyTorch Lightning process:

```python
ofc = OFC(args, gui=(trainer.global_rank==0),
          steerables=['lr','demo_every','demo_steps','num_demos','checkpoint_every'])
```
If the GUI is enabled, you get a Gradio URL, which is also pushed to WandB (as "Media"). By default this URL is on `localhost`; however, if the environment variables `OFC_USERNAME` and `OFC_PASSWORD` are set, then a temporary public Gradio URL is obtained. (Since these temporary public URLs expire after 72 hours, we re-launch the GUI every 71 hours and update the link on WandB.)
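For example (the credential values and script name here are placeholders):

```bash
export OFC_USERNAME=me
export OFC_PASSWORD=secret   # placeholder credentials
python train.py              # the GUI link is now a temporary public Gradio URL
```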
Also, if you set `sliders=True` when calling `OFC()`, the float and int variables will get sliders (with max & min guessed from the arg values). Otherwise, the default is that all variables (except `bool` types) are expressed via text fields.
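E.g., a hypothetical combination of the options above:

```python
ofc = OFC(args, gui=(trainer.global_rank==0), sliders=True,
          steerables=['lr', 'demo_every', 'num_demos'])
```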
## Sample usage

Here's a rough outline of some PyTorch code. See `examples/` for more.
```python
import torch
import torch.utils.data as data
from prefigure import get_all_args, push_wandb_config, OFC
import pytorch_lightning as pl
import wandb

def main():
    args = get_all_args()   # reads defaults.ini (or --config-file / --wandb-config)
    ofc = OFC(args)         # optional on-the-fly control

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    torch.manual_seed(args.seed)

    train_set = SampleDataset([args.training_dir], args)
    train_dl = data.DataLoader(train_set, args.batch_size, shuffle=True,
                               num_workers=args.num_workers,
                               persistent_workers=True, pin_memory=True)
    wandb_logger = pl.loggers.WandbLogger(project=args.name)
    push_wandb_config(wandb_logger, args, omit=['training_dir'])
    demo_dl = data.DataLoader(train_set, args.num_demos, shuffle=True)
    ...
    # ...then, inside your training loop:
    if hasattr(args, 'check_ofc_every') and (step > 0) and (step % args.check_ofc_every == 0):
        changes_dict = ofc.update()   # check ofc.ini (or the GUI) for changes
        if {} != changes_dict:        # if there are any changes, log them to WandB
            wandb.log({'args/'+k: v for k, v in changes_dict.items()}, step=step)
    if (step > 0) and (step % args.checkpoint_every == 0):
        ...
    lr = args.learning_rate
    do_stuff(lr)
```
## Imports & Other File Formats

prefigure defaults to `.ini` files, but will also read `.json` and `.gin` files. It will also import files that are specified as values -- if those parameters are listed via a separate `imports` parameter, as in the following example:
```
$ cat examples/harmonai-tools.ini
[DEFAULTS]
# model config file
model_config = ../../harmonai-tools/harmonai_tools/configs/model_configs/diffusion_autoencoders/seanet_32_32_diffae.json

# dataset config file
dataset_config = ../../harmonai-tools/harmonai_tools/configs/dataset_configs/s3_wds_example.json

imports = model_config, dataset_config
```
In this case, both `args.model_config` and `args.dataset_config` will have their filename value strings replaced by the dict(s) specified in the given .json files. If they were not listed under `imports`, then the filename values would remain and no import would occur.
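To illustrate, here's a minimal sketch of what your code then sees (assuming the .json files exist and parse):

```python
from prefigure import get_all_args

args = get_all_args()  # run with --config-file examples/harmonai-tools.ini

# because model_config is listed under 'imports', its value is no longer the
# filename string from the .ini but the dict parsed from that .json file:
print(type(args.model_config))    # <class 'dict'>
print(type(args.dataset_config))  # <class 'dict'>
```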
## Lightning

If you want to pass around the `ofc` object deep inside other libraries, e.g. PyTorch Lightning, I've had success overloading Lightning's `Trainer` object, e.g. `trainer.ofc = ofc`. Then do something like `module.ofc.update()` inside the training routine. For example, cf. my tweet about this.
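A rough sketch of that pattern (the module class and update cadence here are illustrative, not part of prefigure's API):

```python
import pytorch_lightning as pl

class MyModule(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        # self.trainer points back at the Trainer once training starts, so the
        # attached OFC object is reachable from deep inside the module:
        if batch_idx % 100 == 0:
            changes = self.trainer.ofc.update()  # pick up any steerable-arg edits
        ...

trainer = pl.Trainer(logger=wandb_logger)
trainer.ofc = ofc   # attach the OFC object to the Trainer
trainer.fit(MyModule(), train_dl)
```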