kblaunch

A CLI tool for launching Kubernetes jobs with environment variable and secret management.
Installation
Using uv (recommended)
curl -LsSf https://astral.sh/uv/install.sh | sh
Alternatively, you can install uv
using pip:
pip install uv
- Use
uvx
to use the cli (the uvx
command invokes a tool without installing it to the local .venv):
uvx kblaunch --help
When using the kblaunch
command always prepend with uvx
command.
Usage
Setup
Run the setup command to configure the tool (email and slack webhook):
uvx kblaunch setup
This will go through the following steps:
- Set the user (optional): This is used to identify the user and required by the cluster. The default is set to $USER.
- Set the email (required): This is used to identify the user and required by the cluster.
- Set up Slack notifications (optional): This will send a test message to the webhook, and setup the webhook in the config. When your job starts you will receive a message at the webhook. Note a slack webhook is also required for automatic vscode tunnelling.
- Set up a PVC (optional): This will create a PVC for the user to use in their jobs.
- Set the default PVC to use (optional): Note only one pod can use the PVC at a time. The default pvc will be passed to the job. The pvc will always be mounted at
/pvc
.
- Set up git credentials (optional): If the user has set up a git/rsa key on the head node. We can export it as a secret for them and automatically load it and setup git credentials in their launched pods. This requires having setup git/rsa credentials before hand.
The outcome of kblaunch setup
is a .json
file stored in `.cache/.kblaunch/config.json. It should look something like this:
{
"email": "XXX@ed.ac.uk",
"user": "sXXX-infk8s",
"slack_webhook": "https://hooks.slack.com/services/XXX/XXX/XXX",
"default_pvc": "sXXX-infk8s-pvc",
"git_secret": "sXXX-infk8s-git-ssh"
}
When you later use kblaunch
to launch a job, it will use the values stored in that config.json.
Basic Usage
Launch a simple job:
uvx kblaunch launch
--job-name myjob \
--command "python script.py"
With Environment Variables
-
From local environment:
export PATH=...
export OPENAI_API_KEY=...
kblaunch launch \
--job-name myjob \
--command "python script.py" \
--local-env-vars PATH,OPENAI_API_KEY
-
From Kubernetes secrets:
uvx kblaunch launch \
--job-name myjob \
--command "python script.py" \
--secrets-env-vars mysecret1,mysecret2
-
From .env file (default behavior):
uvx kblaunch launch \
--job-name myjob \
--command "python script.py" \
--load-dotenv
If a .env exists in the current directory, it will be loaded and passed as environment variables to the job.
GPU Jobs
Specify GPU requirements:
uvx kblaunch launch \
--job-name gpu-job \
--command "python train.py" \
--gpu-limit 2 \
--gpu-product "NVIDIA-A100-SXM4-80GB"
Interactive Mode
Launch an interactive job:
uvx kblaunch launch \
--job-name interactive \
--interactive
Launch Options
Launch command options:
--email
: User email (overrides config)
--job-name
: Name of the Kubernetes job [required]
--docker-image
: Docker image (default: "nvcr.io/nvidia/cuda:12.0.0-devel-ubuntu22.04")
--namespace
: Kubernetes namespace (default: $KUBE_NAMESPACE)
--queue-name
: Kueue queue name (default: $KUBE_QUEUE_NAME)
--interactive
: Run in interactive mode (default: False)
--command
: Command to run in the container [required if not interactive]
--cpu-request
: CPU request (default: "1")
--ram-request
: RAM request (default: "8Gi")
--gpu-limit
: GPU limit (default: 1)
--gpu-product
: GPU product type (default: "NVIDIA-A100-SXM4-40GB")
- Available options:
- NVIDIA-A100-SXM4-80GB
- NVIDIA-A100-SXM4-40GB
- NVIDIA-A100-SXM4-40GB-MIG-3g.20gb
- NVIDIA-A100-SXM4-40GB-MIG-1g.5gb
- NVIDIA-H100-80GB-HBM3
--secrets-env-vars
: List of secret environment variables (default: [])
--local-env-vars
: List of local environment variables (default: [])
--load-dotenv
: Load environment variables from .env file (default: True)
--nfs-server
: NFS server address (default: set to environment variable $INFK8S_NFS_SERVER_IP)
--pvc-name
: Persistent Volume Claim name (default: default_pvc
if present in config.json
)
--dry-run
: Print job YAML without creating it (default: False)
--priority
: Priority class name (default: "default")
- Available options:
default
, batch
, short
--vscode
: Install VS Code CLI in container (default: False)
--tunnel
: Start VS Code SSH tunnel on startup (requires $SLACK_WEBHOOK
and --vscode flag)
--startup-script
: Path to startup script to run in container
Monitor command options:
--namespace
: Kubernetes namespace (default: $KUBE_NAMESPACE)
Monitoring Commands
The kblaunch monitor
command provides several subcommands to monitor cluster resources:
Displays aggregate GPU statistics for the cluster:
uvx kblaunch monitor gpus
Displays queued jobs (jobs which are waiting for GPUs):
uvx kblaunch monitor queue
Displays per-user statistics:
uvx kblaunch monitor users
Displays per-job statistics:
uvx kblaunch monitor jobs
Note that users
and jobs
commands will run nvidia-smi
on pods to obtain GPU usage is not recommended for frequent use.
Features
- Kubernetes job management
- Environment variable handling from multiple sources
- Kubernetes secrets integration
- GPU job support
- Interactive mode
- Automatic job cleanup
- Slack notifications (when configured)
- Persistent Volume Claim (PVC) management
- VS Code integration (with Code tunnelling support)
- Monitoring commands