
nvgpu - NVIDIA GPU tools
It provides information about GPUs and their availability for computation.
Often we want to train an ML model on one of the GPUs installed on a multi-GPU
machine. Since TensorFlow allocates all the GPU memory, only one such process can
use the GPU at a time. Unfortunately nvidia-smi provides only a text
interface with information about the GPUs. This package wraps it with an
easier-to-use CLI and Python interface.
It's a quick-and-dirty solution that calls nvidia-smi and parses its output.
One or more GPUs can be taken as available for computation based on relative
memory usage, i.e. a few MB taken by Xorg are still OK.
In addition there is a fancy table of GPUs with more information obtained via the Python bindings to NVML.
For easier monitoring of multiple machines it's possible to deploy agents (that provide the GPU information in JSON over a REST API) and show the aggregated status in a web application.
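To make the availability heuristic concrete, here is a minimal sketch of the idea (not the actual nvgpu implementation): ask nvidia-smi for per-GPU memory usage and treat a GPU as free when its relative memory usage stays below a small threshold. The function name and the 10 % threshold are illustrative assumptions.

# Minimal sketch of the availability idea, not the actual nvgpu implementation:
# call nvidia-smi, parse its CSV output, and treat a GPU as free when its
# relative memory usage is small (so a few MB taken by Xorg are still OK).
import subprocess

def sketch_available_gpus(max_used_fraction=0.1):  # threshold is an illustrative assumption
    output = subprocess.check_output(
        ['nvidia-smi',
         '--query-gpu=index,memory.used,memory.total',
         '--format=csv,noheader,nounits'],
        universal_newlines=True)
    available = []
    for line in output.strip().splitlines():
        index, mem_used, mem_total = [field.strip() for field in line.split(',')]
        if int(mem_used) / int(mem_total) < max_used_fraction:
            available.append(index)
    return available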
Install for the current user:
pip install nvgpu
or system-wide:
sudo -H pip install nvgpu
Command-line interface:
# grab all available GPUs
CUDA_VISIBLE_DEVICES=$(nvgpu available)
# grab at most 1 available GPU
CUDA_VISIBLE_DEVICES=$(nvgpu available -l 1)
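The same selection can also be done from Python before importing TensorFlow; a small sketch using the available_gpus() call from the Python API below:

# Sketch: restrict the current process to one free GPU before importing TensorFlow.
import os
import nvgpu

free = nvgpu.available_gpus()          # e.g. ['0', '2']
os.environ['CUDA_VISIBLE_DEVICES'] = free[0] if free else ''  # '' means CPU only
# import tensorflow only after CUDA_VISIBLE_DEVICES has been set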
Print pretty colored table of devices, availability, users, processes:
$ nvgpu list
    status    type                 util.    temp.    MHz    users    since            pids    cmd
--  --------  -------------------  -------  -------  -----  -------  ---------------  ------  --------
 0  [ ]       GeForce GTX 1070     0 %      44       139
 1  [~]       GeForce GTX 1080 Ti  0 %      44       139    alice    2 days ago       19028   jupyter
 2  [~]       GeForce GTX 1080 Ti  0 %      44       139    bob      14 hours ago     8479    jupyter
 3  [~]       GeForce GTX 1070     46 %     54       1506   bob      7 days ago       20883   train.py
 4  [~]       GeForce GTX 1070     35 %     64       1480   bob      7 days ago       26228   evaluate.py
 5  [!]       GeForce GTX 1080 Ti  0 %      44       139    ?                         9305
 6  [ ]       GeForce GTX 1080 Ti  0 %      44       139
Or shortcut:
$ nvl
Python API:
import nvgpu
nvgpu.available_gpus()
# ['0', '2']
nvgpu.gpu_info()
[{'index': '0',
  'mem_total': 8119,
  'mem_used': 7881,
  'mem_used_percent': 97.06860450794433,
  'type': 'GeForce GTX 1070',
  'uuid': 'GPU-3aa99ee6-4a9f-470e-3798-70aaed942689'},
 {'index': '1',
  'mem_total': 11178,
  'mem_used': 10795,
  'mem_used_percent': 96.57362676686348,
  'type': 'GeForce GTX 1080 Ti',
  'uuid': 'GPU-60410ded-5218-7b06-9c7a-124b77a22447'},
 {'index': '2',
  'mem_total': 11178,
  'mem_used': 10789,
  'mem_used_percent': 96.51994990159241,
  'type': 'GeForce GTX 1080 Ti',
  'uuid': 'GPU-d0a77bd4-cc70-ca82-54d6-4e2018cfdca6'},
 ...
]
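As an illustration of working with this output, the snippet below filters GPUs by memory usage; the 10 % threshold is an arbitrary example value, not something defined by nvgpu.

# Sketch: report GPUs whose memory is mostly free, using the gpu_info() fields above.
import nvgpu

THRESHOLD = 10.0  # percent of memory used; arbitrary example value
for gpu in nvgpu.gpu_info():
    if gpu['mem_used_percent'] < THRESHOLD:
        print('GPU %s (%s): only %.1f %% memory used'
              % (gpu['index'], gpu['type'], gpu['mem_used_percent']))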
There are multiple nodes. On each node an agent reads the GPU information and provides it as JSON via a REST API. A master gathers the information from the other nodes and displays it on an HTML page. By default, agents also display their own status.
FLASK_APP=nvgpu.webapp flask run --host 0.0.0.0 --port 1080
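A script can consume an agent's JSON directly. The sketch below assumes the agent exposes the GPU information at a /gpu_info route and that the JSON mirrors the gpu_info() structure shown above; both are assumptions for illustration, so check nvgpu.webapp for the actual route.

# Sketch: read the GPU status published by a remote agent over its REST API.
# NOTE: the '/gpu_info' path and the JSON structure are assumptions for
# illustration; the route actually served by nvgpu.webapp may differ.
import json
import urllib.request

url = 'http://node02:1080/gpu_info'  # hypothetical endpoint on a remote agent
with urllib.request.urlopen(url) as response:
    gpus = json.loads(response.read().decode('utf-8'))
for gpu in gpus:
    print(gpu['index'], gpu['type'], gpu['mem_used_percent'])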
Specify the agents in a config file. Each agent is given either as a URL to a remote
machine or as 'self' for direct access to the local machine. Remove 'self' if the
machine itself does not have any GPU. The default is AGENTS = ['self'], so that
agents also display their own status; set AGENTS = [] to avoid this.
# nvgpu_master.cfg
AGENTS = [
'self', # node01 - master - direct access without using HTTP
'http://node02:1080',
'http://node03:1080',
'http://node04:1080',
]
NVGPU_CLUSTER_CFG=/path/to/nvgpu_master.cfg FLASK_APP=nvgpu.webapp flask run --host 0.0.0.0 --port 1080
Open the master in the web browser: http://node01:1080.
On Ubuntu with systemd we can install the agents/master as a service to be
run automatically on system start.
# create an unprivileged system user
sudo useradd -r nvgpu
Copy nvgpu-agent.service to:
sudo vi /etc/systemd/system/nvgpu-agent.service
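For orientation, here is a minimal sketch of what such a unit could look like, reusing the environment variables, the flask command, and the nvgpu user from this guide; the actual nvgpu-agent.service shipped in the repository may differ.

# /etc/systemd/system/nvgpu-agent.service -- illustrative sketch only;
# the nvgpu-agent.service file in the nvgpu repository may differ.
[Unit]
Description=nvgpu agent (GPU status over a REST API)
After=network.target

[Service]
User=nvgpu
Environment=FLASK_APP=nvgpu.webapp
Environment=NVGPU_CLUSTER_CFG=/etc/nvgpu.conf
ExecStart=/usr/bin/env flask run --host 0.0.0.0 --port 1080
Restart=on-failure

[Install]
WantedBy=multi-user.target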
Set the agents in the configuration file for the master:
sudo vi /etc/nvgpu.conf
AGENTS = [
# direct access without using HTTP
'self',
'http://node01:1080',
'http://node02:1080',
'http://node03:1080',
'http://node04:1080',
]
Set up and start the service:
# enable for automatic startup at boot
sudo systemctl enable nvgpu-agent.service
# start
sudo systemctl start nvgpu-agent.service
# check the status
sudo systemctl status nvgpu-agent.service
# check the service
open http://localhost:1080