# SlurmRay

👉 Full documentation

## Description
SlurmRay is a module for effortlessly distributing tasks on a Slurm cluster using the Ray library. SlurmRay was initially designed for the Curnagl cluster at the University of Lausanne, but it should run on any Slurm cluster with minimal configuration.
## Installation
SlurmRay is designed to run both locally and on a cluster without any code changes: you can develop and debug a script on your local machine until it works, then run it with the full resources of the cluster without modifying it.
```bash
pip install slurmray
```
## Usage
```python
from slurmray.RayLauncher import RayLauncher
import ray
import torch


def function_inside_function():
    # Read the beginning of a file that is shipped to the cluster via `files`
    with open("slurmray/RayLauncher.py", "r") as f:
        return f.read()[0:10]


def example_func(x):
    result = (
        ray.cluster_resources(),
        f"GPU is available : {torch.cuda.is_available()}",
        x + 1,
        function_inside_function(),
    )
    return result


launcher = RayLauncher(
    project_name="example",
    func=example_func,  # Function to execute on the cluster
    args={"x": 1},  # Arguments passed to the function
    files=["slurmray/RayLauncher.py"],  # Files to ship alongside the job
    modules=[],  # Cluster modules to load
    node_nbr=1,
    use_gpu=True,
    memory=8,
    max_running_time=5,
    runtime_env={"env_vars": {"NCCL_SOCKET_IFNAME": "eno1"}},
    server_run=True,
    server_ssh="curnagl.dcsr.unil.ch",
    server_username="hjamet",
    server_password=None,
)

result = launcher()
print(result)
```
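Conceptually, the launcher runs the given function with the given arguments on the allocated resources and hands back its return value. The sketch below illustrates that contract locally, without Ray or Slurm; the assumption that `func` is ultimately called as `func(**args)` is an illustrative simplification, not a statement about SlurmRay's internals:

```python
# Hypothetical simplification of what the launcher does with `func` and `args`:
# call the function with the argument dictionary and return the result.
def example_func(x):
    return x + 1  # same kind of payload as in the example above

args = {"x": 1}
result = example_func(**args)
print(result)  # prints 2
```

This is why the same script can be developed locally first: the payload is an ordinary Python callable, and only the execution environment changes when it is submitted to the cluster.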
## Launcher documentation

The Launcher documentation is available here.