This article is a guide to setting up the NVIDIA GPU Operator on a Kubernetes cluster running on an Azure virtual machine.
VM setup with the NVIDIA image
Set up a virtual machine on Azure using the NVIDIA GPU-Optimized VMI from the Azure Marketplace (an example Azure CLI command is sketched after the list below). The image contains
- Ubuntu Server OS
- NVIDIA Driver
- Docker-ce
- NVIDIA Container Toolkit
- Azure CLI, NGC CLI
- Miniconda, JupyterLab, Git
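For reference, here is a minimal sketch of creating such a VM with the Azure CLI. The resource group, VM name, size and especially the image URN are placeholders; substitute the actual NVIDIA GPU-Optimized VMI URN from the Azure Marketplace.
# placeholder names; replace the image URN with the NVIDIA GPU-Optimized VMI from the Azure Marketplace
az vm create \
  --resource-group my-gpu-rg \
  --name gpu-node-1 \
  --size Standard_NC6s_v3 \
  --image <nvidia-gpu-optimized-vmi-urn> \
  --admin-username azureuser \
  --generate-ssh-keys

# after logging in to the VM, confirm the preinstalled driver is working
nvidia-smi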
Install Helm
sudo curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \
&& sudo chmod 700 get_helm.sh \
&& sudo ./get_helm.sh
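To confirm that Helm is available, check the client version:
helm version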
Install GPU Operator
The image already has the NVIDIA driver installed, so we pass --set driver.enabled=false.
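The chart comes from the NVIDIA Helm repository; if it is not already configured, add it before installing (the URL below is the standard NVIDIA NGC Helm repository):
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
&& helm repo update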
helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set driver.enabled=false
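Once the release is deployed, the operator and its operand pods should come up in the gpu-operator namespace; a quick check:
kubectl get pods -n gpu-operator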
Configure the NVIDIA runtime for containerd
sudo nvidia-ctk runtime configure --runtime=containerd
Restart containerd
sudo systemctl restart containerd
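To verify, the NVIDIA runtime entry should now be present in the containerd configuration (assuming the default config path /etc/containerd/config.toml):
sudo grep -A 3 nvidia /etc/containerd/config.toml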
Use the GPU from a k8s deployment
Set the node selector so the pod is scheduled on a node where a GPU is available
nodeSelector:
  nvidia.com/gpu.present: "true"
Set the resource limits as shown below to request the GPU
limits:
  nvidia.com/gpu: 1
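Putting the two snippets together, here is a minimal sketch of a Deployment manifest that targets a GPU node and requests one GPU; the resource names and the container image tag are illustrative:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-test
  template:
    metadata:
      labels:
        app: gpu-test
    spec:
      nodeSelector:
        nvidia.com/gpu.present: "true"
      containers:
      - name: cuda
        # illustrative image and tag; any CUDA-enabled image works
        image: nvidia/cuda:12.2.0-base-ubuntu22.04
        command: ["sleep", "infinity"]
        resources:
          limits:
            nvidia.com/gpu: 1
Once the pod is running, exec into it and run nvidia-smi to confirm the GPU is visible inside the container.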
Extras
Uninstall the NVIDIA GPU Operator
helm uninstall -n gpu-operator $(helm list -n gpu-operator | grep gpu-operator | awk '{print $1}')
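If the namespace is no longer needed after the uninstall, it can be deleted as well:
kubectl delete namespace gpu-operator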