Raptisv Blog

This article is a guide to setting up the NVIDIA GPU Operator on a Kubernetes cluster running on an Azure virtual machine.

VM setup with Nvidia image

Set up a virtual machine on Azure using the NVIDIA GPU-Optimized VMI image. The image contains:

  • Ubuntu Server OS
  • NVIDIA Driver
  • Docker-ce
  • NVIDIA Container Toolkit
  • Azure CLI, NGC CLI
  • Miniconda, JupyterLab, Git
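Before going further, it is worth confirming that the preinstalled driver and container toolkit actually work on the VM. A quick sanity check (output varies by GPU model):

```shell
# Verify the preinstalled NVIDIA driver is loaded and the GPU is visible
nvidia-smi

# Confirm the NVIDIA Container Toolkit CLI is present
nvidia-ctk --version
```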

Install helm

sudo curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 \
    && sudo chmod 700 get_helm.sh \
    && sudo ./get_helm.sh
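A quick check that helm landed on the PATH:

```shell
# Print the installed helm client version
helm version --short
```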

Install GPU Operator

The image already has the NVIDIA driver installed, so we pass --set driver.enabled=false to stop the operator from deploying its own driver.

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

Make sure you install the operator using the correct --kubeconfig path.

helm install --kubeconfig ~/.kube/config --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --set driver.enabled=false
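Once the install completes, you can verify that the operator pods came up and that the node now advertises a GPU resource (the custom-columns expression below is one way to surface it):

```shell
# All operator pods should reach Running or Completed state
kubectl get pods -n gpu-operator

# The node should report an allocatable nvidia.com/gpu resource
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```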

Set containerd as the default runtime

sudo nvidia-ctk runtime configure --runtime=containerd

Restart containerd

sudo systemctl restart containerd
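The nvidia-ctk command above edits /etc/containerd/config.toml. After the restart, you can confirm the nvidia runtime was registered and containerd came back up:

```shell
# The configure step should have added an nvidia runtime entry
grep -n "nvidia" /etc/containerd/config.toml

# containerd should be active again after the restart
systemctl is-active containerd
```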

Use the GPU from a k8s deployment

Set the node selector so the pod is scheduled on a node where a GPU is available.

nodeSelector:
    nvidia.com/gpu.present: "true"

Set the resource limits as shown below to request the GPU.

limits:
    nvidia.com/gpu: 1
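Putting the two snippets together, a minimal Deployment might look like the sketch below. The name is a placeholder; the image shown is one of NVIDIA's CUDA sample images, which runs a simple vector-add on the GPU.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-test                         # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-test
  template:
    metadata:
      labels:
        app: gpu-test
    spec:
      nodeSelector:
        nvidia.com/gpu.present: "true"   # label applied by the GPU Operator
      containers:
        - name: cuda-vectoradd
          image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
          resources:
            limits:
              nvidia.com/gpu: 1          # request one GPU
```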

Extras

Uninstall the NVIDIA GPU Operator

helm uninstall -n gpu-operator $(helm list -n gpu-operator | grep gpu-operator | awk '{print $1}')
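If you also want to remove the namespace that the install created with --create-namespace:

```shell
# Delete the now-empty gpu-operator namespace
kubectl delete namespace gpu-operator
```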