Setting up Docker to use the GPU
Install the NVIDIA Container Toolkit
Set the Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker

Reload the Docker configuration (without restarting containers)
Add "live-restore": true to daemon.json
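After `nvidia-ctk runtime configure` has run and `live-restore` has been added, `/etc/docker/daemon.json` typically ends up looking something like the sketch below. The `runtimes.nvidia` entry is written by the tool; the exact path may differ on your system.

```json
{
    "live-restore": true,
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
```

With `live-restore` enabled, running containers keep running while the Docker daemon itself restarts.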
Reload the configuration
sudo systemctl restart docker

Verify
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Installing k3s
The default container runtime for k3s is containerd, so it must be switched to Docker explicitly.
curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | K3S_NODE_NAME=master INSTALL_K3S_MIRROR=cn INSTALL_K3S_EXEC="--docker" sh -s - --system-default-registry "registry.cn-hangzhou.aliyuncs.com"

Install k3s on the agent nodes and join them to the server
curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | K3S_NODE_NAME=worker-1 INSTALL_K3S_MIRROR=cn INSTALL_K3S_EXEC="--docker" K3S_URL=https://<master_ip>:<port> K3S_TOKEN=<server_token> sh -s -

Make k3s usable by the current user
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
sed -i "s/127.0.0.1/$(hostname -I | awk '{print $1}')/g" ~/.kube/config
echo 'export KUBECONFIG=$HOME/.kube/config' >> ~/.bashrc
source ~/.bashrc

Configuring k3s
Set up NVIDIA devices
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.15.0/deployments/static/nvidia-device-plugin.yml

Setting up shared storage
Configure NFS
NFS (Network File System) is a distributed file system protocol that lets different computers (or servers) share files and directories over a network.
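Each line in /etc/exports is a share path followed by the client network and the mount options. The snippet below just composes such a line from hypothetical values (/srv/nfs as the share, 192.168.1.0/24 as the client LAN) to show the exact format that gets appended in the setup commands that follow:

```shell
# Compose an /etc/exports entry from hypothetical values
SHARE_DIR=/srv/nfs            # hypothetical shared directory
CLIENT_CIDR=192.168.1.0/24    # hypothetical client network allowed to mount
printf '%s %s(rw,sync,no_subtree_check,no_root_squash)\n' "$SHARE_DIR" "$CLIENT_CIDR"
```

Here `rw` allows read-write access, `sync` forces writes to be committed before replying, and `no_root_squash` lets a client's root keep root privileges on the share.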
On the NFS server, run:
sudo apt update
sudo apt install nfs-common nfs-kernel-server -y
sudo mkdir -p /srv/nfs
# Set directory permissions (allow anonymous access)
sudo chown nobody:nogroup /srv/nfs
sudo chmod 777 /srv/nfs
echo "<your_share_dir> <your_CIDR_ip>(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
# Reload the exports configuration
sudo exportfs -ra
# Restart the NFS service
sudo systemctl restart nfs-kernel-server
# Make sure the service starts at boot
sudo systemctl enable nfs-kernel-server

On the nodes that mount the storage, run:
sudo apt update
sudo apt install nfs-common -y
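To have a client node mount the share automatically at boot, an /etc/fstab entry can be added. A sketch with hypothetical values (server 192.168.1.100 exporting /srv/nfs, mounted at /mnt/nfs):

```
# /etc/fstab — hypothetical NFS server and share
192.168.1.100:/srv/nfs  /mnt/nfs  nfs  defaults,_netdev  0  0
```

The `_netdev` option delays the mount until the network is up.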
Create a StorageClass
A StorageClass is Kubernetes' mechanism for defining dynamic provisioning of storage. It abstracts away the details of the underlying storage system (cloud storage, local disks, network storage, and so on), letting users create PersistentVolumes (PVs) on demand instead of pre-provisioning storage by hand.
# storage_class.yaml
# Create the StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: <your_class_name>
provisioner: kubernetes.io/no-provisioner  # static PVs do not need a provisioner
reclaimPolicy: Retain
volumeBindingMode: Immediate

kubectl apply -f storage_class.yaml

Create a PersistentVolume and PersistentVolumeClaim
PV (PersistentVolume) and PVC (PersistentVolumeClaim) are the core Kubernetes objects for managing persistent storage; together they give Pods storage whose lifetime is independent of the Pod's. They work together with StorageClasses to form a complete storage stack.
Create the PV and PVC
# pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <your_pv_name>
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: <your_class_name>
  nfs:
    path: <your_share_dir>   # the newly created share directory
    server: <your_server_ip> # replace with your master node's IP
    readOnly: false

# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <your_pvc_name>
spec:
  storageClassName: <your_class_name>
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  volumeName: <your_pv_name>

kubectl apply -f pv.yaml
kubectl -n <namespace> apply -f pvc.yaml
kubectl get pv
kubectl -n <namespace> get pvc
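Once the PVC is bound, a workload consumes it by referencing the claim name. A minimal sketch, assuming a hypothetical Pod named test-pod and the <your_pvc_name> placeholder above; the NFS share then appears inside the container at /data:

```yaml
# pod.yaml — hypothetical Pod mounting the PVC at /data
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: shared
          mountPath: /data
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: <your_pvc_name>
```

Because the PV is ReadWriteMany, multiple Pods on different nodes can mount the same claim concurrently.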