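From Zero to One: Quickly Building a Local Kubeflow AI/Machine Learning Platform on K3s

Background

Kubeflow is an open-source, Kubernetes-native framework for developing, managing, and running machine learning workloads. It supports many popular machine learning frameworks, including PyTorch and TensorFlow. This article describes how to set up a local Kubeflow machine learning platform on a Mac.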

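Note: this article installs the deployKF distribution. The local environment is suitable for development and testing only and must not be used in production!

For other Kubeflow distributions, see the official documentation: https://www.kubeflow.org/docs/started/installing-kubeflow/

Base environment:

OS: macOS 13.1 (amd64)
Docker Desktop: v4.15.0

K3s itself needs few resources, but the Kubeflow suite has many components, so Docker's resource allocation must be raised to keep Pods from getting stuck in Pending during installation.
Recommended Docker resource settings: 8 CPU cores, 10 GB memory, 40 GB disk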
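
Before creating the cluster, it can help to confirm that Docker really has the recommended resources. The sketch below is an illustration rather than part of the original walkthrough: `check_docker_resources` is a hypothetical helper that compares a CPU count and a memory size in bytes against the 8-core and 10 GB minimums recommended above. Treating 10 GB as decimal gigabytes is an assumption, chosen because Docker typically reports slightly less memory than the configured value.

```shell
# Hypothetical helper: compare a CPU count and a memory size (bytes)
# against the minimums recommended above (8 cores, 10 GB).
check_docker_resources() {
  local cpus="$1" mem_bytes="$2"
  local min_cpus=8
  local min_mem_bytes=$((10 * 1000 * 1000 * 1000))  # 10 GB, decimal (assumption)
  local status=0

  if [ "$cpus" -lt "$min_cpus" ]; then
    echo "CPU: $cpus cores, want at least $min_cpus"
    status=1
  fi
  if [ "$mem_bytes" -lt "$min_mem_bytes" ]; then
    echo "Memory: $mem_bytes bytes, want at least $min_mem_bytes"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "Docker resources look sufficient"
  fi
  return "$status"
}

# Real usage (requires a running Docker daemon):
#   check_docker_resources $(docker info --format '{{.NCPU}} {{.MemTotal}}')
```

The disk allocation is not checked here; it is easiest to verify in the Docker Desktop settings.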

Installation and deployment steps

1. Install the required CLI tools

brew install bash argocd jq k3d kubectl kustomize

2. Create the Kubernetes cluster

To keep resource usage as low as possible, the local cluster runs K3s, started inside Docker with k3d:

k3d cluster create "kubeflow" --image "rancher/k3s:v1.27.10-k3s2"

Check that the cluster is ready with:

kubectl get -A pods

Healthy output looks similar to this:

NAMESPACE     NAME                                     READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-957fdf8bc-cj9l5   1/1     Running     0          2m30s
kube-system   coredns-77ccd57875-xzzz4                 1/1     Running     0          2m30s
kube-system   metrics-server-648b5df564-gwnhq          1/1     Running     0          2m30s
kube-system   helm-install-traefik-crd-49l4k           0/1     Completed   0          2m31s
kube-system   helm-install-traefik-xrjtd               0/1     Completed   2          2m31s
kube-system   svclb-traefik-a79cf0ef-lj4td             2/2     Running     0          89s
kube-system   traefik-768bdcdcdd-mr8z8                 1/1     Running     0          89s
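
Reading this table by eye gets tedious once the cluster holds dozens of Pods. As a rough sketch (the function name `count_unready_pods` is made up for this example), the filter below reads `kubectl get pods -A --no-headers` output on stdin and counts Pods that are neither `Completed` nor fully ready, where fully ready means `STATUS` is `Running` and the `READY` column shows n/n.

```shell
# Hypothetical helper: count Pods that are not yet healthy.
# Expects the columns NAMESPACE NAME READY STATUS ... on stdin.
count_unready_pods() {
  awk '
    $4 == "Completed" { next }            # finished Jobs count as healthy
    {
      split($3, ready, "/")               # READY column, e.g. "2/2"
      if ($4 != "Running" || ready[1] != ready[2]) bad++
    }
    END { print bad + 0 }
  '
}

# Real usage (requires a reachable cluster):
#   kubectl get pods -A --no-headers | count_unready_pods
```

A result of 0 means every Pod is either running with all containers ready or has completed.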

3. Deploy ArgoCD

ArgoCD is a declarative GitOps continuous-delivery tool for Kubernetes; here it automates the deployment of Kubeflow.

git clone -b main https://github.com/deployKF/deployKF.git
cd deployKF/argocd-plugin
chmod +x ./install_argocd.sh
bash ./install_argocd.sh

Check that ArgoCD is ready with:

kubectl get pod -n argocd

Healthy output looks similar to this:

NAME                                                READY   STATUS    RESTARTS   AGE
argocd-redis-69f8795dbd-7v4nn                       1/1     Running   0          106s
argocd-applicationset-controller-7b9c4dfb77-7gsf2   1/1     Running   0          106s
argocd-notifications-controller-756764ddd5-jw92c    1/1     Running   0          106s
argocd-server-86f64667bc-7nt7d                      1/1     Running   0          105s
argocd-application-controller-0                     1/1     Running   0          105s
argocd-dex-server-9b5c6dccd-2p779                   1/1     Running   0          106s
argocd-repo-server-5b55578f7c-sfzf4                 2/2     Running   0          105s

4. Install the Kubeflow suite

Prepare the following file: deploykf-app-of-apps.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: deploykf-app-of-apps
  namespace: argocd
  labels:
    app.kubernetes.io/name: deploykf-app-of-apps
    app.kubernetes.io/part-of: deploykf
spec:
  project: "default"
  source:
    ## source git repo configuration
    ##  - we use the 'deploykf/deploykf' repo so we can read its 'sample-values.yaml'
    ##    file, but you may use any repo (even one with no files)
    ##
    repoURL: "https://github.com/deployKF/deployKF.git"
    targetRevision: "v0.1.4"
    path: "."

    ## plugin configuration
    ##
    plugin:
      name: "deploykf"
      parameters:

        ## the deployKF generator version
        ##  - available versions: https://github.com/deployKF/deployKF/releases
        ##
        - name: "source_version"
          string: "0.1.4"

        ## paths to values files within the `repoURL` repository
        ##  - the values in these files are merged, with later files taking precedence
        ##  - we strongly recommend using 'sample-values.yaml' as the base of your values
        ##    so you can easily upgrade to newer versions of deployKF
        ##
        - name: "values_files"
          array:
            - "./sample-values.yaml"

        ## a string containing the contents of a values file
        ##  - this parameter allows defining values without needing to create a file in the repo
        ##  - these values are merged with higher precedence than those defined in `values_files`
        ##
        - name: "values"
          string: |
            ##
            ## This demonstrates how you might structure overrides for the 'sample-values.yaml' file.
            ## For a more comprehensive example, see the 'sample-values-overrides.yaml' in the main repo.
            ##
            ## Notes:
            ##  - YAML maps are RECURSIVELY merged across values files
            ##  - YAML lists are REPLACED in their entirety across values files
            ##  - Do NOT include empty/null sections, as this will remove ALL values from that section.
            ##    To include a section without overriding any values, set it to an empty map: `{}`
            ##

            ## --------------------------------------------------------------------------------
            ##                                      argocd
            ## --------------------------------------------------------------------------------
            argocd:
              namespace: argocd
              project: default

            ## --------------------------------------------------------------------------------
            ##                                    kubernetes
            ## --------------------------------------------------------------------------------
            kubernetes:
              {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

            ## --------------------------------------------------------------------------------
            ##                              deploykf-dependencies
            ## --------------------------------------------------------------------------------
            deploykf_dependencies:

              ## --------------------------------------
              ##             cert-manager
              ## --------------------------------------
              cert_manager:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##                 istio
              ## --------------------------------------
              istio:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##                kyverno
              ## --------------------------------------
              kyverno:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

            ## --------------------------------------------------------------------------------
            ##                                  deploykf-core
            ## --------------------------------------------------------------------------------
            deploykf_core:

              ## --------------------------------------
              ##             deploykf-auth
              ## --------------------------------------
              deploykf_auth:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##        deploykf-istio-gateway
              ## --------------------------------------
              deploykf_istio_gateway:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##      deploykf-profiles-generator
              ## --------------------------------------
              deploykf_profiles_generator:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

            ## --------------------------------------------------------------------------------
            ##                                   deploykf-opt
            ## --------------------------------------------------------------------------------
            deploykf_opt:

              ## --------------------------------------
              ##            deploykf-minio
              ## --------------------------------------
              deploykf_minio:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##            deploykf-mysql
              ## --------------------------------------
              deploykf_mysql:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

            ## --------------------------------------------------------------------------------
            ##                                  kubeflow-tools
            ## --------------------------------------------------------------------------------
            kubeflow_tools:

              ## --------------------------------------
              ##                 katib
              ## --------------------------------------
              katib:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##               notebooks
              ## --------------------------------------
              notebooks:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

              ## --------------------------------------
              ##               pipelines
              ## --------------------------------------
              pipelines:
                {} # <-- REMOVE THIS, IF YOU INCLUDE VALUES UNDER THIS SECTION!

  destination:
    server: "https://kubernetes.default.svc"
    namespace: "argocd"

Apply the manifest to deploy the application:

kubectl apply -f ./deploykf-app-of-apps.yaml

Check the ArgoCD status through its web UI:

kubectl port-forward --namespace "argocd" svc/argocd-server 8090:https

Open https://localhost:8090/ in a browser. The username is admin; retrieve the password with:

echo $(kubectl -n argocd get secret/argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d)

Because the applications depend on one another, run the Sync operations in order with the following script:

git clone -b main https://github.com/deployKF/deployKF.git
cd deployKF/scripts
chmod +x ./sync_argocd_apps.sh
bash ./sync_argocd_apps.sh

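The script is idempotent: if it fails, rerun it until the deployment succeeds. After a successful deployment, the list of running Pods looks similar to this: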

NAMESPACE                 NAME                                                 READY   STATUS    RESTARTS       AGE
argocd                    argocd-redis-69f8795dbd-x5wtv                        1/1     Running   5 (17m ago)    105m
argocd                    argocd-server-86f64667bc-zfm7m                       1/1     Running   4 (17m ago)    73m
argocd                    argocd-repo-server-5b55578f7c-x26zz                  2/2     Running   10 (17m ago)   91m
argocd                    argocd-notifications-controller-756764ddd5-2fqbr     1/1     Running   5 (17m ago)    89m
argocd                    argocd-dex-server-9b5c6dccd-bl86m                    1/1     Running   5 (17m ago)    91m
argocd                    argocd-application-controller-0                      1/1     Running   5 (17m ago)    91m
argocd                    argocd-applicationset-controller-7b9c4dfb77-hph2r    1/1     Running   5 (17m ago)    105m
cert-manager              cert-manager-c688c56f-w4jts                          1/1     Running   5 (17m ago)    109m
cert-manager              trust-manager-78766fd9bd-zd5zf                       1/1     Running   5 (17m ago)    90m
cert-manager              cert-manager-webhook-d45447457-q6cf8                 1/1     Running   6 (17m ago)    109m
cert-manager              cert-manager-cainjector-59d694bcc7-mrcvg             1/1     Running   6 (17m ago)    109m
deploykf-auth             oauth2-proxy-5fd9888b79-tpnrt                        2/2     Running   11 (16m ago)   73m
deploykf-auth             dex-68c8bf56b9-78d5g                                 2/2     Running   8 (17m ago)    73m
deploykf-dashboard        profile-controller-5575767c76-vshp2                  2/2     Running   8 (17m ago)    73m
deploykf-dashboard        kfam-api-75b64c9645-sjfcq                            2/2     Running   10 (17m ago)   98m
deploykf-dashboard        central-dashboard-6b5d9574dc-fmlt4                   2/2     Running   10 (17m ago)   98m
deploykf-istio-gateway    deploykf-gateway-6ddf8947cc-qz55g                    1/1     Running   5 (17m ago)    98m
deploykf-minio            deploykf-minio-568b877668-w2wct                      2/2     Running   5 (17m ago)    52m
deploykf-mysql            deploykf-mysql-0                                     1/1     Running   5 (17m ago)    109m
istio-system              istiod-7b9b6df595-jbztw                              1/1     Running   5 (17m ago)    91m
kube-system               svclb-deploykf-gateway-7f7cba3a-kkskn                3/3     Running   15 (17m ago)   100m
kube-system               metrics-server-648b5df564-gwnhq                      1/1     Running   9 (17m ago)    5h43m
kube-system               local-path-provisioner-957fdf8bc-cj9l5               1/1     Running   7 (17m ago)    5h43m
kube-system               coredns-77ccd57875-xzzz4                             1/1     Running   7 (17m ago)    5h43m
kube-system               traefik-768bdcdcdd-mr8z8                             1/1     Running   7 (17m ago)    5h42m
kube-system               svclb-traefik-a79cf0ef-6ksjm                         2/2     Running   10 (17m ago)   100m
kubeflow                  katib-controller-75858c4ddf-hwvkx                    1/1     Running   8 (17m ago)    95m
kubeflow                  ml-pipeline-ui-68b7f6586d-qtjp5                      2/2     Running   15 (17m ago)   94m
kubeflow                  ml-pipeline-persistenceagent-68bbd65f98-tsnqn        2/2     Running   10 (17m ago)   94m
kubeflow                  katib-ui-d4df8bdb6-2x75p                             2/2     Running   10 (17m ago)   95m
kubeflow                  ml-pipeline-6445d9fb77-dxgv4                         2/2     Running   24 (16m ago)   94m
kubeflow                  admission-webhook-deployment-789dc56fbf-z7cj8        1/1     Running   5 (17m ago)    94m
kubeflow                  metadata-writer-6f95b9588c-fmx4s                     2/2     Running   8 (17m ago)    73m
kubeflow                  notebook-controller-deployment-649cf9b976-vnvwd      2/2     Running   10 (17m ago)   95m
kubeflow                  training-operator-7cf5c66858-jf5sr                   1/1     Running   3 (17m ago)    43m
kubeflow                  tensorboards-web-app-deployment-778466f5f6-dmrks     2/2     Running   2 (17m ago)    43m
kubeflow                  tensorboard-controller-deployment-644f57dd7c-zlxnw   3/3     Running   24 (17m ago)   92m
kubeflow                  ml-pipeline-scheduledworkflow-578475988-kwz27        2/2     Running   10 (17m ago)   94m
kubeflow                  volumes-web-app-deployment-588d46bb75-95g6b          2/2     Running   2 (17m ago)    42m
kubeflow                  ml-pipeline-viewer-crd-6857ccc85c-zl895              2/2     Running   10 (17m ago)   94m
kubeflow                  metadata-grpc-deployment-566d54d578-wwj9n            2/2     Running   23 (16m ago)   94m
kubeflow                  ml-pipeline-visualizationserver-7b45b7fd56-s4pxh     2/2     Running   15 (17m ago)   94m
kubeflow                  cache-server-66d7586749-prmkq                        2/2     Running   10 (17m ago)   94m
kubeflow                  jupyter-web-app-deployment-9c8c779c-hcqvr            2/2     Running   15 (17m ago)   91m
kubeflow                  katib-db-manager-6998f5bdd8-lrs77                    1/1     Running   5 (17m ago)    95m
kubeflow                  metadata-envoy-deployment-b48db5966-542nh            1/1     Running   5 (17m ago)    94m
kubeflow-argo-workflows   argo-workflow-controller-79fc5c6895-2g26t            2/2     Running   10 (17m ago)   98m
kubeflow-argo-workflows   argo-server-6d97fb7649-lsfdw                         2/2     Running   5 (16m ago)    73m
kyverno                   kyverno-cleanup-controller-6cb4d5848-hh8nm           1/1     Running   5 (17m ago)    109m
kyverno                   kyverno-admission-controller-964c74c7d-frknb         1/1     Running   5 (17m ago)    109m
kyverno                   kyverno-background-controller-796f77c79f-nwhrs       1/1     Running   5 (17m ago)    109m
kyverno                   kyverno-reports-controller-6d6d98fc96-z7qjv          1/1     Running   5 (17m ago)    109m
kyverno                   kyverno-admission-controller-964c74c7d-hgtc2         1/1     Running   4 (17m ago)    109m
kyverno                   kyverno-admission-controller-964c74c7d-x744h         1/1     Running   5 (17m ago)    109m
team-1                    ml-pipeline-visualizationserver-677c86b748-nbrr5     2/2     Running   2 (17m ago)    73m
team-1                    ml-pipeline-ui-artifact-7749b4f5f6-ld7kl             2/2     Running   10 (17m ago)   94m
team-1-prod               ml-pipeline-visualizationserver-677c86b748-hqwsh     2/2     Running   2 (17m ago)    73m
team-1-prod               ml-pipeline-ui-artifact-7749b4f5f6-hl6gk             2/2     Running   10 (17m ago)   94m
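
Because the sync script is idempotent, rerunning it by hand can be replaced by a small retry loop. The sketch below is one way to do that, not something the deployKF project ships: `retry` is a hypothetical helper, and it assumes the wrapped command exits non-zero on failure and needs no interactive input.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1" delay_seconds="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay_seconds}s" >&2
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
}

# Real usage, from the deployKF/scripts directory:
#   retry 5 30 bash ./sync_argocd_apps.sh
```

The attempt count and delay shown in the usage line are arbitrary; on a resource-constrained laptop a longer delay gives restarting Pods more time to settle.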

Once syncing completes, the ArgoCD UI shows all 20 applications synced.

5. Access the dashboard

Set up port forwarding:

kubectl port-forward \
  --namespace "deploykf-istio-gateway" \
  svc/deploykf-gateway 8080:http 8443:https

The Istio Gateway routes each request to its target service based on the Host header, so add the following entries to the local /etc/hosts file:

127.0.0.1 deploykf.example.com
127.0.0.1 argo-server.deploykf.example.com
127.0.0.1 minio-api.deploykf.example.com
127.0.0.1 minio-console.deploykf.example.com
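
Editing /etc/hosts by hand tends to produce duplicate entries when the setup is repeated. As a sketch under that concern, the hypothetical helper `add_hosts_entries` below appends a `127.0.0.1` line for each hostname only when no non-comment line in the file already lists that hostname. It takes the hosts file path as its first argument so it can be tried on a copy first.

```shell
# Hypothetical helper: append "127.0.0.1 <host>" to a hosts file unless
# a non-comment line already lists that hostname.
# Usage: add_hosts_entries <hosts_file> <host> [host...]
add_hosts_entries() {
  local hosts_file="$1"
  shift
  local host
  for host in "$@"; do
    if awk -v h="$host" '
         /^[[:space:]]*#/ { next }
         { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
         END { exit !found }
       ' "$hosts_file"; then
      echo "already present: $host"
    else
      printf '127.0.0.1 %s\n' "$host" >> "$hosts_file"
      echo "added: $host"
    fi
  done
}

# Real usage (run the script with sudo, since /etc/hosts is root-owned):
#   add_hosts_entries /etc/hosts deploykf.example.com \
#     argo-server.deploykf.example.com \
#     minio-api.deploykf.example.com \
#     minio-console.deploykf.example.com
```

The helper assumes the hosts file ends with a newline, which is the normal case for /etc/hosts.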

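Open https://deploykf.example.com:8443/ in a browser and log in with one of the following accounts:

Administrator: username admin@example.com, password admin
User 1: username user1@example.com, password user1
User 2: username user2@example.com, password user2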


6. Run Jupyter


More features remain to be explored…

References

https://www.deploykf.org/guides/local-quickstart/
