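1. Introduction
The Metrics API exposes a basic set of metrics (CPU and memory usage of resources) for a k8s cluster. With it we can run HPA and VPA against our pods (scaling dynamically, mainly based on CPU and memory usage relative to the resource settings declared on the pod), and we can use kubectl top to see the CPU and memory usage of nodes and pods.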
Node resource allocation and usage:
Pod resource allocation and usage:
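The usage figures in both views come from kubectl top. A minimal sketch of the commands, assuming metrics-server is already running and kubectl is pointed at the cluster; the all-namespaces flag is an illustrative choice, not something the original specifies:

```shell
# CPU and memory usage per node
kubectl top node

# CPU and memory usage per pod, across all namespaces
kubectl top pod -A
```

Until metrics-server is deployed and healthy, these commands return an error rather than data.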
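2. Deployment and Usage
According to the official Kubernetes documentation, a freshly initialized k8s cluster does not provide the Metrics API by default. To use it, you need to deploy metrics-server.
The deployment instructions are linked from the official documentation; the project itself is hosted on GitHub under the kubernetes-sigs organization: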
https://github.com/kubernetes-sigs/metrics-server/tree/master
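The project documents its installation steps clearly, and installation is simple: apply its YAML directly. Here we use the high-availability manifest, which essentially just runs more replicas (replicas: 2 in this manifest, together with pod anti-affinity and a PodDisruptionBudget):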
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/high-availability-1.21+.yaml
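Let's download the YAML first and take a look (the copy below already contains our two changes, marked with comments):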
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: metrics-server
            namespaces:
            - kube-system
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        # Added: skip verification of the CA of the kubelet serving certificates
        - --kubelet-insecure-tls
        # Changed: use an image we can actually pull; a private registry address also works
        #image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
        image: bitnami/metrics-server:0.6.4
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: metrics-server
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
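We changed two things in this manifest:
1) In the Deployment, added this entry to the container's args: - --kubelet-insecure-tls
2) Changed the container image
The image change is only there so that the image can be pulled successfully.
The kubelet-insecure-tls flag makes metrics-server skip verification of the CA of the serving certificates presented by the kubelets. The upstream flag description marks it as intended for testing purposes only. Without the flag, the following problems may appear: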
Pod status:
Describing the pod shows the readiness probe failing (Readiness probe failed: HTTP probe failed with statuscode: 500).
The pod logs show the following error:
"Failed to scrape node" err="Get "https://xxx.xxx.xxx.xxx:10250/metrics/resource": tls: failed to verify certificate: x509: cannot validate certificate for xxx.xxx.xxx.xxx because it doesn't contain any IP SANs" node="k8s-slave3"
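If a downloaded copy of the manifest still lacks the flag, the edit can be scripted instead of done by hand. The following is only a sketch under stated assumptions: it relies on GNU sed, the function name add_insecure_tls_flag is hypothetical, and it assumes the manifest contains the --metric-resolution=15s argument shown above, which it uses as the insertion point.

```shell
# Hypothetical helper: insert "- --kubelet-insecure-tls" right after the
# "- --metric-resolution=15s" argument, reusing that line's indentation.
# Assumes GNU sed ("\n" in the replacement and "-i" without a suffix).
add_insecure_tls_flag() {
  manifest="$1"
  # Do nothing if the flag is already present, so reruns are harmless.
  if grep -q -- '--kubelet-insecure-tls' "$manifest"; then
    return 0
  fi
  sed -i 's|^\( *\)- --metric-resolution=15s$|&\n\1- --kubelet-insecure-tls|' "$manifest"
}
```

With a local file named, for example, metrics-server-ha.yaml (an illustrative name), the edited copy can then be applied with `add_insecure_tls_flag metrics-server-ha.yaml && kubectl apply -f metrics-server-ha.yaml`.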
Normal state:
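A few commands that can help confirm the deployment has reached this state. This is a sketch that assumes the names used in the manifest above (namespace kube-system, label k8s-app=metrics-server, APIService v1beta1.metrics.k8s.io):

```shell
# The metrics-server pods should be Running and Ready
kubectl -n kube-system get pods -l k8s-app=metrics-server

# The aggregated API registration should report as available
kubectl get apiservice v1beta1.metrics.k8s.io

# Query the Metrics API directly for node metrics
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
```

Once these succeed, kubectl top should return data as well.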