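Introduction to kubernetes-nmstate
kubernetes-nmstate provides declarative node network configuration driven through the Kubernetes API.
With the rise of hybrid cloud, setting up node networking has become more challenging, because different environments have different networking requirements. The Container Network Interface (CNI) standard allows many different solutions. It addresses Pod communication inside the cluster, including assigning Pod IPs and creating routes.
However, in all of these cases the node's network must be set up before Pods are scheduled onto it. Setting up networking in a dynamic, heterogeneous cluster with changing network requirements is a challenge in itself.
The kubernetes-nmstate project configures node networking through Kubernetes CRDs, which simplifies network configuration to some extent.
Official website: https://nmstate.io/
Project repository: https://github.com/nmstate/kubernetes-nmstate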
Deployment environment
The example uses 3 Kubernetes nodes running Ubuntu 22.04.2 LTS:
root@node40:~# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node40 Ready control-plane 149d v1.29.3 192.168.72.40 <none> Ubuntu 22.04.2 LTS 5.15.0-105-generic containerd://1.7.15
node41 Ready <none> 149d v1.29.3 192.168.72.41 <none> Ubuntu 22.04.2 LTS 5.15.0-76-generic containerd://1.7.15
node42 Ready <none> 149d v1.29.3 192.168.72.42 <none> Ubuntu 22.04.2 LTS 5.15.0-76-generic containerd://1.7.15
root@node40:~#
Deployment prerequisites
nmstate depends on NetworkManager, so not every Linux distribution is supported. NetworkManager must also be version 1.20 or later.
Install network-manager on all Ubuntu nodes:
apt update -y
apt install -y network-manager
Check the NetworkManager version as follows:
root@node40:~# /usr/sbin/NetworkManager --version
1.36.6
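To check the version requirement in a script rather than by eye, a comparison based on GNU sort -V is enough. This is a minimal sketch under that assumption (Ubuntu ships GNU coreutils); version_ge and nm_meets_minimum are hypothetical helper names, not part of nmstate or NetworkManager.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): check that NetworkManager >= 1.20.
# Assumes GNU coreutils, whose `sort -V` orders version strings.

# version_ge A B: succeed if version A >= version B.
version_ge() {
  # After a version sort the smaller value comes first, so A >= B exactly
  # when B sorts first (or A and B are equal).
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# nm_meets_minimum: succeed if the installed NetworkManager is >= 1.20.
nm_meets_minimum() {
  local current
  current="$(/usr/sbin/NetworkManager --version 2>/dev/null)" || return 1
  version_ge "$current" "1.20"
}
```

On a node, nm_meets_minimum && echo ok should print ok for the 1.36.6 shown above.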
Ubuntu uses netplan for network configuration. To let NetworkManager manage the interfaces, set the renderer: NetworkManager parameter in the netplan configuration on all nodes:
root@node40:~# vim /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  version: 2
  renderer: NetworkManager
......
Apply the configuration:
netplan generate
netplan apply
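Before running netplan apply, it can help to confirm that the renderer line is actually present. The sketch below is a deliberately simple grep-based check: it does not parse YAML, it assumes the renderer is written on its own line as shown above, and netplan_uses_nm is a hypothetical name.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): succeed if a netplan file contains a
# line "renderer: NetworkManager". It only matches the text of that line and
# does not validate the YAML structure or which section the key is in.
netplan_uses_nm() {
  grep -Eq '^[[:space:]]*renderer:[[:space:]]*NetworkManager[[:space:]]*$' "$1"
}
```

For example: netplan_uses_nm /etc/netplan/00-installer-config.yaml && echo ok. After netplan apply, nmcli device status is one way to see whether NetworkManager now manages the interfaces.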
Deploying kubernetes-nmstate
Installation reference: https://github.com/nmstate/kubernetes-nmstate/releases
First, install the kubernetes-nmstate operator:
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/nmstate.io_nmstates.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/namespace.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/service_account.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/role.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/role_binding.yaml
kubectl apply -f https://github.com/nmstate/kubernetes-nmstate/releases/download/v0.82.0/operator.yaml
Once that is done, create an NMState CR to trigger deployment of the kubernetes-nmstate handler:
cat <<EOF | kubectl create -f -
apiVersion: nmstate.io/v1
kind: NMState
metadata:
  name: nmstate
EOF
Check the created pods:
root@node40:~# kubectl -n nmstate get pods
NAME READY STATUS RESTARTS AGE
nmstate-cert-manager-6dc8846667-r7cvd 1/1 Running 0 23m
nmstate-handler-2t2sf 1/1 Running 7 (13m ago) 23m
nmstate-handler-47x9g 1/1 Running 7 (14m ago) 23m
nmstate-handler-hrhzv 1/1 Running 0 6m25s
nmstate-metrics-7f8b8579cd-6wfzv 2/2 Running 0 23m
nmstate-operator-58dc749498-ltnf2 1/1 Running 0 23m
nmstate-webhook-6d55bff68d-czwzx 1/1 Running 0 23m
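Rather than reading the pod list by hand, the same check can be scripted. The following is a minimal sketch with a hypothetical helper, all_pods_ready, that reads kubectl get pods --no-headers output and succeeds only if every pod is Running with all containers ready. An alternative, assuming the handler DaemonSet is named nmstate-handler as the pod names suggest, is kubectl -n nmstate rollout status daemonset/nmstate-handler.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): read `kubectl get pods --no-headers`
# output on stdin and succeed only if every pod is Running and its READY
# column reads "n/n". Empty input counts as failure.
all_pods_ready() {
  awk '
    {
      n++
      split($2, r, "/")                 # READY column, e.g. "2/2"
      if (r[1] != r[2] || $3 != "Running") bad = 1
    }
    END { exit (n == 0 || bad) }
  '
}

# Usage on a real cluster:
#   kubectl -n nmstate get pods --no-headers | all_pods_ready && echo "nmstate is ready"
```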
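Reporting node state
The operator periodically reports the state of each node's network interfaces to the API server. These reports are available through the NodeNetworkState object created for each node.
List the NodeNetworkStates of all nodes: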
root@node40:~# kubectl get nodenetworkstates
NAME AGE
node40 8m50s
node41 11m
node42 10m
You can also use the short name nns to get the same result:
root@node40:~# kubectl get nns
NAME AGE
node40 9m10s
node41 11m
node42 11m
Reading the state of a specific node
With -o yaml you can get the full network state of a given node:
root@node40:~# kubectl get nns node40 -o yaml | more
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkState
metadata:
  creationTimestamp: "2024-09-18T01:05:05Z"
  generation: 1
  name: node40
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: node40
    uid: 95774bad-ad3e-4256-b6a3-144b71a9780c
  resourceVersion: "5656"
  uid: c5cd9d8f-d86a-4f97-a853-86209b554b8b
status:
  currentState:
    dns-resolver:
      config:
        search: []
        server:
        - 223.5.5.5
        - 223.6.6.6
      running:
        search: []
        server:
        - 223.5.5.5
        - 223.6.6.6
    interfaces:
    - accept-all-mac-addresses: false
      bridge:
        options:
          group-addr: 01:80:C2:00:00:00
          group-forward-mask: 0
          group-fwd-mask: 0
          hash-max: 4096
          mac-ageing-time: 300
          multicast-last-member-count: 2
          multicast-last-member-interval: 100
          multicast-membership-interval: 26000
          multicast-querier: false
          multicast-querier-interval: 25500
          multicast-query-interval: 12500
          multicast-query-response-interval: 1000
          multicast-query-use-ifaddr: false
          multicast-router: auto
          multicast-snooping: true
          multicast-startup-query-count: 2
          multicast-startup-query-interval: 3124
          stp:
            enabled: false
            forward-delay: 15
            hello-time: 2
            max-age: 20
            priority: 32768
          vlan-default-pvid: 1
          vlan-protocol: 802.1q
        port:
        - name: veth117637f8
          stp-hairpin-mode: true
          stp-path-cost: 2
          stp-priority: 32
        - name: veth4381f50e
          stp-hairpin-mode: true
          stp-path-cost: 2
          stp-priority: 32
        - name: veth7b175187
          stp-hairpin-mode: true
          stp-path-cost: 2
          stp-priority: 32
        - name: veth82b4c0dd
          stp-hairpin-mode: true
          stp-path-cost: 2
          stp-priority: 32
        - name: veth8c8368c7
          stp-hairpin-mode: true
          stp-path-cost: 2
          stp-priority: 32
        - name: vethaf20332a
          stp-hairpin-mode: true
          stp-path-cost: 2
          stp-priority: 32
      ethtool:
        feature:
          highdma: true
          rx-gro: true
          rx-gro-list: false
          rx-udp-gro-forwarding: false
          tx-checksum-ip-generic: true
          tx-esp-segmentation: true
          tx-fcoe-segmentation: false
          tx-generic-segmentation: true
          tx-gre-csum-segmentation: true
          tx-gre-segmentation: true
          tx-gso-list: true
          tx-gso-partial: true
          tx-gso-robust: false
          tx-ipxip4-segmentation: true
          tx-ipxip6-segmentation: true
          tx-nocache-copy: false
          tx-scatter-gather-fraglist: true
          tx-sctp-segmentation: true
          tx-tcp-ecn-segmentation: true
          tx-tcp-mangleid-segmentation: true
          tx-tcp-segmentation: true
          tx-tcp6-segmentation: true
          tx-tunnel-remcsum-segmentation: true
          tx-udp-segmentation: true
          tx-udp_tnl-csum-segmentation: true
          tx-udp_tnl-segmentation: true
          tx-vlan-hw-insert: true
          tx-vlan-stag-hw-insert: true
      ipv4:
        address:
        - ip: 100.64.0.1
          prefix-length: 24
        enabled: true
      ipv6:
        address:
        - ip: fe80::c82e:90ff:fea3:ed6a
          prefix-length: 64
        enabled: true
      mac-address: CA:2E:90:A3:ED:6A
      max-mtu: 65535
      min-mtu: 68
      mptcp:
        address-flags: []
      mtu: 1450
      name: cni0
      state: up
      type: linux-bridge
......
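As you can see, the object is cluster-scoped (it does not belong to any namespace). Its name reflects the name of the node it represents.
The main part of the object is status.currentState. It contains the DNS configuration, the list of interfaces observed on the host together with their configuration, and the routes.
The last attribute of the object is lastSuccessfulUpdateTime, which records the timestamp of the last successful report update. Because reports are refreshed periodically and are not updated while the node is unreachable (for example during network reconfiguration), this value can be used to judge whether the observed state is recent enough.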
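The freshness check described above can be sketched as a small shell function. This assumes GNU date (which parses RFC 3339 timestamps such as 2024-09-18T01:05:05Z) and assumes the field is exposed at .status.lastSuccessfulUpdateTime; nns_is_fresh and the 300-second threshold in the usage comment are hypothetical choices.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): decide whether a NodeNetworkState
# report is recent enough. Assumes GNU date for parsing the timestamp.
#   $1  timestamp, e.g. the value of lastSuccessfulUpdateTime
#   $2  maximum accepted age in seconds
#   $3  optional "now" as a Unix epoch (defaults to the current time)
nns_is_fresh() {
  local ts_epoch now
  ts_epoch="$(date -d "$1" +%s)" || return 2   # date could not parse $1
  now="${3:-$(date +%s)}"
  [ $((now - ts_epoch)) -le "$2" ]
}

# Usage on a real cluster (field path is an assumption):
#   ts=$(kubectl get nns node40 -o jsonpath='{.status.lastSuccessfulUpdateTime}')
#   nns_is_fresh "$ts" 300 && echo "node40 report is fresh"
```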
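Policy configuration example
The example works as follows:
- Prepare a 3-node cluster with a Kubernetes primary interface (ens33, with IPs in 192.168.72.x) and an additional network interface ens35 attached to VLAN1.
- Use the NMState Operator CRDs to create a bridge named br1 on the additional interface.
- Create a Multus NetworkAttachmentDefinition named multus-br1 that is associated with bridge br1.
- Create 2 Pods with an additional interface that is attached to the VLAN1 network.
The overall architecture looks like this: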
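Prerequisites
- Install nmstate
- Add a NIC to the nodes
- Install the multus-cni plugin
Adding a NIC to the nodes
Add a NIC to node41 and node42. The new interface shows up as ens35: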
root@node41:~# ip link show | grep ens
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
7: ens35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
Installing the multus-cni plugin
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml
Check the created pods:
root@node40:~# kubectl -n kube-system get pods | grep multus
kube-multus-ds-hmd7g 1/1 Running 0 6m43s
kube-multus-ds-p5g8d 1/1 Running 0 6m43s
kube-multus-ds-rzzwf 1/1 Running 0 6m43s
root@node40:~#
Creating the nmstate policy
Label node41 and node42:
root@node40:~# kubectl label nodes node41 external-network=true
node/node41 labeled
root@node40:~# kubectl label nodes node42 external-network=true
node/node42 labeled
Create a NodeNetworkConfigurationPolicy that creates a bridge named br1 on node41 and node42, and save it as nncp.yaml:
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-ens35
spec:
  nodeSelector:
    external-network: "true"
  desiredState:
    interfaces:
    - name: br1
      description: Linux bridge with ens35 as a port
      type: linux-bridge
      state: up
      ipv4:
        dhcp: true
        enabled: true
      bridge:
        options:
          stp:
            enabled: false
        port:
        - name: ens35
Apply the configuration:
root@node40:~# kubectl apply -f nncp.yaml
nodenetworkconfigurationpolicy.nmstate.io/br1-ens35 created
Check the created policy:
root@node40:~# kubectl get nncp
NAME STATUS REASON
br1-ens35 Available SuccessfullyConfigured
Check the created bridge (on node41):
root@node41:~# ip link show | grep br1
7: ens35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br1 state UP mode DEFAULT group default qlen 1000
8: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
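To confirm from a script that ens35 really is a port of br1, one option is to read the master field from ip -o link output instead of grepping by eye. A minimal sketch; link_master is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): read one line of
# `ip -o link show dev IFACE` output on stdin and print the interface's
# master device (e.g. its bridge). Prints nothing if there is no master.
link_master() {
  sed -n 's/.* master \([^ ]*\) .*/\1/p'
}

# Usage on node41 or node42:
#   [ "$(ip -o link show dev ens35 | link_master)" = br1 ] && echo "ens35 is a port of br1"
```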
Create the NetworkAttachmentDefinition (multus-bridge.yaml):
root@ubuntu:~# cat multus-bridge.yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: multus-br1
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "bridge",
      "bridge": "br1",
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.72.0/24",
        "rangeStart": "192.168.72.240",
        "rangeEnd": "192.168.72.250"
      }
    }
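The ipam block hands out addresses from rangeStart to rangeEnd inside subnet. Two things are worth checking: the range should lie inside the subnet, and it should not overlap addresses already in use on VLAN1 (such as the node IPs 192.168.72.40 to .42, or a DHCP pool serving br1). Note also that host-local keeps its allocation state on each node, so the same range configured on several nodes is not coordinated cluster-wide, and pods on different nodes could in principle receive the same address. The helpers below (ip2int, in_subnet, range_size, all hypothetical names) are a minimal pure-shell sketch for the first check.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helpers): sanity-check a host-local IPv4 range.

# ip2int A.B.C.D: print the IPv4 address as a 32-bit integer.
ip2int() {
  local IFS=.
  set -- $1            # intentional word splitting on "."
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_subnet IP CIDR: succeed if IP lies inside CIDR (e.g. 192.168.72.0/24).
in_subnet() {
  local net="${2%/*}" len="${2#*/}" mask
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip2int "$1") & mask )) -eq $(( $(ip2int "$net") & mask )) ]
}

# range_size START END: number of addresses from START to END inclusive.
range_size() {
  echo $(( $(ip2int "$2") - $(ip2int "$1") + 1 ))
}
```

For the values in multus-br1, range_size 192.168.72.240 192.168.72.250 prints 11, and in_subnet succeeds for both ends of the range against 192.168.72.0/24.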
Apply it with kubectl apply -f multus-bridge.yaml, then check the NetworkAttachmentDefinition:
root@node40:~# kubectl get net-attach-def
NAME AGE
multus-br1 9s
Demo application
root@ubuntu:~# cat demo-app.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: net-pod1
  annotations:
    k8s.v1.cni.cncf.io/networks: multus-br1
spec:
  containers:
  - name: netshoot-pod
    image: nicolaka/netshoot
    imagePullPolicy: IfNotPresent
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
---
apiVersion: v1
kind: Pod
metadata:
  name: net-pod2
  annotations:
    k8s.v1.cni.cncf.io/networks: multus-br1
spec:
  containers:
  - name: netshoot-pod
    image: nicolaka/netshoot
    imagePullPolicy: IfNotPresent
    command: ["tail"]
    args: ["-f", "/dev/null"]
  terminationGracePeriodSeconds: 0
Apply the configuration:
kubectl apply -f demo-app.yaml
Check the two created pods:
root@node40:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
net-pod1 1/1 Running 0 44m
net-pod2 1/1 Running 0 44m
Check the interfaces of net-pod1:
root@node40:~# kubectl exec -it net-pod1 -- ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 7e:d4:a4:04:a3:58 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 100.64.1.5/24 brd 100.64.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::7cd4:a4ff:fe04:a358/64 scope link
       valid_lft forever preferred_lft forever
3: net1@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether a2:7a:47:1b:59:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.72.243/24 brd 192.168.72.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::a07a:47ff:fe1b:5904/64 scope link
       valid_lft forever preferred_lft forever
Check the interfaces of net-pod2:
root@node40:~# kubectl exec -it net-pod2 -- ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 22:cc:41:f9:6b:ad brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 100.64.2.5/24 brd 100.64.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20cc:41ff:fef9:6bad/64 scope link
       valid_lft forever preferred_lft forever
3: net1@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 1e:07:5b:d3:0e:77 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.72.241/24 brd 192.168.72.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::1c07:5bff:fed3:e77/64 scope link
       valid_lft forever preferred_lft forever
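Instead of copying the net1 addresses from this output by hand, they can be extracted with ip's one-line mode and fed straight into the ping tests. A minimal sketch; first_ipv4 is a hypothetical helper that reads ip -o -4 addr show dev IFACE output.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): read `ip -o -4 addr show dev IFACE`
# output on stdin and print the first IPv4 address without its prefix length.
first_ipv4() {
  awk '$3 == "inet" { sub(/\/.*/, "", $4); print $4; exit }'
}

# Usage on a real cluster:
#   pod2_ip=$(kubectl exec net-pod2 -- ip -o -4 addr show dev net1 | first_ipv4)
#   kubectl exec net-pod1 -- ping -c 3 -I net1 "$pod2_ip"
```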
Ping the pod's own net1 IP:
root@node40:~# kubectl exec -it net-pod1 -- ping -c 3 -I net1 192.168.72.243
PING 192.168.72.243 (192.168.72.243) from 192.168.72.243 net1: 56(84) bytes of data.
64 bytes from 192.168.72.243: icmp_seq=1 ttl=64 time=0.025 ms
64 bytes from 192.168.72.243: icmp_seq=2 ttl=64 time=0.060 ms
64 bytes from 192.168.72.243: icmp_seq=3 ttl=64 time=0.054 ms

--- 192.168.72.243 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2056ms
rtt min/avg/max/mdev = 0.025/0.046/0.060/0.015 ms
root@node40:~#
Ping the net1 IP of net-pod2:
root@node40:~# kubectl exec -it net-pod1 -- ping -c 3 -I net1 192.168.72.241
PING 192.168.72.241 (192.168.72.241) from 192.168.72.243 net1: 56(84) bytes of data.
64 bytes from 192.168.72.241: icmp_seq=1 ttl=64 time=0.240 ms
64 bytes from 192.168.72.241: icmp_seq=2 ttl=64 time=0.412 ms
64 bytes from 192.168.72.241: icmp_seq=3 ttl=64 time=0.627 ms

--- 192.168.72.241 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2037ms
rtt min/avg/max/mdev = 0.240/0.426/0.627/0.158 ms
Ping the host IP (node40):
root@node40:~# kubectl exec -it net-pod1 -- ping -c 3 -I net1 192.168.72.40
PING 192.168.72.40 (192.168.72.40) from 192.168.72.243 net1: 56(84) bytes of data.
64 bytes from 192.168.72.40: icmp_seq=1 ttl=64 time=0.626 ms
64 bytes from 192.168.72.40: icmp_seq=2 ttl=64 time=0.348 ms
64 bytes from 192.168.72.40: icmp_seq=3 ttl=64 time=0.451 ms

--- 192.168.72.40 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2041ms
rtt min/avg/max/mdev = 0.348/0.475/0.626/0.114 ms
root@node40:~#
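The three ping tests above can be repeated in one loop that reports pass or fail from the summary line. A minimal sketch; ping_ok is a hypothetical helper that succeeds only when ping reports 0% packet loss, and the target list simply reuses the addresses from this example.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): read ping output on stdin and succeed
# only if the summary line reports exactly 0% packet loss (so 10% or 100%
# loss does not match).
ping_ok() {
  grep -Eq '(^|[ ,])0% packet loss'
}

# Usage on a real cluster:
#   for target in 192.168.72.243 192.168.72.241 192.168.72.40; do
#     if kubectl exec net-pod1 -- ping -c 3 -I net1 "$target" | ping_ok; then
#       echo "$target ok"
#     else
#       echo "$target FAILED"
#     fi
#   done
```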
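In the end, both pods have an additional net1 interface that reaches the node's ens35 uplink through the br1 bridge.
Most importantly, we did not have to create br1 on the hosts by hand: kubernetes-nmstate created it automatically through the Kubernetes API. The same approach can be used to create bond interfaces on the hosts, or to create VLAN sub-interfaces and then assign them to pods, and so on.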
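As an illustration of the VLAN case, the sketch below prints a NodeNetworkConfigurationPolicy that creates a VLAN sub-interface using nmstate's vlan interface type (base-iface and id). The helper render_vlan_nncp, the policy name pattern, the interface ens36 and VLAN ID 100 are all hypothetical; ens36 stands for an uplink that is not already a port of br1. It reuses the external-network=true label from this example.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): print a NodeNetworkConfigurationPolicy
# that creates VLAN sub-interface BASE.ID on nodes labelled external-network=true.
#   $1  base interface (hypothetical, e.g. ens36)
#   $2  VLAN ID (hypothetical, e.g. 100)
render_vlan_nncp() {
  local base="$1" id="$2"
  cat <<EOF
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: vlan-${base}-${id}
spec:
  nodeSelector:
    external-network: "true"
  desiredState:
    interfaces:
    - name: ${base}.${id}
      type: vlan
      state: up
      vlan:
        base-iface: ${base}
        id: ${id}
      ipv4:
        enabled: false
EOF
}

# Usage on a real cluster:
#   render_vlan_nncp ens36 100 | kubectl apply -f -
```

A bridge or a NetworkAttachmentDefinition can then be layered on top of ens36.100 in the same way as br1 and multus-br1 above.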