K8s – CNI Network Component
The tutorial I followed dates from around early 2020, so two and a half years have passed since then; the information below may no longer be fully accurate, and I will correct and annotate anything that turns out to be wrong as my study goes deeper. CNI stands for Container Network Interface; its main job is to let Pod resources on different hosts communicate with one another.
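As background, CNI itself is only a plugin contract. When Flannel is consumed as a proper CNI plugin, the kubelet reads a config like the following from /etc/cni/net.d/ (an illustrative example modeled on the standard kube-flannel layout; the deployment below instead runs flanneld standalone alongside Docker):
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}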
Common CNI Components
All CNI components are deployed on the node hosts:
- Flannel: roughly 38% market share in 2019;
- Calico: roughly 35% market share in 2019;
- Canal: combines Flannel and Calico; roughly 5% market share in 2019;
- Contiv: open-sourced by Cisco;
- OpenContrail: open-sourced by Juniper;
- NSX-T: from VMware;
- Kube-router: tries to replace kube-proxy, but is not widely used.
Flannel
Installing and Deploying Flannel
It can be downloaded directly from GitHub; the current link is https://github.com/flannel-io/flannel/releases/download/v0.11.0/flannel-v0.11.0-linux-amd64.tar.gz, although access from mainland China can be unreliable, so a proxy or a few retries may be needed. Deploy Flannel on both node13 and node14:
root@f0-13:/opt/src# mkdir /opt/flannel-v0.11.0
root@f0-13:/opt/src# tar xf flannel-v0.11.0-linux-amd64.tar.gz -C /opt/flannel-v0.11.0/
root@f0-13:/opt/src# ln -s /opt/flannel-v0.11.0/ /opt/flannel
root@f0-13:/opt/src# cd /opt/flannel
root@f0-13:/opt/flannel# ll
total 34444
drwxr-xr-x 2 root root 4096 Aug 13 11:46 ./
drwxr-xr-x 7 root root 4096 Aug 13 11:48 ../
-rw-r--r-- 1 root root 4300 Oct 23 2018 README.md
-rwxr-xr-x 1 root root 35249016 Jan 29 2019 flanneld*
-rwxr-xr-x 1 root root 2139 Oct 23 2018 mk-docker-opts.sh*
root@f0-13:/opt/flannel# mkdir certs
root@f0-13:/opt/flannel# cd certs/
root@f0-13:/opt/flannel/certs# cp /opt/kubernetes/server/bin/certs/client* ./
root@f0-13:/opt/flannel/certs# cp /opt/kubernetes/server/bin/certs/ca.pem ./
root@f0-13:/opt/flannel/certs# ll
total 20
drwxr-xr-x 2 root root 4096 Aug 13 11:50 ./
drwxr-xr-x 3 root root 4096 Aug 13 11:50 ../
-rw-r--r-- 1 root root 1127 Aug 13 11:50 ca.pem
-rw------- 1 root root 1679 Aug 13 11:50 client-key.pem
-rw-r--r-- 1 root root 1261 Aug 13 11:50 client.pem
Note that FLANNEL_NETWORK is the CIDR covering all pods, in other words the overall Docker network, which here is 172.1.0.0/16 (node13 uses 172.1.13.x/24 and node14 uses 172.1.14.x/24), while FLANNEL_SUBNET is this host's own Docker subnet:
root@f0-13:/opt/flannel# more subnet.env
FLANNEL_NETWORK=172.1.0.0/16
FLANNEL_SUBNET=172.1.13.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=false
root@f0-13:/opt/flannel#
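For comparison, node14's subnet.env differs only in its own /24 (a sketch, assuming the same layout as on node13):
FLANNEL_NETWORK=172.1.0.0/16
FLANNEL_SUBNET=172.1.14.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=false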
root@f0-13:/opt/flannel# chmod +x flanneld.sh
root@f0-13:/opt/flannel# more flanneld.sh
#!/bin/sh
./flanneld \
--public-ip=172.20.0.13 \
--etcd-endpoints=https://172.20.0.12:2379,https://172.20.0.13:2379,https://172.20.0.14:2379 \
--etcd-keyfile=./certs/client-key.pem \
--etcd-certfile=./certs/client.pem \
--etcd-cafile=./certs/ca.pem \
--iface=ens3 \
--subnet-file=./subnet.env \
--healthz-port=2401
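On node14 the script is identical except for --public-ip, which points at that host (a sketch; node14's host IP is 172.20.0.14 in this lab):
#!/bin/sh
./flanneld \
--public-ip=172.20.0.14 \
--etcd-endpoints=https://172.20.0.12:2379,https://172.20.0.13:2379,https://172.20.0.14:2379 \
--etcd-keyfile=./certs/client-key.pem \
--etcd-certfile=./certs/client.pem \
--etcd-cafile=./certs/ca.pem \
--iface=ens3 \
--subnet-file=./subnet.env \
--healthz-port=2401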
root@f0-13:/opt/flannel# mkdir -p /data/logs/flanneld
root@f0-13:/opt/flannel# more /etc/supervisor/conf.d/flannel.conf
[program:flanneld-0-13]
command=/opt/flannel/flanneld.sh ; the program (relative uses PATH, can take args)
numprocs=1 ; number of processes copies to start (def 1)
directory=/opt/flannel ; directory to cwd to before exec (def no cwd)
autostart=true ; start at supervisord start (default: true)
autorestart=true ; restart at unexpected quit (default: true)
startsecs=30 ; number of secs prog must stay running (def. 1)
startretries=3 ; max # of serial start failures (default 3)
exitcodes=0,2 ; 'expected' exit codes for process (default 0,2)
stopsignal=QUIT ; signal used to kill process (default TERM)
stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10)
user=root ; setuid to this UNIX account to run the program
redirect_stderr=true ; redirect proc stderr to stdout (default false)
stdout_logfile=/data/logs/flanneld/flanneld.stdout.log ; stdout log path, NONE for none; default AUTO
stdout_logfile_maxbytes=64MB ; max # logfile bytes b4 rotation (default 50MB)
stdout_logfile_backups=4 ; # of stdout logfile backups (default 10)
stdout_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0)
stdout_events_enabled=false ; emit events on stdout writes (default false)
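The tarball also ships mk-docker-opts.sh, which turns the values in subnet.env into Docker daemon options such as --bip and --mtu. In this lab docker0 was already configured by hand to match the subnet, but for reference, a sketch of the usual wiring (flag names as listed in the script's own usage text); the generated file would then be loaded by the Docker systemd unit via EnvironmentFile and its variable appended to the dockerd command line:
root@f0-13:/opt/flannel# ./mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -f ./subnet.env -d /run/flannel/docker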
Configuring etcd
Add the host-gw backend (Flannel also supports the vxlan backend). If you store the wrong network below, you can delete the key with "./etcdctl rm /coreos.com/network/config" and set it again:
root@f0-13:/opt/etcd# ./etcdctl set /coreos.com/network/config '{"Network": "172.1.0.0/16", "Backend": {"Type": "host-gw"}}'
{"Network": "172.1.0.0/16", "Backend": {"Type": "host-gw"}}
root@f0-13:/opt/etcd# ./etcdctl get /coreos.com/network/config
{"Network": "172.1.0.0/16", "Backend": {"Type": "host-gw"}}
Verification and Summary
root@f0-13:/opt/flannel# supervisorctl update
flanneld-0-13: added process group
root@f0-13:/opt/flannel# supervisorctl status
etcd-server-0-13 RUNNING pid 754, uptime 2 days, 5:02:40
flanneld-0-13 RUNNING pid 1139448, uptime 0:00:43
kube-apiserver-0-13 RUNNING pid 755, uptime 2 days, 5:02:40
kube-controller-manager-0-13 RUNNING pid 840066, uptime 16:09:08
kube-kubelet-0-13 RUNNING pid 759, uptime 2 days, 5:02:40
kube-proxy-0-13 RUNNING pid 761, uptime 2 days, 5:02:40
kube-scheduler-0-13 RUNNING pid 840058, uptime 16:09:08
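Since flanneld.sh starts the daemon with --healthz-port=2401, liveness can also be spot-checked over HTTP; flannel exposes a /healthz endpoint on that port, and an HTTP 200 means the daemon is up:
root@f0-13:/opt/flannel# curl -i http://127.0.0.1:2401/healthz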
The network in subnet.env was misconfigured at first, which left Flannel on node13 stuck on a strange subnet. Flannel's startup log showed that it was reusing an old lease rather than picking up the corrected subnet.env; presumably a subnet had been auto-allocated earlier (perhaps because of a transient network problem at the time) and recorded in etcd. But how to fix this? Restarting and rewriting subnet.env had no effect; the old subnet kept overriding the file. After some digging it turned out that the lease stored in etcd has to be changed or deleted. I simply deleted it, restarted Flannel, and everything returned to normal:
root@f0-13:/opt/flannel# more subnet.env
FLANNEL_NETWORK=172.1.0.0/16
FLANNEL_SUBNET=172.1.36.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=false
root@f0-13:/opt/flannel# more /data/logs/flanneld/flanneld.stdout.log |grep 172.1.36.0 -C 5
2022-08-13 16:13:28.582351 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I0813 16:13:28.584654 763 main.go:244] Created subnet manager: Etcd Local Manager with Previous Subnet: 172.1.13.0/24
I0813 16:13:28.584670 763 main.go:247] Installing signal handlers
I0813 16:13:28.589980 763 main.go:587] Start healthz server on 0.0.0.0:2401
I0813 16:13:30.265734 763 main.go:386] Found network config - Backend type: host-gw
I0813 16:13:30.290094 763 local_manager.go:147] Found lease (172.1.36.0/24) for current IP (172.20.0.13), reusing
I0813 16:13:30.297810 763 main.go:317] Wrote subnet file to ./subnet.env
I0813 16:13:30.298041 763 main.go:321] Running backend.
I0813 16:13:30.298437 763 route_network.go:53] Watching for new subnet leases
I0813 16:13:30.311451 763 main.go:429] Waiting for 22h59m59.323986135s to renew lease
I0813 16:13:30.318417 763 route_network.go:85] Subnet added: 172.1.14.0/24 via 172.20.0.14
root@f0-13:/opt/etcd# ./etcdctl ls /coreos.com/network/subnets
/coreos.com/network/subnets/172.1.14.0-24
/coreos.com/network/subnets/172.1.36.0-24
root@f0-13:/opt/etcd# ./etcdctl --cert-file=./certs/etcd-peer.pem --key-file=./certs/etcd-peer-key.pem --ca-file=./certs/ca.pem rmdir /coreos.com/network/subnets/172.1.36.0-24
2022-08-13 17:09:41.747978 I | warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
PrevNode.Value: {"PublicIP":"172.20.0.13","BackendType":"host-gw"}
root@f0-13:/opt/etcd# ./etcdctl ls /coreos.com/network/subnets
/coreos.com/network/subnets/172.1.14.0-24
root@f0-13:/opt/flannel# supervisorctl start flanneld-0-13
flanneld-0-13: started
root@f0-13:/opt/flannel# cd ../etcd
root@f0-13:/opt/etcd# ./etcdctl ls /coreos.com/network/subnets
/coreos.com/network/subnets/172.1.14.0-24
/coreos.com/network/subnets/172.1.13.0-24
After fixing the problem above, look at the routing table again: Flannel has installed a correct static route on the host, and that is what lets pods on different hosts reach each other. In other words, Flannel's host-gw mode is nothing more than one static route per remote subnet. Also note that host-gw only works on an L2 network, i.e. when the hosts share the same subnet; if the hosts sit in different subnets, you need the vxlan backend instead:
root@f0-13:~# ip route
default via 172.20.0.1 dev ens3 proto static
172.1.13.0/24 dev docker0 proto kernel scope link src 172.1.13.1
172.1.14.0/24 via 172.20.0.14 dev ens3
172.20.0.0/24 dev ens3 proto kernel scope link src 172.20.0.13
root@f0-13:~#
root@f0-13:~#
root@f0-13:~# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-ds-2jnw2 1/1 Running 1 2d6h 172.1.13.2 f0-13.host.com <none> <none>
nginx-ds-6n9d8 1/1 Running 1 2d6h 172.1.14.2 f0-14.host.com <none> <none>
root@f0-13:~#
root@f0-13:~# ping 172.1.14.2
PING 172.1.14.2 (172.1.14.2) 56(84) bytes of data.
64 bytes from 172.1.14.2: icmp_seq=1 ttl=63 time=1.53 ms
64 bytes from 172.1.14.2: icmp_seq=2 ttl=63 time=0.247 ms
64 bytes from 172.1.14.2: icmp_seq=3 ttl=63 time=0.298 ms
^C
--- 172.1.14.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2020ms
rtt min/avg/max/mdev = 0.247/0.692/1.533/0.594 ms
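To drive the point home, what Flannel programs here is equivalent to adding the routes by hand; purely illustrative, since with flanneld running it maintains these routes for you:
root@f0-13:~# ip route add 172.1.14.0/24 via 172.20.0.14 dev ens3
root@f0-14:~# ip route add 172.1.13.0/24 via 172.20.0.13 dev ens3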
Optimizing Flannel's SNAT Rule
With Flannel running, when two pods talk to each other, is the source address the real pod address? Judging by the results above it would seem so, but the traffic actually passes through NAT. The reason is that when a pod reaches an external network, its address must be swapped for the host's IP; if a packet left carrying the pod's own IP, the reply could never find its way back. But does pod-to-pod traffic inside the cluster need NAT? Clearly not. So how do we tune the rules to cover both scenarios?
The following experiment readily shows that the packets are NATed: ping the pod on f0-14 from the pod on f0-13, then capture packets on f0-14 and look at the exchange:
root@f0-13:~# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-ds-5cbw4 1/1 Running 0 42s 172.1.13.2 f0-13.host.com <none> <none>
nginx-ds-pgn66 1/1 Running 0 33s 172.1.14.2 f0-14.host.com <none> <none>
root@f0-13:~# kubectl exec -ti nginx-ds-5cbw4 /bin/bash
root@nginx-ds-5cbw4:/# ping 172.1.14.2
PING 172.1.14.2 (172.1.14.2): 56 data bytes
64 bytes from 172.1.14.2: icmp_seq=0 ttl=62 time=2.902 ms
64 bytes from 172.1.14.2: icmp_seq=1 ttl=62 time=0.458 ms
64 bytes from 172.1.14.2: icmp_seq=2 ttl=62 time=0.388 ms
64 bytes from 172.1.14.2: icmp_seq=3 ttl=62 time=1.225 ms
^C--- 172.1.14.2 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.388/1.243/2.902/1.012 ms
root@f0-14:~# tcpdump -i ens3 -p icmp -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
19:56:37.308964 IP 172.20.0.13 > 172.1.14.2: ICMP echo request, id 27, seq 0, length 64
19:56:37.310211 IP 172.1.14.2 > 172.20.0.13: ICMP echo reply, id 27, seq 0, length 64
19:56:38.308984 IP 172.20.0.13 > 172.1.14.2: ICMP echo request, id 27, seq 1, length 64
19:56:38.309063 IP 172.1.14.2 > 172.20.0.13: ICMP echo reply, id 27, seq 1, length 64
19:56:39.310095 IP 172.20.0.13 > 172.1.14.2: ICMP echo request, id 27, seq 2, length 64
19:56:39.310171 IP 172.1.14.2 > 172.20.0.13: ICMP echo reply, id 27, seq 2, length 64
19:56:40.311248 IP 172.20.0.13 > 172.1.14.2: ICMP echo request, id 27, seq 3, length 64
19:56:40.312139 IP 172.1.14.2 > 172.20.0.13: ICMP echo reply, id 27, seq 3, length 64
^C
8 packets captured
8 packets received by filter
0 packets dropped by kernel
The translation happens in the POSTROUTING chain, which can be inspected with iptables. The relevant rule below reads: if the source is 172.1.13.0/24 and the outgoing interface is not docker0, masquerade the source address. In other words, only pod-to-pod traffic on the same host, which leaves through docker0, is exempt from NAT; traffic between pods on different hosts is still source-translated:
root@f0-13:~# iptables-save|grep -i postrouting
:POSTROUTING ACCEPT [68:4135]
:KUBE-POSTROUTING - [0:0]
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.1.13.0/24 ! -o docker0 -j MASQUERADE
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-POSTROUTING -m comment --comment "Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose" -m set --match-set KUBE-LOOP-BACK dst,dst,src -j MASQUERADE
root@f0-13:~#
Therefore we need to refine this SNAT rule so that pod-to-pod traffic across hosts is exempted as well. For the iptables change to survive a reboot, we need a tool that reloads the modified rules automatically at boot; install "iptables-persistent" for that:
root@f0-13:~# apt-get install iptables-persistent -y
root@f0-13:~# ll /etc/iptables/
total 16
drwxr-xr-x 2 root root 4096 Aug 13 21:16 ./
drwxr-xr-x 101 root root 4096 Aug 13 21:16 ../
-rw-r--r-- 1 root root 3240 Aug 13 21:16 rules.v4
-rw-r--r-- 1 root root 183 Aug 13 21:16 rules.v6
Adjust the rule and save it so it persists across reboots: delete the original MASQUERADE rule and insert one that additionally skips destinations inside the pod network 172.1.0.0/16, so pod-to-pod traffic keeps its real source address while traffic leaving the cluster is still NATed:
root@f0-13:~# iptables -t nat -D POSTROUTING -s 172.1.13.0/24 ! -o docker0 -j MASQUERADE
root@f0-13:~# iptables -t nat -I POSTROUTING -s 172.1.13.0/24 ! -d 172.1.0.0/16 ! -o docker0 -j MASQUERADE
root@f0-13:~# iptables-save > /etc/iptables/rules.v4
root@f0-13:~# ip6tables-save > /etc/iptables/rules.v6
root@f0-13:~# more /etc/iptables/rules.v4 |grep POSTROUTING
:POSTROUTING ACCEPT [2:120]
:KUBE-POSTROUTING - [0:0]
-A POSTROUTING -s 172.1.13.0/24 ! -d 172.1.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-POSTROUTING -m comment --comment "Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose" -m set --match-set KUBE-LOOP-BACK dst,dst,src -j MASQUERADE
Test again. According to the output below, the real pod IP now shows up (ps: I recreated the pod, so both its name and IP have changed):
root@f0-13:~# kubectl exec -ti nginx-ds-2q9mj /bin/bash
root@nginx-ds-2q9mj:/# ping 172.1.14.2 -c 2
PING 172.1.14.2 (172.1.14.2): 56 data bytes
64 bytes from 172.1.14.2: icmp_seq=0 ttl=62 time=2.360 ms
64 bytes from 172.1.14.2: icmp_seq=1 ttl=62 time=0.379 ms
--- 172.1.14.2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.379/1.369/2.360/0.991 ms
root@f0-14:~# tcpdump -i ens3 -p icmp -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
21:31:56.147423 IP 172.1.13.3 > 172.1.14.2: ICMP echo request, id 14, seq 0, length 64
21:31:56.148939 IP 172.1.14.2 > 172.1.13.3: ICMP echo reply, id 14, seq 0, length 64
21:31:57.148030 IP 172.1.13.3 > 172.1.14.2: ICMP echo request, id 14, seq 1, length 64
21:31:57.148103 IP 172.1.14.2 > 172.1.13.3: ICMP echo reply, id 14, seq 1, length 64
^C
4 packets captured
6 packets received by filter
0 packets dropped by kernel
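Finally, note that only node13's rule was adjusted above; traffic originating from node14's pods would still be masqueraded by that host's own default rule. The symmetric fix on node14 (a sketch, assuming it mirrors node13's setup):
root@f0-14:~# iptables -t nat -D POSTROUTING -s 172.1.14.0/24 ! -o docker0 -j MASQUERADE
root@f0-14:~# iptables -t nat -I POSTROUTING -s 172.1.14.0/24 ! -d 172.1.0.0/16 ! -o docker0 -j MASQUERADE
root@f0-14:~# iptables-save > /etc/iptables/rules.v4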