SRv6 Fabric with SONiC Switches
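I recently came across a very interesting article published by my former employer: Building an SRv6 uSID Data Center Fabric with SONiC, which uses SONiC to build a data center architecture on an SRv6 fabric. SRv6 is typically deployed in backbone networks and DCI, and it normally relies on BGP routes recursing over an IGP. When I was working on a metro network architecture, I wondered whether SRv6's advanced features could be used to split the network into two planes, each carrying traffic independently according to service requirements, but every vendor solution depended on an IGP rather than a BGP-only environment. Alibaba Cloud recently mentioned BGP-only SRv6 on several occasions when presenting ePF, their new peering architecture. Combined with this article, I did not expect to see it implemented on SONiC first; the pace of iteration in the open-source community is truly surprising.
This post is mainly a hands-on trial of SONiC switches. It records the problems I ran into and how I solved them, and it tests how far the SRv6 implementation has come and whether it could directly replace an existing BGP-only DC architecture. PS: all experiments were built and verified in PNET. The SONiC image was downloaded from the article above and is based on the 202305 release. For the configuration commands, see the GitHub repository mentioned in that article as well as the official FRR and SONiC documentation. For convenience, the links are listed below: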
- https://github.com/segmentrouting/srv6-labs/tree/main/sonic-vs
- http://docs.frrouting.org/en/latest/zebra.html?highlight=json
- https://github.com/sonic-net/SONiC/blob/master/doc/srv6/srv6_hld.md
Topology
          Ethernet0┌─────────┐Ethernet1
         ┌─────────┤   L2    ├──────────┐
         │         └─────────┘          │
         │Ethernet2                     │Ethernet2
    ┌────┴────┐                    ┌────┴────┐
    │ POD1-L1 │                    │ POD2-L1 │
    └────┬────┘                    └────┬────┘
         │Ethernet1                     │Ethernet1
         │                              │
         │Ethernet1                     │Ethernet1
    ┌────┴────┐                    ┌────┴────┐
    │ POD1-L0 │                    │ POD2-L0 │
    └────┬────┘                    └────┬────┘
         │Ethernet2                     │Ethernet2
         │                              │
         │eth2                          │eth0
     ┌───┴───┐                       ┌──┴────┐
     │  S1   │                       │  S2   │
     │ Server│                       │ Server│
     └───────┘                       └───────┘
Note:
SONiC keeps its own port mapping: it treats ethX as carrying the physical MAC and EthernetX as carrying a virtual MAC. For the mapping, see https://github.com/sonic-net/SONiC/issues/760. For example:
Eth1 -> Ethernet1
Eth2 -> Ethernet9
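For PNET, EthernetX is the physical MAC. If the ports used to interconnect SONiC nodes are not contiguous, for example Ethernet1 and Ethernet9, PNET uses the physical MAC of Ethernet9 while SONiC uses the physical MAC of eth2. The two do not match, so traffic does not pass. As shown below, PNET considers the MAC of this port to be 50:18:59:00:16:09: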
[VM Linux S1(50:42:26:00:2c:02)]-vunl44_2-----vnet4_5-----vunl22_9-[(50:18:59:00:16:09)VM Sonic Switch]
root@pnetlab-159:~# ps -ef|grep S1
root 17871 16306 0 01:27 pts/0 00:00:00 grep --color=auto S1
root 28118 1 14 Jul13 ? 02:35:12 /opt/qemu-4.1.0/bin/qemu-system-x86_64 -device virtio-net-pci,netdev=net0,mac=50:42:26:00:2c:00 -netdev tap,id=net0,ifname=vunl44_0,script=no -device virtio-net-pci,netdev=net1,mac=50:42:26:00:2c:01 -netdev tap,id=net1,ifname=vunl44_1,script=no -device virtio-net-pci,netdev=net2,mac=50:42:26:00:2c:02 -netdev tap,id=net2,ifname=vunl44_2,script=no -vnc :24144 -nographic -chardev socket,id=monitor,path=/opt/unetlab/tmp/4/44/monitor.sock,server,nowait -monitor chardev:monitor -smp 2 -m 4096 -name S1 -uuid 47df8578-5257-4f93-aa60-4016af5eaa0d -hda hda.qcow2 -machine type=pc,accel=kvm -vga virtio -usbdevice tablet -boot order=cd -cpu host
root@pnetlab-159:~#
root@pnetlab-159:~# ps -ef|grep POD1-L0
root 19028 16306 0 01:29 pts/0 00:00:00 grep --color=auto POD1-L0
root 23366 1 21 Jul13 ? 04:50:28 /opt/qemu-4.1.0/bin/qemu-system-x86_64 -device virtio-net-pci,netdev=net0,mac=50:18:59:00:16:00 -netdev tap,id=net0,ifname=vunl22_0,script=no -device virtio-net-pci,netdev=net1,mac=50:18:59:00:16:01 -netdev tap,id=net1,ifname=vunl22_1,script=no -device virtio-net-pci,netdev=net2,mac=50:18:59:00:16:02 -netdev tap,id=net2,ifname=vunl22_2,script=no -device virtio-net-pci,netdev=net3,mac=50:18:59:00:16:03 -netdev tap,id=net3,ifname=vunl22_3,script=no -device virtio-net-pci,netdev=net4,mac=50:18:59:00:16:04 -netdev tap,id=net4,ifname=vunl22_4,script=no -device virtio-net-pci,netdev=net5,mac=50:18:59:00:16:05 -netdev tap,id=net5,ifname=vunl22_5,script=no -device virtio-net-pci,netdev=net6,mac=50:18:59:00:16:06 -netdev tap,id=net6,ifname=vunl22_6,script=no -device virtio-net-pci,netdev=net7,mac=50:18:59:00:16:07 -netdev tap,id=net7,ifname=vunl22_7,script=no -device virtio-net-pci,netdev=net8,mac=50:18:59:00:16:08 -netdev tap,id=net8,ifname=vunl22_8,script=no -device virtio-net-pci,netdev=net9,mac=50:18:59:00:16:09 -netdev tap,id=net9,ifname=vunl22_9,script=no -nographic -chardev socket,id=serial0,path=/opt/unetlab/tmp/4/22/console.sock,server,nowait -serial chardev:serial0 -chardev socket,id=monitor,path=/opt/unetlab/tmp/4/22/monitor.sock,server,nowait -monitor chardev:monitor -smp 2 -m 4096 -name POD1-L0 -uuid 8c767eb4-c34c-4db9-888f-9470e8b21197 -drive file=virtioa.qcow2,if=virtio,bus=0,unit=0,cache=none -machine type=pc,accel=kvm -vga std -usbdevice tablet -boot order=cd
root@pnetlab-159:~# brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.024299e5ec09 no
pnet0 8000.3ca82a1f412c no eth0
vunl44_0
pnet1 8000.3ca82a1f412d no eth1
pnet2 8000.3ca82a1f412e no eth2
pnet3 8000.3ca82a1f412f no eth3
vnet4_1 8000.0abf870731d0 no vunl22_1
vunl40_1
vnet4_2 8000.36a076674cf4 no vunl40_2
vunl41_2
vnet4_3 8000.5edcdd6caa0d no vunl42_1
vunl43_1
vnet4_4 8000.0eb934125fe3 no vunl41_1
vunl42_2
vnet4_5 8000.3abe9e57d1e8 no vunl22_9
vunl44_2
vnet4_6 8000.12f855f8c50e no vunl43_9
vunl45_0
The SONiC switch, however, considers the MAC used by eth2 to be 50:18:59:00:16:02:
4: eth2: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 9122 qdisc pfifo_fast state UP group default qlen 1000
link/ether 50:18:59:00:16:02 brd ff:ff:ff:ff:ff:ff
inet6 fe80::5218:59ff:fe00:1602/64 scope link
valid_lft forever preferred_lft forever
So when emulating SONiC switches in PNET, use contiguous ports (1, 2, 3, 4) and do not skip ports (1, 3, 5, 7).
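The mismatch can be checked before wiring a lab. The sketch below is only an illustration under two assumptions read off the output above: PNET gives NIC index n the node's base MAC with the last octet set to n (as in the QEMU command line), and SONiC binds its k-th data port to ethk. The function names are hypothetical, and the hex formatting for indices above 9 is a guess.

```python
def pnet_mac(base_mac, nic_index):
    """MAC that PNET is assumed to assign to NIC `nic_index` (last octet = index)."""
    octets = base_mac.split(":")
    octets[-1] = f"{nic_index:02x}"
    return ":".join(octets)


def find_mismatches(base_mac, used_nics):
    """Return (sonic_eth, sonic_mac, pnet_nic, pnet_mac) for every port where
    the NIC that PNET wires up differs from the one SONiC is assumed to use."""
    mismatches = []
    for k, nic in enumerate(sorted(used_nics), start=1):
        if nic != k:
            mismatches.append((k, pnet_mac(base_mac, k), nic, pnet_mac(base_mac, nic)))
    return mismatches


if __name__ == "__main__":
    # POD1-L0 in this lab: links on NIC 1 and NIC 9.
    for row in find_mismatches("50:18:59:00:16:00", [1, 9]):
        print("SONiC eth%d (%s) vs PNET NIC %d (%s)" % row)
```

With contiguous NICs such as `[1, 2]` the function returns an empty list, which matches the recommendation to avoid gaps.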
Basic SONiC commands
Changing the default password
Default username: admin; default password: YourPaSsWoRd.
Change the initial password:
admin@p1l0:~$ sudo passwd admin
New password:
Retype new password:
passwd: password updated successfully
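Changing the MAC address
The virtual MAC address of a SONiC switch can be changed as needed. If it is only configured globally, all ports (EthernetX) share the same MAC, for example: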
admin@p1l1:~$ more /etc/sonic/config_db.json
{
"DEVICE_METADATA": {
"localhost": {
"hwsku": "Force10-S6000",
"platform": "x86_64-kvm_x86_64-r0",
"mac": "52:54:00:74:c1:02",
"hostname": "p1l1",
"type": "LeafRouter",
"bgp_asn": "100",
"docker_routing_config_mode": "split"
admin@p1l1:~$ ip add show Ethernet1
26: Ethernet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc mq state UNKNOWN group default qlen 1000
link/ether 52:54:00:74:c1:02 brd ff:ff:ff:ff:ff:ff
inet6 fe80::5054:ff:fe74:c102/64 scope link
valid_lft forever preferred_lft forever
admin@p1l1:~$ ip add show Ethernet2
27: Ethernet2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc mq state UNKNOWN group default qlen 1000
link/ether 52:54:00:74:c1:02 brd ff:ff:ff:ff:ff:ff
inet6 fe80::5054:ff:fe74:c102/64 scope link
valid_lft forever preferred_lft forever
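The link-local addresses in these outputs follow the modified EUI-64 rule: flip the universal/local bit of the first MAC octet and insert ff:fe in the middle. A minimal sketch of that derivation, assuming the kernel uses its default EUI-64 address generation (which is consistent with the outputs shown); the function name is hypothetical:

```python
import ipaddress


def mac_to_link_local(mac):
    """Derive the fe80::/64 modified EUI-64 address for a MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    interface_id = int.from_bytes(bytes(eui64), "big")
    return str(ipaddress.IPv6Address((0xFE80 << 112) | interface_id))


if __name__ == "__main__":
    print(mac_to_link_local("52:54:00:74:c1:02"))  # global MAC of p1l1
    print(mac_to_link_local("50:18:59:00:16:02"))  # eth2 of POD1-L0
```

Because every EthernetX port shares the global MAC, every port also ends up with the same link-local address, which is what the two `ip add show` outputs display.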
Viewing IPv6 ND neighbors
admin@p1l1:~$ show ndp
Address MacAddress Iface Vlan Status
----------------------- ----------------- --------- ------ --------
fe80::5054:ff:fe74:c101 52:54:00:74:c1:01 Ethernet1 - STALE
fe80::5054:ff:fe74:c101 52:54:00:74:c1:01 eth1 - STALE
fe80::5054:ff:fe74:c103 52:54:00:74:c1:03 eth2 - STALE
fe80::5054:ff:fe74:c103 52:54:00:74:c1:03 Ethernet2 - STALE
Total number of entries 4
To clear NDP entries or port counters, use the following command:
admin@p2l1:~$ sonic-clear -h
Usage: sonic-clear [OPTIONS] COMMAND [ARGS]...
SONiC command line - 'Clear' command
Options:
-h, -?, --help Show this message and exit.
Commands:
arp Clear IP ARP table
counters Clear counters
dhcp6relay_counters Clear dhcp6relay message counts
dhcp_relay
dropcounters Clear drop counters
fdb Clear FDB table
flowcnt-route Clear all route flow counters
flowcnt-trap Clear trap flow counters
headroom-pool Clear headroom pool WM
ip Clear IP
ipv6 Clear IPv6 information
line Clear preexisting connection to line
macsec Clear MACsec counts.
nat Clear the nat info
ndp Clear IPv6 NDP table
pbh Clear the PBH info
pfccounters Clear pfc counters
priority-group Clear priority_group WM
queue Clear queue WM
queuecounters Clear queue counters
rifcounters Clear RIF counters
tunnelcounters Clear Tunnel counters
SONiC configuration
The physical port configuration is stored in the following directory:
admin@sonic:~$ ls -l /etc/sonic/
total 64
-rw-r--r-- 1 root root 41 May 19 17:14 asic_config_checksum
-rw-r--r-- 1 root root 1141 Jul 11 12:37 config_db.json
-rw-r--r-- 1 root root 17144 Jul 11 12:31 config_db.json.bak
-rw-r--r-- 1 root root 1590 May 19 17:17 constants.yml
-rw-r--r-- 1 root root 2471 Jul 11 12:42 copp_cfg.json
-rw------- 1 root root 403 May 19 17:17 core_analyzer.rc.json
-rw-r--r-- 1 root root 0 May 19 17:19 dhcp_relay_reconcile
-rw-r--r-- 1 root root 49 May 19 17:20 fast-reboot_order
drwxr-x--- 1 300 300 4096 Jul 11 07:57 frr
-rw-r--r-- 1 root root 776 May 19 17:20 generated_services.conf
-rw-r--r-- 1 root root 16681 May 19 17:20 init_cfg.json
-rw-r--r-- 1 root root 0 May 19 17:20 macsec_reconcile
-rw-r--r-- 1 root root 47 May 19 17:17 snmp.yml
-rw-r--r-- 1 root root 147 Jul 11 07:46 sonic-environment
-rw-r--r-- 1 root root 7 May 19 17:14 sonic_release
-rw-r--r-- 1 root root 403 May 19 17:14 sonic_version.yml
-rw-r--r-- 1 root root 10 May 19 17:19 swss_dependent
-rw-r--r-- 1 root root 14 May 19 17:17 updategraph.conf
-rw-r--r-- 1 root root 49 May 19 17:20 warm-reboot_order
After modifying the configuration file, apply it with the following command:
admin@sonic:~$ sudo config reload
Clear current config and reload config in config_db format from the default config file(s) ? [y/N]: y
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
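The running configuration of a SONiC switch can also be saved, but I do not really recommend it, because this command writes many default settings into the configuration file: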
admin@l2:~$ sudo config save -y
Running command: /usr/local/bin/sonic-cfggen -d --print-data > /etc/sonic/config_db.json
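One way to keep config_db.json small is to change only the keys you need and then run `sudo config reload`. The sketch below is a hypothetical helper, not a SONiC tool: it copies the file to a backup (the `.pre-edit` suffix is an arbitrary choice), updates `DEVICE_METADATA.localhost`, and writes the JSON back. On a switch it would need root privileges because the file is owned by root.

```python
import json
import shutil
from pathlib import Path


def update_device_metadata(config_path, backup_suffix=".pre-edit", **fields):
    """Back up a config_db.json-style file, then update DEVICE_METADATA.localhost."""
    path = Path(config_path)
    shutil.copy2(path, path.with_name(path.name + backup_suffix))
    config = json.loads(path.read_text())
    localhost = config.setdefault("DEVICE_METADATA", {}).setdefault("localhost", {})
    localhost.update(fields)
    path.write_text(json.dumps(config, indent=4) + "\n")
    return localhost


if __name__ == "__main__":
    import os
    import tempfile

    # Demo on a temporary copy rather than /etc/sonic/config_db.json.
    demo = os.path.join(tempfile.mkdtemp(), "config_db.json")
    Path(demo).write_text('{"DEVICE_METADATA": {"localhost": {"hostname": "sonic"}}}')
    print(update_device_metadata(demo, hostname="p1l1", mac="52:54:00:74:c1:02"))
```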
Entering the FRR CLI
admin@p1l0:~$ vtysh
Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
p1l0#
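Configuration
The configuration below makes dual stack easy to achieve, but SRv6's traffic-steering capability is not exercised at all; see the conclusions at the end for details.
p1l0: physical port configuration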
admin@p1l0:~$ more /etc/sonic/config_db.json
{
"DEVICE_METADATA": {
"localhost": {
"hwsku": "Force10-S6000",
"platform": "x86_64-kvm_x86_64-r0",
"mac": "52:54:00:74:c1:01",
"hostname": "p1l0",
"type": "LeafRouter",
"bgp_asn": "101",
"docker_routing_config_mode": "split"
}
},
"LOOPBACK_INTERFACE": {
"Loopback0|10.0.0.10/32": {},
"Loopback0|fc00:0:10::1/128": {}
},
"INTERFACE": {
"Ethernet1": {
"ipv6_use_link_local_only": "enable"
},
"Ethernet2": {},
"Ethernet2|10.101.1.1/24": {},
"Ethernet2|2001:0:101:1::1/64": {}
},
"PORT": {
"Ethernet1": {
"lanes": "25,26,27,28",
"alias": "fortyGigE0/1",
"index": "0",
"speed": "40000",
"admin_status": "up",
"mtu": "9100"
},
"Ethernet2": {
"lanes": "29",
"alias": "GigE0/2",
"index": "1",
"speed": "1000",
"admin_status": "up",
"mtu": "9100"
}
}
}
p1l0: FRR configuration
admin@p1l0:~$ vtysh
Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
p1l0# config
p1l0(config)#
no route-map RM_SET_SRC6 permit 10
no route-map RM_SET_SRC permit 10
no ip protocol bgp route-map RM_SET_SRC
no ipv6 protocol bgp route-map RM_SET_SRC6
!
route-map BGP-IPV6 permit 20
set ipv6 next-hop prefer-global
!
ipv6 route fc00:0:10::/48 Loopback0
!
router bgp 101
bgp router-id 10.0.0.10
bgp log-neighbor-changes
no bgp ebgp-requires-policy
no bgp default ipv4-unicast
bgp bestpath as-path multipath-relax
neighbor Ethernet1 interface remote-as 100
!
segment-routing srv6
locator MAIN
!
address-family ipv4 unicast
network 10.0.0.10/32
network 10.101.1.0/24
neighbor Ethernet1 activate
neighbor Ethernet1 route-map BGP-IPV6 in
exit-address-family
!
address-family ipv6 unicast
network 2001:0:101:1::/64
network fc00:0:10::/48
network fc00:0:10::1/128
neighbor Ethernet1 activate
neighbor Ethernet1 route-map BGP-IPV6 in
maximum-paths 64
exit-address-family
!
segment-routing
srv6
encapsulation
source-address fc00:0:10::1
locators
locator MAIN
behavior usid
prefix fc00:0:10::/48 block-len 32 node-len 16
!
srv6
explicit-sids
sid fc00:0:10:: behavior uN
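The locator line `prefix fc00:0:10::/48 block-len 32 node-len 16` splits the /48 into a 32-bit uSID block and a 16-bit node ID. The sketch below only does that bit arithmetic with Python's `ipaddress` module; the function name is hypothetical, and it assumes both lengths are multiples of 4 so they print as whole hex digits.

```python
import ipaddress


def split_usid_locator(prefix, block_len, node_len):
    """Return (block, node_id) of a uSID locator as zero-padded hex strings."""
    value = int(ipaddress.IPv6Network(prefix).network_address)
    block = value >> (128 - block_len)
    node = (value >> (128 - block_len - node_len)) & ((1 << node_len) - 1)
    return f"{block:0{block_len // 4}x}", f"{node:0{node_len // 4}x}"


if __name__ == "__main__":
    for locator in ("fc00:0:10::/48", "fc00:0:11::/48", "fc00:0:2::/48"):
        print(locator, split_usid_locator(locator, 32, 16))
```

All nodes in this lab share the block fc00:0000 and differ only in the node ID (0010 for p1l0, 0011 for p1l1, and so on).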
p1l1: physical port configuration
admin@p1l1:~$ more /etc/sonic/config_db.json
{
"DEVICE_METADATA": {
"localhost": {
"hwsku": "Force10-S6000",
"platform": "x86_64-kvm_x86_64-r0",
"mac": "52:54:00:74:c1:02",
"hostname": "p1l1",
"type": "LeafRouter",
"bgp_asn": "100",
"docker_routing_config_mode": "split"
}
},
"LOOPBACK_INTERFACE": {
"Loopback0|10.0.0.11/32": {},
"Loopback0|fc00:0:11::1/128": {}
},
"INTERFACE": {
"Ethernet1": {
"ipv6_use_link_local_only": "enable"
},
"Ethernet2": {
"ipv6_use_link_local_only": "enable"
}
},
"PORT": {
"Ethernet1": {
"lanes": "25,26,27,28",
"alias": "fortyGigE0/1",
"index": "0",
"speed": "40000",
"admin_status": "up",
"mtu": "9100"
},
"Ethernet2": {
"lanes": "29,30,31,32",
"alias": "fortyGigE0/2",
"index": "1",
"speed": "40000",
"admin_status": "up",
"mtu": "9100"
}
}
}
p1l1: FRR configuration
admin@p1l1:~$ vtysh
Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
p1l1# configure
p1l1(config)#
no route-map RM_SET_SRC6 permit 10
no route-map RM_SET_SRC permit 10
no ip protocol bgp route-map RM_SET_SRC
no ipv6 protocol bgp route-map RM_SET_SRC6
!
route-map BGP-IPV6 permit 20
set ipv6 next-hop prefer-global
!
ipv6 route fc00:0:11::/48 Loopback0
!
router bgp 100
bgp router-id 10.0.0.11
bgp log-neighbor-changes
no bgp ebgp-requires-policy
no bgp default ipv4-unicast
bgp bestpath as-path multipath-relax
neighbor Ethernet1 interface remote-as 101
neighbor Ethernet2 interface remote-as 65001
!
segment-routing srv6
locator MAIN
!
address-family ipv4 unicast
network 10.0.0.11/32
neighbor Ethernet1 activate
neighbor Ethernet1 route-map BGP-IPV6 in
neighbor Ethernet2 activate
neighbor Ethernet2 route-map BGP-IPV6 in
exit-address-family
!
address-family ipv6 unicast
network fc00:0:11::/48
network fc00:0:11::1/128
neighbor Ethernet1 activate
neighbor Ethernet1 route-map BGP-IPV6 in
neighbor Ethernet2 activate
neighbor Ethernet2 route-map BGP-IPV6 in
maximum-paths 64
exit-address-family
!
segment-routing
srv6
encapsulation
source-address fc00:0:11::1
locators
locator MAIN
behavior usid
prefix fc00:0:11::/48 block-len 32 node-len 16
!
srv6
explicit-sids
sid fc00:0:11:: behavior uN
l2: physical port configuration
admin@l2:~$ more /etc/sonic/config_db.json
{
"DEVICE_METADATA": {
"localhost": {
"hwsku": "Force10-S6000",
"platform": "x86_64-kvm_x86_64-r0",
"mac": "52:54:00:74:c1:03",
"hostname": "l2",
"type": "SpineRouter",
"bgp_asn": "65001",
"docker_routing_config_mode": "split"
}
},
"LOOPBACK_INTERFACE": {
"Loopback0|10.0.0.2/32": {},
"Loopback0|fc00:0:2::1/128": {}
},
"INTERFACE": {
"Ethernet1": {
"ipv6_use_link_local_only": "enable"
},
"Ethernet2": {
"ipv6_use_link_local_only": "enable"
}
},
"PORT": {
"Ethernet1": {
"lanes": "25,26,27,28",
"alias": "fortyGigE0/1",
"index": "0",
"speed": "40000",
"admin_status": "up",
"mtu": "9100"
},
"Ethernet2": {
"lanes": "29,30,31,32",
"alias": "fortyGigE0/2",
"index": "1",
"speed": "40000",
"admin_status": "up",
"mtu": "9100"
}
}
}
l2: FRR configuration
admin@l2:~$ vtysh
Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
l2# config
l2(config)#
no route-map RM_SET_SRC6 permit 10
no route-map RM_SET_SRC permit 10
no ip protocol bgp route-map RM_SET_SRC
no ipv6 protocol bgp route-map RM_SET_SRC6
!
route-map BGP-IPV6 permit 20
set ipv6 next-hop prefer-global
!
ipv6 route fc00:0:2::/48 Loopback0
!
router bgp 65001
bgp router-id 10.0.0.2
bgp log-neighbor-changes
no bgp ebgp-requires-policy
no bgp default ipv4-unicast
bgp bestpath as-path multipath-relax
neighbor Ethernet1 interface remote-as 200
neighbor Ethernet2 interface remote-as 100
!
segment-routing srv6
locator MAIN
!
address-family ipv4 unicast
network 10.0.0.2/32
neighbor Ethernet1 activate
neighbor Ethernet1 route-map BGP-IPV6 in
neighbor Ethernet2 activate
neighbor Ethernet2 route-map BGP-IPV6 in
exit-address-family
!
address-family ipv6 unicast
network fc00:0:2::/48
network fc00:0:2::1/128
neighbor Ethernet1 activate
neighbor Ethernet1 route-map BGP-IPV6 in
neighbor Ethernet2 activate
neighbor Ethernet2 route-map BGP-IPV6 in
maximum-paths 64
exit-address-family
!
segment-routing
srv6
encapsulation
source-address fc00:0:2::1
locators
locator MAIN
behavior usid
prefix fc00:0:2::/48 block-len 32 node-len 16
!
srv6
explicit-sids
sid fc00:0:2:: behavior uN
p2l1: physical port configuration
admin@p2l1:~$ more /etc/sonic/config_db.json
{
"DEVICE_METADATA": {
"localhost": {
"hwsku": "Force10-S6000",
"platform": "x86_64-kvm_x86_64-r0",
"mac": "52:54:00:74:c1:04",
"hostname": "p2l1",
"type": "LeafRouter",
"bgp_asn": "200",
"docker_routing_config_mode": "split"
}
},
"LOOPBACK_INTERFACE": {
"Loopback0|10.0.0.21/32": {},
"Loopback0|fc00:0:21::1/128": {}
},
"INTERFACE": {
"Ethernet1": {
"ipv6_use_link_local_only": "enable"
},
"Ethernet2": {
"ipv6_use_link_local_only": "enable"
}
},
"PORT": {
"Ethernet1": {
"lanes": "25,26,27,28",
"alias": "fortyGigE0/1",
"index": "0",
"speed": "40000",
"admin_status": "up",
"mtu": "9100"
},
"Ethernet2": {
"lanes": "29,30,31,32",
"alias": "fortyGigE0/2",
"index": "1",
"speed": "40000",
"admin_status": "up",
"mtu": "9100"
}
}
}
p2l1: FRR configuration
admin@p2l1:~$ vtysh
Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
p2l1# config
p2l1(config)#
no route-map RM_SET_SRC6 permit 10
no route-map RM_SET_SRC permit 10
no ip protocol bgp route-map RM_SET_SRC
no ipv6 protocol bgp route-map RM_SET_SRC6
!
route-map BGP-IPV6 permit 20
set ipv6 next-hop prefer-global
!
ipv6 route fc00:0:21::/48 Loopback0
!
router bgp 200
bgp router-id 10.0.0.21
bgp log-neighbor-changes
no bgp ebgp-requires-policy
no bgp default ipv4-unicast
bgp bestpath as-path multipath-relax
neighbor Ethernet1 interface remote-as 201
neighbor Ethernet2 interface remote-as 65001
!
segment-routing srv6
locator MAIN
!
address-family ipv4 unicast
network 10.0.0.21/32
neighbor Ethernet1 activate
neighbor Ethernet1 route-map BGP-IPV6 in
neighbor Ethernet2 activate
neighbor Ethernet2 route-map BGP-IPV6 in
exit-address-family
!
address-family ipv6 unicast
network fc00:0:21::/48
network fc00:0:21::1/128
neighbor Ethernet1 activate
neighbor Ethernet1 route-map BGP-IPV6 in
neighbor Ethernet2 activate
neighbor Ethernet2 route-map BGP-IPV6 in
maximum-paths 64
exit-address-family
!
segment-routing
srv6
encapsulation
source-address fc00:0:21::1
locators
locator MAIN
behavior usid
prefix fc00:0:21::/48 block-len 32 node-len 16
!
srv6
explicit-sids
sid fc00:0:21:: behavior uN
p2l0: physical port configuration
admin@p2l0:~$ more /etc/sonic/config_db.json
{
"DEVICE_METADATA": {
"localhost": {
"hwsku": "Force10-S6000",
"platform": "x86_64-kvm_x86_64-r0",
"mac": "52:54:00:74:c1:05",
"hostname": "p2l0",
"type": "LeafRouter",
"bgp_asn": "201",
"docker_routing_config_mode": "split"
}
},
"LOOPBACK_INTERFACE": {
"Loopback0|10.0.0.20/32": {},
"Loopback0|fc00:0:20::1/128": {}
},
"INTERFACE": {
"Ethernet1": {
"ipv6_use_link_local_only": "enable"
},
"Ethernet2": {},
"Ethernet2|20.101.1.1/24": {},
"Ethernet2|2002:0:101:1::1/64": {}
},
"PORT": {
"Ethernet1": {
"lanes": "25,26,27,28",
"alias": "fortyGigE0/1",
"index": "0",
"speed": "40000",
"admin_status": "up",
"mtu": "9100"
},
"Ethernet2": {
"lanes": "29,30,31,32",
"alias": "GigE0/2",
"index": "1",
"speed": "40000",
"admin_status": "up",
"mtu": "9100"
}
}
}
p2l0: FRR configuration
admin@p2l0:~$ vtysh
Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
p2l0# config
p2l0(config)#
no route-map RM_SET_SRC6 permit 10
no route-map RM_SET_SRC permit 10
no ip protocol bgp route-map RM_SET_SRC
no ipv6 protocol bgp route-map RM_SET_SRC6
!
route-map BGP-IPV6 permit 20
set ipv6 next-hop prefer-global
!
ipv6 route fc00:0:20::/48 Loopback0
!
router bgp 201
bgp router-id 10.0.0.20
bgp log-neighbor-changes
no bgp ebgp-requires-policy
no bgp default ipv4-unicast
bgp bestpath as-path multipath-relax
neighbor Ethernet1 interface remote-as 200
!
segment-routing srv6
locator MAIN
exit
!
address-family ipv4 unicast
network 10.0.0.20/32
network 20.101.1.0/24
neighbor Ethernet1 activate
neighbor Ethernet1 route-map BGP-IPV6 in
exit-address-family
!
address-family ipv6 unicast
network 2002:0:101:1::/64
network fc00:0:20::/48
network fc00:0:20::1/128
neighbor Ethernet1 activate
neighbor Ethernet1 route-map BGP-IPV6 in
maximum-paths 64
exit-address-family
!
segment-routing
srv6
encapsulation
source-address fc00:0:20::1
locators
locator MAIN
behavior usid
prefix fc00:0:20::/48 block-len 32 node-len 16
!
srv6
explicit-sids
sid fc00:0:20:: behavior uN
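The five FRR configurations differ only in AS number, router ID, locator and neighbors, so they can be rendered from a small table. The sketch below is a hypothetical generator, not part of the lab repository. The indentation follows the usual FRR style and is an assumption, since the listings here are flattened, and the explicit-sids stanza is left out because its nesting cannot be recovered from them.

```python
def render_frr(asn, router_id, node_id, neighbors, v4_networks=(), v6_networks=()):
    """Render the BGP and SRv6 locator config for one node.

    node_id is the hex text used in the locator (e.g. "10" for fc00:0:10::/48);
    neighbors maps an interface name to the remote AS.
    """
    locator = f"fc00:0:{node_id}::"
    lines = [
        "route-map BGP-IPV6 permit 20",
        " set ipv6 next-hop prefer-global",
        "!",
        f"ipv6 route {locator}/48 Loopback0",
        "!",
        f"router bgp {asn}",
        f" bgp router-id {router_id}",
        " bgp log-neighbor-changes",
        " no bgp ebgp-requires-policy",
        " no bgp default ipv4-unicast",
        " bgp bestpath as-path multipath-relax",
    ]
    lines += [f" neighbor {ifname} interface remote-as {peer_as}"
              for ifname, peer_as in neighbors.items()]
    lines += [" !", " segment-routing srv6", "  locator MAIN", " !"]
    families = (
        ("ipv4", [f"{router_id}/32", *v4_networks]),
        ("ipv6", [*v6_networks, f"{locator}/48", f"{locator}1/128"]),
    )
    for family, networks in families:
        lines.append(f" address-family {family} unicast")
        lines += [f"  network {net}" for net in networks]
        for ifname in neighbors:
            lines.append(f"  neighbor {ifname} activate")
            lines.append(f"  neighbor {ifname} route-map BGP-IPV6 in")
        if family == "ipv6":
            lines.append("  maximum-paths 64")
        lines.append(" exit-address-family")
    lines += [
        "!",
        "segment-routing",
        " srv6",
        "  encapsulation",
        f"   source-address {locator}1",
        "  locators",
        "   locator MAIN",
        "    behavior usid",
        f"    prefix {locator}/48 block-len 32 node-len 16",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_frr(101, "10.0.0.10", "10", {"Ethernet1": 100},
                     v4_networks=["10.101.1.0/24"],
                     v6_networks=["2001:0:101:1::/64"]))
```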
Server1 configuration
root@S1:~# ip addr add 10.101.1.2/24 dev ens5
root@S1:~# ip -6 addr add 2001:0:101:1::2/64 dev ens5
root@S1:~# ip link set ens5 up
root@S1:~# ip route add default via 10.101.1.1
root@S1:~# ip -6 route add default via 2001:0:101:1::1
root@S1:~# ping 10.101.1.1
PING 10.101.1.1 (10.101.1.1) 56(84) bytes of data.
64 bytes from 10.101.1.1: icmp_seq=1 ttl=64 time=0.861 ms
^C
--- 10.101.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.861/0.861/0.861/0.000 ms
root@S1:~# ping6 2001:0:101:1::1
PING 2001:0:101:1::1(2001:0:101:1::1) 56 data bytes
64 bytes from 2001:0:101:1::1: icmp_seq=1 ttl=64 time=1.62 ms
^C
--- 2001:0:101:1::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.615/1.615/1.615/0.000 ms
Server2 configuration
root@S2:~# ip addr add 20.101.1.2/24 dev ens3
root@S2:~# ip -6 addr add 2002:0:101:1::2/64 dev ens3
root@S2:~# ip link set ens3 up
root@S2:~# ip route add default via 20.101.1.1
root@S2:~# ip -6 route add default via 2002:0:101:1::1
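Test and verification
I originally wanted to configure different uN SIDs on devices in different roles in the dual-stack scenario and then have the server stamp uSIDs onto the packets it sends, to control the path they take. However, SONiC/FRR currently does not support steering for global IPv4/IPv6 traffic, and the VPN scenario makes little sense in today's DCN, so I set this aside for now. This section only shows how well SONiC supports dual stack.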
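For reference, a uSID carrier is just the shared block followed by the node IDs of the hops to visit, packed into one IPv6 address. The sketch below builds such an address from the locators used in this lab (0x11 = p1l1, 0x2 = l2, 0x21 = p2l1, 0x20 = p2l0). It is address arithmetic only: the function name is hypothetical, and as noted above, this SONiC/FRR build does not steer global traffic based on it.

```python
import ipaddress


def build_usid_carrier(block, node_ids, block_len=32, usid_len=16):
    """Pack a uSID block and an ordered list of node IDs into one IPv6 address."""
    max_ids = (128 - block_len) // usid_len
    if len(node_ids) > max_ids:
        raise ValueError(f"at most {max_ids} uSIDs fit after a {block_len}-bit block")
    value = int(ipaddress.IPv6Address(block)) >> (128 - block_len)
    for node_id in node_ids:
        if not 0 < node_id < (1 << usid_len):
            raise ValueError(f"node ID {node_id:#x} does not fit in {usid_len} bits")
        value = (value << usid_len) | node_id
    used_bits = block_len + usid_len * len(node_ids)
    return str(ipaddress.IPv6Address(value << (128 - used_bits)))


if __name__ == "__main__":
    # Path S1 -> p1l1 -> l2 -> p2l1 -> p2l0 expressed as one carrier address.
    print(build_usid_carrier("fc00::", [0x11, 0x2, 0x21, 0x20]))
```

With a 32-bit block and 16-bit IDs, at most six uSIDs fit into a single carrier.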
Server1: IPv4
Let's check whether S1 can ping S2:
root@S1:~# ip route
default via 10.101.1.1 dev ens5
10.101.1.0/24 dev ens5 proto kernel scope link src 10.101.1.2
root@S1:~# ping 20.101.1.2
PING 20.101.1.2 (20.101.1.2) 56(84) bytes of data.
64 bytes from 20.101.1.2: icmp_seq=1 ttl=59 time=6.00 ms
64 bytes from 20.101.1.2: icmp_seq=2 ttl=59 time=5.42 ms
64 bytes from 20.101.1.2: icmp_seq=3 ttl=59 time=5.74 ms
^C
--- 20.101.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 5.419/5.717/5.997/0.236 ms
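Let's confirm the path with MTR. I found a problem here: with the configuration above, only the first and last hops respond, and none of the intermediate hops do. A packet capture shows that the replies are sourced from the docker address, which is identical on every SONiC node (a real white-box switch may behave differently from the virtual one), so the replies are not forwarded correctly: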
root@S1:~# mtr -r 20.101.1.2
Start: 2023-08-08T12:42:46+0000
HOST: S1 Loss% Snt Last Avg Best Wrst StDev
1.|-- _gateway 0.0% 10 0.8 0.8 0.6 1.3 0.2
2.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
3.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
4.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
5.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
6.|-- 20.101.1.2 0.0% 10 5.1 5.2 4.9 5.4 0.2
admin@p1l0:~$ show ip int
Interface Master IPv4 address/mask Admin/Oper BGP Neighbor Neighbor IP
----------- -------- ------------------- ------------ -------------- -------------
Ethernet2 10.101.1.1/24 up/up N/A N/A
Loopback0 10.0.0.10/32 up/up N/A N/A
docker0 240.127.1.1/24 up/down N/A N/A
lo 127.0.0.1/16 up/up N/A N/A
admin@l2:~$ show ip int
Interface Master IPv4 address/mask Admin/Oper BGP Neighbor Neighbor IP
----------- -------- ------------------- ------------ -------------- -------------
Loopback0 10.0.0.2/32 up/up N/A N/A
docker0 240.127.1.1/24 up/down N/A N/A
lo 127.0.0.1/16 up/up N/A N/A
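The following commands (which I had deliberately removed earlier -_-…) make the intermediate devices reply correctly. They are shown here for p1l1; each node uses its own Loopback0 addresses. These commands should not matter much in a DCN interconnected with eBGP, yet they clearly took effect, and I do not yet understand why.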
route-map RM_SET_SRC permit 10
set src 10.0.0.11
!
route-map RM_SET_SRC6 permit 10
set src fc00:0:11::1
!
ip protocol bgp route-map RM_SET_SRC
ipv6 protocol bgp route-map RM_SET_SRC6
!
root@S1:~# mtr -r 20.101.1.2
Start: 2023-08-08T13:04:54+0000
HOST: S1 Loss% Snt Last Avg Best Wrst StDev
1.|-- _gateway 0.0% 10 0.8 0.8 0.6 1.0 0.1
2.|-- 10.0.0.11 0.0% 10 1.6 1.6 1.5 1.7 0.1
3.|-- 10.0.0.2 0.0% 10 2.5 2.6 2.3 3.3 0.3
4.|-- 10.0.0.21 0.0% 10 3.4 3.4 2.9 3.7 0.2
5.|-- 10.0.0.20 0.0% 10 4.5 4.3 3.6 4.8 0.3
6.|-- 20.101.1.2 0.0% 10 4.9 5.1 4.6 5.5 0.3
Server1: IPv6
root@S1:~# ip -6 route
::1 dev lo proto kernel metric 256 pref medium
2001:0:101:1::/64 dev ens5 proto kernel metric 256 pref medium
fe80::/64 dev ens5 proto kernel metric 256 pref medium
default via 2001:0:101:1::1 dev ens5 metric 1024 pref medium
root@S1:~# ping -6 2002:0:101:1::2
PING 2002:0:101:1::2(2002:0:101:1::2) 56 data bytes
64 bytes from 2002:0:101:1::2: icmp_seq=1 ttl=59 time=8.28 ms
64 bytes from 2002:0:101:1::2: icmp_seq=2 ttl=59 time=5.89 ms
64 bytes from 2002:0:101:1::2: icmp_seq=3 ttl=59 time=5.60 ms
^C
--- 2002:0:101:1::2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 5.596/6.586/8.277/1.201 ms
root@S1:~# mtr -6 -r 2002:0:101:1::2
Start: 2023-08-08T13:24:10+0000
HOST: S1 Loss% Snt Last Avg Best Wrst StDev
1.|-- _gateway 0.0% 10 0.9 0.8 0.6 0.9 0.1
2.|-- fc00:0:11::1 0.0% 10 1.9 1.8 1.6 1.9 0.1
3.|-- fc00:0:2::1 0.0% 10 2.9 2.8 2.5 3.4 0.2
4.|-- fc00:0:21::1 0.0% 10 3.8 3.8 3.5 4.5 0.3
5.|-- fc00:0:20::1 0.0% 10 4.7 4.6 4.2 5.1 0.3
6.|-- 2002:0:101:1::2 0.0% 10 5.4 5.4 4.7 7.4 0.8
P1L0 IPv4/IPv6 information
BGP routing information on P1L0:
p1l0# show bgp sum
IPv4 Unicast Summary (VRF default):
BGP router identifier 10.0.0.10, local AS number 101 vrf-id 0
BGP table version 11
RIB entries 13, using 2496 bytes of memory
Peers 1, using 723 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
Ethernet1 4 100 36695 36683 0 0 0 03w4d10h 5 7 N/A
Total number of neighbors 1
IPv6 Unicast Summary (VRF default):
BGP router identifier 10.0.0.10, local AS number 101 vrf-id 0
BGP table version 12
RIB entries 14, using 2688 bytes of memory
Peers 1, using 723 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
Ethernet1 4 100 36695 36683 0 0 0 03w4d10h 5 8 N/A
Total number of neighbors 1
p1l0# show bgp ipv4 un
BGP table version is 11, local router ID is 10.0.0.10, vrf id 0
Default local pref 100, local AS 101
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 10.0.0.2/32 Ethernet1 0 100 65001 i
*> 10.0.0.10/32 0.0.0.0 0 32768 i
*> 10.0.0.11/32 Ethernet1 0 0 100 i
*> 10.0.0.20/32 Ethernet1 0 100 65001 200 201 i
*> 10.0.0.21/32 Ethernet1 0 100 65001 200 i
*> 10.101.1.0/24 0.0.0.0 0 32768 i
*> 20.101.1.0/24 Ethernet1 0 100 65001 200 201 i
Displayed 7 routes and 7 total paths
p1l0#
p1l0# show bgp ipv6 un
BGP table version is 12, local router ID is 10.0.0.10, vrf id 0
Default local pref 100, local AS 101
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 2001:0:101:1::/64
:: 0 32768 i
*> 2002:0:101:1::/64
Ethernet1 0 100 65001 200 201 i
*> fc00:0:2::1/128 Ethernet1 0 100 65001 i
*> fc00:0:10::/48 :: 0 32768 i
*> fc00:0:10::1/128 :: 0 32768 i
*> fc00:0:11::1/128 Ethernet1 0 0 100 i
*> fc00:0:20::1/128 Ethernet1 0 100 65001 200 201 i
*> fc00:0:21::1/128 Ethernet1 0 100 65001 200 i
Displayed 8 routes and 8 total paths
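Why can IPv4 routes be exchanged and forwarded in a topology whose links only have IPv6 link-local addresses? Mainly because the BGP next hop resolves directly to the link-local address. Since all eBGP sessions run between directly connected peers, a packet capture still shows native IPv4 packets; the link-local address is only used to resolve the next hop to the corresponding interface, as shown below: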
address-family ipv4 unicast
neighbor Ethernet1 route-map BGP-IPV6 in
!
route-map BGP-IPV6 permit 20
set ipv6 next-hop prefer-global
p1l0# show bgp ipv4 20.101.1.0/24
BGP routing table entry for 20.101.1.0/24, version 11
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
Ethernet1
100 65001 200 201
fe80::5054:ff:fe74:c102 from Ethernet1 (10.0.0.11)
(fe80::5054:ff:fe74:c102) (prefer-global)
Origin IGP, valid, external, best (First path received)
Last update: Fri Jul 14 08:57:22 2023
p1l0# show bgp ipv4 20.101.1.0/24 json
{
"prefix":"20.101.1.0/24",
"version":11,
"advertisedTo":{
"Ethernet1":{
"hostname":"sonic"
}
},
"paths":[
{
"aspath":{
"string":"100 65001 200 201",
"segments":[
{
"type":"as-sequence",
"list":[
100,
65001,
200,
201
]
}
],
"length":4
},
"origin":"IGP",
"valid":true,
"version":11,
"bestpath":{
"overall":true,
"selectionReason":"First path received"
},
"lastUpdate":{
"epoch":1689325042,
"string":"Fri Jul 14 08:57:22 2023\n"
},
"nexthops":[
{
"ip":"fe80::5054:ff:fe74:c102",
"hostname":"sonic",
"afi":"ipv6",
"scope":"global",
"metric":0,
"accessible":true,
"used":true
},
{
"ip":"fe80::5054:ff:fe74:c102",
"hostname":"sonic",
"afi":"ipv6",
"scope":"link-local",
"accessible":true
}
],
"peer":{
"peerId":"fe80::5054:ff:fe74:c102",
"routerId":"10.0.0.11",
"hostname":"sonic",
"interface":"Ethernet1",
"type":"external"
}
}
]
}
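The JSON form is convenient for checking next hops in scripts. The sketch below parses a trimmed copy of the output above and lists the next hop that is actually in use for the IPv4 prefix; the helper name is hypothetical, and only fields visible in that output are read.

```python
import json

# Trimmed copy of the `show bgp ipv4 20.101.1.0/24 json` output above.
SAMPLE = """
{
  "prefix": "20.101.1.0/24",
  "paths": [
    {
      "aspath": {"string": "100 65001 200 201"},
      "nexthops": [
        {"ip": "fe80::5054:ff:fe74:c102", "afi": "ipv6", "scope": "global", "used": true},
        {"ip": "fe80::5054:ff:fe74:c102", "afi": "ipv6", "scope": "link-local"}
      ],
      "peer": {"interface": "Ethernet1"}
    }
  ]
}
"""


def used_nexthops(raw):
    """Return (prefix, afi, next hop, interface) for every next hop marked used."""
    data = json.loads(raw)
    result = []
    for path in data["paths"]:
        for nexthop in path["nexthops"]:
            if nexthop.get("used"):
                result.append((data["prefix"], nexthop["afi"], nexthop["ip"],
                               path["peer"]["interface"]))
    return result


if __name__ == "__main__":
    for row in used_nexthops(SAMPLE):
        print("%s via %s next hop %s on %s" % row)
```

The IPv4 prefix comes back with an `ipv6` next hop on Ethernet1, which is the behaviour the paragraph above describes.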
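Conclusions
- SONiC switches can be deployed plug-and-play using only IPv6 link-local addresses, which greatly simplifies dual-stack configuration in the DCN (some commercial devices now support this kind of deployment too). This is likely the direction of future IPv6 deployments, but it still has to be checked whether existing operations tools work smoothly under this architecture.
- SRv6 steering is only usable in VPN scenarios, not in the default/global table. Since a DCN rarely uses VPNs, this is a shortcoming that must be resolved before going further. If support arrives later, steering from the host could be tried as a replacement for part of the gateway's overlay function.
- Some of SRv6's better features require BGP plus an IGP, such as Flex-Algo slicing. It remains to be seen whether the industry will support them in BGP-only scenarios; of course, that still depends on actual business needs.
- SRv6 functions can be user-defined. In a future converged network, could custom functions be used to improve transmission efficiency?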