SRv6 Fabric by Sonic Switch

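I recently came across an interesting article published by my former employer, "Building an SRv6 uSID Data Center Fabric with SONiC", which uses SONiC to build a DC architecture on an SRv6 fabric. SRv6 is typically used in backbone networks and DCI, and it relies on BGP next hops being resolved recursively over an IGP. When I was working on a metro network architecture, I wondered whether SRv6's advanced features could split the network into two planes that carry traffic independently according to service requirements, but every vendor solution depended on an IGP rather than a BGP-only environment. Alibaba Cloud recently mentioned BGP-only SRv6 at a few events while presenting its new peering architecture, ePF. Taken together with this article, I did not expect it to show up first on a SONiC switch; the pace of iteration in the open-source community is genuinely surprising.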

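This post is mainly a trial of the SONiC switch, recording the problems I ran into and how I solved them. It also tests how far the SRv6 implementation has come and whether it can directly replace an existing BGP-only DC architecture. PS: all experiments were built and verified in PNET. The SONiC image was downloaded from the article mentioned above and is based on the 202305 release. For the configuration commands, see the GitHub repository mentioned in that article and the official FRR and SONiC documentation.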

Topology

      Ethernet0┌─────────┐Ethernet1
     ┌─────────┤   L2    ├──────────┐
     │         └─────────┘          │
     │Ethernet2                     │Ethernet2
┌────┴────┐                    ┌────┴────┐
│ POD1-L1 │                    │ POD2-L1 │
└────┬────┘                    └────┬────┘
     │Ethernet1                     │Ethernet1
     │                              │
     │Ethernet1                     │Ethernet1
┌────┴────┐                    ┌────┴────┐
│ POD1-L0 │                    │ POD2-L0 │
└────┬────┘                    └────┬────┘
     │Ethernet2                     │Ethernet2
     │                              │
     │eth2                          │eth0
 ┌───┴───┐                       ┌──┴────┐
 │  S1   │                       │  S2   │
 │ Server│                       │ Server│
 └───────┘                       └───────┘

Note:

SONiC maintains its own port mapping: it treats ethX as holding the physical MAC and EthernetX as holding a virtual MAC. For the mapping, see https://github.com/sonic-net/SONiC/issues/760. For example:

Eth1 -> Ethernet1
Eth2 -> Ethernet9

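For PNET, EthernetX is the physical MAC. If the ports used to interconnect SONiC nodes are not contiguous, for example Ethernet1 and Ethernet9, PNET assumes the physical MAC of Ethernet9 is in use, while SONiC assumes the physical MAC of eth2. The two do not match, so traffic does not pass. As shown below, PNET considers the MAC of this port to be 50:18:59:00:16:09: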

[VM Linux S1(50:42:26:00:2c:02)]-vunl44_2-----vnet4_5-----vunl22_9-[(50:18:59:00:16:09)VM Sonic Switch]

root@pnetlab-159:~# ps -ef|grep S1
root     17871 16306  0 01:27 pts/0    00:00:00 grep --color=auto S1
root     28118     1 14 Jul13 ?        02:35:12 /opt/qemu-4.1.0/bin/qemu-system-x86_64 -device virtio-net-pci,netdev=net0,mac=50:42:26:00:2c:00 -netdev tap,id=net0,ifname=vunl44_0,script=no -device virtio-net-pci,netdev=net1,mac=50:42:26:00:2c:01 -netdev tap,id=net1,ifname=vunl44_1,script=no -device virtio-net-pci,netdev=net2,mac=50:42:26:00:2c:02 -netdev tap,id=net2,ifname=vunl44_2,script=no -vnc :24144 -nographic -chardev socket,id=monitor,path=/opt/unetlab/tmp/4/44/monitor.sock,server,nowait -monitor chardev:monitor -smp 2 -m 4096 -name S1 -uuid 47df8578-5257-4f93-aa60-4016af5eaa0d -hda hda.qcow2 -machine type=pc,accel=kvm -vga virtio -usbdevice tablet -boot order=cd -cpu host
root@pnetlab-159:~# 
root@pnetlab-159:~# ps -ef|grep POD1-L0
root     19028 16306  0 01:29 pts/0    00:00:00 grep --color=auto POD1-L0
root     23366     1 21 Jul13 ?        04:50:28 /opt/qemu-4.1.0/bin/qemu-system-x86_64 -device virtio-net-pci,netdev=net0,mac=50:18:59:00:16:00 -netdev tap,id=net0,ifname=vunl22_0,script=no -device virtio-net-pci,netdev=net1,mac=50:18:59:00:16:01 -netdev tap,id=net1,ifname=vunl22_1,script=no -device virtio-net-pci,netdev=net2,mac=50:18:59:00:16:02 -netdev tap,id=net2,ifname=vunl22_2,script=no -device virtio-net-pci,netdev=net3,mac=50:18:59:00:16:03 -netdev tap,id=net3,ifname=vunl22_3,script=no -device virtio-net-pci,netdev=net4,mac=50:18:59:00:16:04 -netdev tap,id=net4,ifname=vunl22_4,script=no -device virtio-net-pci,netdev=net5,mac=50:18:59:00:16:05 -netdev tap,id=net5,ifname=vunl22_5,script=no -device virtio-net-pci,netdev=net6,mac=50:18:59:00:16:06 -netdev tap,id=net6,ifname=vunl22_6,script=no -device virtio-net-pci,netdev=net7,mac=50:18:59:00:16:07 -netdev tap,id=net7,ifname=vunl22_7,script=no -device virtio-net-pci,netdev=net8,mac=50:18:59:00:16:08 -netdev tap,id=net8,ifname=vunl22_8,script=no -device virtio-net-pci,netdev=net9,mac=50:18:59:00:16:09 -netdev tap,id=net9,ifname=vunl22_9,script=no -nographic -chardev socket,id=serial0,path=/opt/unetlab/tmp/4/22/console.sock,server,nowait -serial chardev:serial0 -chardev socket,id=monitor,path=/opt/unetlab/tmp/4/22/monitor.sock,server,nowait -monitor chardev:monitor -smp 2 -m 4096 -name POD1-L0 -uuid 8c767eb4-c34c-4db9-888f-9470e8b21197 -drive file=virtioa.qcow2,if=virtio,bus=0,unit=0,cache=none -machine type=pc,accel=kvm -vga std -usbdevice tablet -boot order=cd

root@pnetlab-159:~# brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.024299e5ec09       no
pnet0           8000.3ca82a1f412c       no              eth0
                                                        vunl44_0
pnet1           8000.3ca82a1f412d       no              eth1
pnet2           8000.3ca82a1f412e       no              eth2
pnet3           8000.3ca82a1f412f       no              eth3
vnet4_1         8000.0abf870731d0       no              vunl22_1
                                                        vunl40_1
vnet4_2         8000.36a076674cf4       no              vunl40_2
                                                        vunl41_2
vnet4_3         8000.5edcdd6caa0d       no              vunl42_1
                                                        vunl43_1
vnet4_4         8000.0eb934125fe3       no              vunl41_1
                                                        vunl42_2
vnet4_5         8000.3abe9e57d1e8       no              vunl22_9
                                                        vunl44_2
vnet4_6         8000.12f855f8c50e       no              vunl43_9
                                                        vunl45_0

The SONiC switch, on the other hand, considers eth2 to use the MAC 50:18:59:00:16:02:

4: eth2: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 9122 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 50:18:59:00:16:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5218:59ff:fe00:1602/64 scope link 
       valid_lft forever preferred_lft forever

Therefore, when emulating SONiC switches in PNET, use contiguous ports (1, 2, 3, 4) rather than skipping ports (1, 3, 5, 7) to avoid this problem.
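To make the mismatch easier to spot, here is a minimal Python sketch. `tap_to_mac` extracts the tap-to-MAC pairing from a QEMU command line like the ones in the `ps -ef` output above, and `mismatched_ports` encodes the behaviour observed in this lab as an assumption: SONiC binds its k-th in-use port to eth<k>, while PNET wires the link to whichever NIC index was picked. Both function names are made up for illustration and are not part of PNET or SONiC.

```python
import re


def tap_to_mac(qemu_cmdline: str) -> dict[str, str]:
    """Map each tap interface (e.g. vunl22_9) to the MAC QEMU gave its NIC."""
    # Matches: -device virtio-net-pci,netdev=net9,mac=50:18:59:00:16:09
    macs = dict(re.findall(r"netdev=(\w+),mac=([0-9a-fA-F:]{17})", qemu_cmdline))
    # Matches: -netdev tap,id=net9,ifname=vunl22_9,script=no
    taps = dict(re.findall(r"-netdev tap,id=(\w+),ifname=(\w+)", qemu_cmdline))
    return {taps[net]: macs[net] for net in taps if net in macs}


def mismatched_ports(used_nic_indexes: list[int]) -> list[tuple[int, int]]:
    """Return (pnet_nic_index, sonic_eth_index) pairs that disagree.

    Assumption (from the observation in this lab only): the k-th port in use
    ends up on eth<k> inside SONiC, regardless of the NIC index PNET wired.
    """
    ordered = sorted(used_nic_indexes)
    return [(nic, k) for k, nic in enumerate(ordered, start=1) if nic != k]


cmdline = (
    "-device virtio-net-pci,netdev=net2,mac=50:18:59:00:16:02 "
    "-netdev tap,id=net2,ifname=vunl22_2,script=no "
    "-device virtio-net-pci,netdev=net9,mac=50:18:59:00:16:09 "
    "-netdev tap,id=net9,ifname=vunl22_9,script=no"
)
print(tap_to_mac(cmdline))
print(mismatched_ports([1, 9]))        # Ethernet1 and Ethernet9 wired in PNET
print(mismatched_ports([1, 2, 3, 4]))  # contiguous ports
```

With ports 1 and 9 the sketch reports NIC 9 against eth2, which is the same pair of MACs (…:09 versus …:02) seen in the outputs above.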

SONiC basic commands

Changing the default password

Default username: admin, default password: YourPaSsWoRd

Change the original password:

admin@p1l0:~$ sudo passwd admin
New password: 
Retype new password: 
passwd: password updated successfully

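Changing the MAC address

The virtual MAC address of a SONiC switch can be changed as needed. If it is only configured globally, all ports (EthernetX) share the same MAC, for example: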

admin@p1l1:~$ more /etc/sonic/config_db.json 
{
    "DEVICE_METADATA": {
        "localhost": {
            "hwsku": "Force10-S6000",
            "platform": "x86_64-kvm_x86_64-r0",
            "mac": "52:54:00:74:c1:02",
            "hostname": "p1l1",
            "type": "LeafRouter",
            "bgp_asn": "100",
            "docker_routing_config_mode": "split"
admin@p1l1:~$ ip add show Ethernet1
26: Ethernet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc mq state UNKNOWN group default qlen 1000
    link/ether 52:54:00:74:c1:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe74:c102/64 scope link 
       valid_lft forever preferred_lft forever
admin@p1l1:~$ ip add show Ethernet2
27: Ethernet2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc mq state UNKNOWN group default qlen 1000
    link/ether 52:54:00:74:c1:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe74:c102/64 scope link 
       valid_lft forever preferred_lft forever
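Because every EthernetX port shares the global MAC, they also end up with the same link-local address, as the two outputs above show. The Python sketch below reproduces that derivation (modified EUI-64: flip the universal/local bit of the first octet and insert ff:fe in the middle). The function name is illustrative only.

```python
import ipaddress


def link_local_from_mac(mac: str) -> str:
    """Derive the modified EUI-64 IPv6 link-local address for a MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # fe80::/64 prefix (fe80 followed by 48 zero bits) plus the 64-bit interface ID
    raw = bytes([0xFE, 0x80] + [0] * 6 + interface_id)
    return str(ipaddress.IPv6Address(raw))


print(link_local_from_mac("52:54:00:74:c1:02"))  # fe80::5054:ff:fe74:c102
print(link_local_from_mac("50:18:59:00:16:02"))  # fe80::5218:59ff:fe00:1602
```

Both results match the `inet6` lines shown for Ethernet1/Ethernet2 on p1l1 and for eth2 in the PNET note.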

Viewing IPv6 ND neighbors

admin@p1l1:~$ show ndp
Address                  MacAddress         Iface      Vlan    Status
-----------------------  -----------------  ---------  ------  --------
fe80::5054:ff:fe74:c101  52:54:00:74:c1:01  Ethernet1  -       STALE
fe80::5054:ff:fe74:c101  52:54:00:74:c1:01  eth1       -       STALE
fe80::5054:ff:fe74:c103  52:54:00:74:c1:03  eth2       -       STALE
fe80::5054:ff:fe74:c103  52:54:00:74:c1:03  Ethernet2  -       STALE
Total number of entries 4 

To clear NDP entries or port counters, use the sonic-clear command:

admin@p2l1:~$ sonic-clear -h
Usage: sonic-clear [OPTIONS] COMMAND [ARGS]...

  SONiC command line - 'Clear' command

Options:
  -h, -?, --help  Show this message and exit.

Commands:
  arp                  Clear IP ARP table
  counters             Clear counters
  dhcp6relay_counters  Clear dhcp6relay message counts
  dhcp_relay
  dropcounters         Clear drop counters
  fdb                  Clear FDB table
  flowcnt-route        Clear all route flow counters
  flowcnt-trap         Clear trap flow counters
  headroom-pool        Clear headroom pool WM
  ip                   Clear IP
  ipv6                 Clear IPv6 information
  line                 Clear preexisting connection to line
  macsec               Clear MACsec counts.
  nat                  Clear the nat info
  ndp                  Clear IPv6 NDP table
  pbh                  Clear the PBH info
  pfccounters          Clear pfc counters
  priority-group       Clear priority_group WM
  queue                Clear queue WM
  queuecounters        Clear queue counters
  rifcounters          Clear RIF counters
  tunnelcounters       Clear Tunnel counters

SONiC configuration

The directory below holds the physical port configuration:

admin@sonic:~$ ls -l /etc/sonic/
total 64
-rw-r--r-- 1 root root    41 May 19 17:14 asic_config_checksum
-rw-r--r-- 1 root root  1141 Jul 11 12:37 config_db.json
-rw-r--r-- 1 root root 17144 Jul 11 12:31 config_db.json.bak
-rw-r--r-- 1 root root  1590 May 19 17:17 constants.yml
-rw-r--r-- 1 root root  2471 Jul 11 12:42 copp_cfg.json
-rw------- 1 root root   403 May 19 17:17 core_analyzer.rc.json
-rw-r--r-- 1 root root     0 May 19 17:19 dhcp_relay_reconcile
-rw-r--r-- 1 root root    49 May 19 17:20 fast-reboot_order
drwxr-x--- 1  300  300  4096 Jul 11 07:57 frr
-rw-r--r-- 1 root root   776 May 19 17:20 generated_services.conf
-rw-r--r-- 1 root root 16681 May 19 17:20 init_cfg.json
-rw-r--r-- 1 root root     0 May 19 17:20 macsec_reconcile
-rw-r--r-- 1 root root    47 May 19 17:17 snmp.yml
-rw-r--r-- 1 root root   147 Jul 11 07:46 sonic-environment
-rw-r--r-- 1 root root     7 May 19 17:14 sonic_release
-rw-r--r-- 1 root root   403 May 19 17:14 sonic_version.yml
-rw-r--r-- 1 root root    10 May 19 17:19 swss_dependent
-rw-r--r-- 1 root root    14 May 19 17:17 updategraph.conf
-rw-r--r-- 1 root root    49 May 19 17:20 warm-reboot_order

After editing the configuration file, apply it with the following command:

admin@sonic:~$ sudo config reload
Clear current config and reload config in config_db format from the default config file(s) ? [y/N]: y
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen  -j /etc/sonic/init_cfg.json  -j /etc/sonic/config_db.json  --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon

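Saving the running configuration of the SONiC switch is not really recommended, because this command writes many default settings into the configuration file: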

admin@l2:~$ sudo config save -y
Running command: /usr/local/bin/sonic-cfggen -d --print-data > /etc/sonic/config_db.json

Entering the FRR configuration shell

admin@p1l0:~$ vtysh 

Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

p1l0# 

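Configuration

The configuration below easily achieves dual stack, but it does not exercise SRv6's traffic-steering capability at all; see the conclusions at the end for details.

p1l0 physical port configuration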

admin@p1l0:~$ more /etc/sonic/config_db.json 
{
    "DEVICE_METADATA": {
        "localhost": {
            "hwsku": "Force10-S6000",
            "platform": "x86_64-kvm_x86_64-r0",
            "mac": "52:54:00:74:c1:01",
            "hostname": "p1l0",
            "type": "LeafRouter",
            "bgp_asn": "101",
            "docker_routing_config_mode": "split"
        }
    },
    "LOOPBACK_INTERFACE": {
        "Loopback0|10.0.0.10/32": {},
        "Loopback0|fc00:0:10::1/128": {}
    },

    "INTERFACE": {
        "Ethernet1": {
            "ipv6_use_link_local_only": "enable"
        },
        "Ethernet2": {},
        "Ethernet2|10.101.1.1/24": {},
        "Ethernet2|2001:0:101:1::1/64": {}
    },
    
    "PORT": {
        "Ethernet1": {
            "lanes": "25,26,27,28",
            "alias": "fortyGigE0/1",
            "index": "0",
            "speed": "40000",
            "admin_status": "up",
            "mtu": "9100"
        },
        "Ethernet2": {
            "lanes": "29",
            "alias": "GigE0/2",
            "index": "1",
            "speed": "1000",
            "admin_status": "up",
            "mtu": "9100"
        }
    }
}
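Since a hand-edited config_db.json is applied with `config reload`, a quick sanity check before reloading can save a round trip. The following is a minimal Python sketch, not SONiC's own validation: it only checks that every INTERFACE entry refers to a port defined under PORT and that each address after the `|` separator parses. The function name `check_config_db` is hypothetical.

```python
import ipaddress


def check_config_db(cfg: dict) -> list[str]:
    """Return human-readable problems found in a config_db-style dict."""
    problems = []
    ports = cfg.get("PORT", {})
    for table in ("INTERFACE", "LOOPBACK_INTERFACE"):
        for key in cfg.get(table, {}):
            # Keys look like "Ethernet2" or "Ethernet2|10.101.1.1/24"
            name, _, prefix = key.partition("|")
            if table == "INTERFACE" and name not in ports:
                problems.append(f"{key}: port {name} is not defined in PORT")
            if prefix:
                try:
                    ipaddress.ip_interface(prefix)
                except ValueError:
                    problems.append(f"{key}: {prefix} is not a valid address/prefix")
    return problems


good = {
    "LOOPBACK_INTERFACE": {"Loopback0|10.0.0.10/32": {}, "Loopback0|fc00:0:10::1/128": {}},
    "INTERFACE": {
        "Ethernet1": {"ipv6_use_link_local_only": "enable"},
        "Ethernet2": {},
        "Ethernet2|10.101.1.1/24": {},
        "Ethernet2|2001:0:101:1::1/64": {},
    },
    "PORT": {"Ethernet1": {}, "Ethernet2": {}},
}
bad = {"INTERFACE": {"Ethernet3|10.1.1.1/33": {}}, "PORT": {"Ethernet1": {}}}

print(check_config_db(good))
print(check_config_db(bad))
```

On a switch the dict would come from `json.load(open("/etc/sonic/config_db.json"))`; a `json.JSONDecodeError` at that step already catches a stray comma or a missing brace.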

p1l0 FRR configuration

admin@p1l0:~$ vtysh 

Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

p1l0# config
p1l0(config)# 

no route-map RM_SET_SRC6 permit 10
no route-map RM_SET_SRC permit 10
no ip protocol bgp route-map RM_SET_SRC
no ipv6 protocol bgp route-map RM_SET_SRC6
!
route-map BGP-IPV6 permit 20
 set ipv6 next-hop prefer-global
!
ipv6 route fc00:0:10::/48 loopback0
!
router bgp 101
 bgp router-id 10.0.0.10
 bgp log-neighbor-changes
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 bgp bestpath as-path multipath-relax
 neighbor Ethernet1 interface remote-as 100
 !
 segment-routing srv6
  locator MAIN
 !
 address-family ipv4 unicast
  network 10.0.0.10/32
  network 10.101.1.0/24
  neighbor Ethernet1 activate
  neighbor Ethernet1 route-map BGP-IPV6 in
 exit-address-family
 !
 address-family ipv6 unicast
  network 2001:0:101:1::/64
  network fc00:0:10::/48
  network fc00:0:10::1/128
  neighbor Ethernet1 activate
  neighbor Ethernet1 route-map BGP-IPV6 in
  maximum-paths 64
 exit-address-family
!
segment-routing
 srv6
  encapsulation
   source-address fc00:0:10::1
  locators
   locator MAIN
    behavior usid
    prefix fc00:0:10::/48 block-len 32 node-len 16
 !
 srv6
  explicit-sids
   sid fc00:0:10:: behavior uN

p1l1 physical port configuration

admin@p1l1:~$ more /etc/sonic/config_db.json 
{
    "DEVICE_METADATA": {
        "localhost": {
            "hwsku": "Force10-S6000",
            "platform": "x86_64-kvm_x86_64-r0",
            "mac": "52:54:00:74:c1:02",
            "hostname": "p1l1",
            "type": "LeafRouter",
            "bgp_asn": "100",
            "docker_routing_config_mode": "split"
        }
    },
    "LOOPBACK_INTERFACE": {
        "Loopback0|10.0.0.11/32": {},
        "Loopback0|fc00:0:11::1/128": {}
    },

    "INTERFACE": {
        "Ethernet1": {
            "ipv6_use_link_local_only": "enable"
        },
        "Ethernet2": {
            "ipv6_use_link_local_only": "enable"
        }
    },

    "PORT": {
        "Ethernet1": {
            "lanes": "25,26,27,28",
            "alias": "fortyGigE0/1",
            "index": "0",
            "speed": "40000",
            "admin_status": "up",
            "mtu": "9100"
        },
        "Ethernet2": {
            "lanes": "29,30,31,32",
            "alias": "fortyGigE0/2",
            "index": "1",
            "speed": "40000",
            "admin_status": "up",
            "mtu": "9100"
        }
    }
}

p1l1 FRR configuration

admin@p1l1:~$ vtysh 

Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

p1l1# configure 
p1l1(config)# 

no route-map RM_SET_SRC6 permit 10
no route-map RM_SET_SRC permit 10
no ip protocol bgp route-map RM_SET_SRC
no ipv6 protocol bgp route-map RM_SET_SRC6
!
route-map BGP-IPV6 permit 20
 set ipv6 next-hop prefer-global
!
ipv6 route fc00:0:11::/48 Loopback0
!
router bgp 100
 bgp router-id 10.0.0.11
 bgp log-neighbor-changes
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 bgp bestpath as-path multipath-relax
 neighbor Ethernet1 interface remote-as 101
 neighbor Ethernet2 interface remote-as 65001
 !
 segment-routing srv6
  locator MAIN
 !
 address-family ipv4 unicast
  network 10.0.0.11/32
  neighbor Ethernet1 activate
  neighbor Ethernet1 route-map BGP-IPV6 in
  neighbor Ethernet2 activate
  neighbor Ethernet2 route-map BGP-IPV6 in
 exit-address-family
 !
 address-family ipv6 unicast
  network fc00:0:11::/48
  network fc00:0:11::1/128
  neighbor Ethernet1 activate
  neighbor Ethernet1 route-map BGP-IPV6 in
  neighbor Ethernet2 activate
  neighbor Ethernet2 route-map BGP-IPV6 in
  maximum-paths 64
 exit-address-family
!
segment-routing
 srv6
  encapsulation
   source-address fc00:0:11::1
  locators
   locator MAIN
    behavior usid
    prefix fc00:0:11::/48 block-len 32 node-len 16
 !
 srv6
  explicit-sids
   sid fc00:0:11:: behavior uN

l2 physical port configuration

admin@l2:~$ more /etc/sonic/config_db.json 
{
    "DEVICE_METADATA": {
        "localhost": {
            "hwsku": "Force10-S6000",
            "platform": "x86_64-kvm_x86_64-r0",
            "mac": "52:54:00:74:c1:03",
            "hostname": "l2",
            "type": "SpineRouter",
            "bgp_asn": "65001",
            "docker_routing_config_mode": "split"
        }
    },
    "LOOPBACK_INTERFACE": {
        "Loopback0|10.0.0.2/32": {},
        "Loopback0|fc00:0:2::1/128": {}
    },

    "INTERFACE": {
        "Ethernet1": {
            "ipv6_use_link_local_only": "enable"
        },
        "Ethernet2": {
            "ipv6_use_link_local_only": "enable"
        }
    },

    "PORT": {
        "Ethernet1": {
            "lanes": "25,26,27,28",
            "alias": "fortyGigE0/1",
            "index": "0",
            "speed": "40000",
            "admin_status": "up",
            "mtu": "9100"
        },
        "Ethernet2": {
            "lanes": "29,30,31,32",
            "alias": "fortyGigE0/2",
            "index": "1",
            "speed": "40000",
            "admin_status": "up",
            "mtu": "9100"
        }
    }
}

l2 FRR configuration

admin@l2:~$ vtysh 

Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

l2# config
l2(config)#

no route-map RM_SET_SRC6 permit 10
no route-map RM_SET_SRC permit 10
no ip protocol bgp route-map RM_SET_SRC
no ipv6 protocol bgp route-map RM_SET_SRC6
!
route-map BGP-IPV6 permit 20
 set ipv6 next-hop prefer-global
!
ipv6 route fc00:0:2::/48 Loopback0
!
router bgp 65001
 bgp router-id 10.0.0.2
 bgp log-neighbor-changes
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 bgp bestpath as-path multipath-relax
 neighbor Ethernet1 interface remote-as 200
 neighbor Ethernet2 interface remote-as 100
 !
 segment-routing srv6
  locator MAIN
 !
 address-family ipv4 unicast
  network 10.0.0.2/32
  neighbor Ethernet1 activate
  neighbor Ethernet1 route-map BGP-IPV6 in
  neighbor Ethernet2 activate
  neighbor Ethernet2 route-map BGP-IPV6 in
 exit-address-family
 !
 address-family ipv6 unicast
  network fc00:0:2::/48
  network fc00:0:2::1/128
  neighbor Ethernet1 activate
  neighbor Ethernet1 route-map BGP-IPV6 in
  neighbor Ethernet2 activate
  neighbor Ethernet2 route-map BGP-IPV6 in
  maximum-paths 64
 exit-address-family
!
segment-routing
 srv6
  encapsulation
   source-address fc00:0:2::1
  locators
   locator MAIN
    behavior usid
    prefix fc00:0:2::/48 block-len 32 node-len 16
 !
 srv6
  explicit-sids
   sid fc00:0:2:: behavior uN

p2l1 physical port configuration

admin@p2l1:~$ more /etc/sonic/config_db.json 
{
    "DEVICE_METADATA": {
        "localhost": {
            "hwsku": "Force10-S6000",
            "platform": "x86_64-kvm_x86_64-r0",
            "mac": "52:54:00:74:c1:04",
            "hostname": "p2l1",
            "type": "LeafRouter",
            "bgp_asn": "200",
            "docker_routing_config_mode": "split"
        }
    },
    "LOOPBACK_INTERFACE": {
        "Loopback0|10.0.0.21/32": {},
        "Loopback0|fc00:0:21::1/128": {}
    },

    "INTERFACE": {
        "Ethernet1": {
            "ipv6_use_link_local_only": "enable"
        },
        "Ethernet2": {
            "ipv6_use_link_local_only": "enable"
        }
    },

    "PORT": {
        "Ethernet1": {
            "lanes": "25,26,27,28",
            "alias": "fortyGigE0/1",
            "index": "0",
            "speed": "40000",
            "admin_status": "up",
            "mtu": "9100"
        },
        "Ethernet2": {
            "lanes": "29,30,31,32",
            "alias": "fortyGigE0/2",
            "index": "1",
            "speed": "40000",
            "admin_status": "up",
            "mtu": "9100"
        }
    }
}

p2l1 FRR configuration

admin@p2l1:~$ vtysh 

Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

p2l1# config
p2l1(config)# 

no route-map RM_SET_SRC6 permit 10
no route-map RM_SET_SRC permit 10
no ip protocol bgp route-map RM_SET_SRC
no ipv6 protocol bgp route-map RM_SET_SRC6
!
route-map BGP-IPV6 permit 20
 set ipv6 next-hop prefer-global
!
ipv6 route fc00:0:21::/48 Loopback0
!
router bgp 200
 bgp router-id 10.0.0.21
 bgp log-neighbor-changes
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 bgp bestpath as-path multipath-relax
 neighbor Ethernet1 interface remote-as 201
 neighbor Ethernet2 interface remote-as 65001
 !
 segment-routing srv6
  locator MAIN
 !
 address-family ipv4 unicast
  network 10.0.0.21/32
  neighbor Ethernet1 activate
  neighbor Ethernet1 route-map BGP-IPV6 in
  neighbor Ethernet2 activate
  neighbor Ethernet2 route-map BGP-IPV6 in
 exit-address-family
 !
 address-family ipv6 unicast
  network fc00:0:21::/48
  network fc00:0:21::1/128
  neighbor Ethernet1 activate
  neighbor Ethernet1 route-map BGP-IPV6 in
  neighbor Ethernet2 activate
  neighbor Ethernet2 route-map BGP-IPV6 in
  maximum-paths 64
 exit-address-family
!
segment-routing
 srv6
  encapsulation
   source-address fc00:0:21::1
  locators
   locator MAIN
    behavior usid
    prefix fc00:0:21::/48 block-len 32 node-len 16
 !
 srv6
  explicit-sids
   sid fc00:0:21:: behavior uN

p2l0 physical port configuration

admin@p2l0:~$ more /etc/sonic/config_db.json
{
    "DEVICE_METADATA": {
        "localhost": {
            "hwsku": "Force10-S6000",
            "platform": "x86_64-kvm_x86_64-r0",
            "mac": "52:54:00:74:c1:05",
            "hostname": "p2l0",
            "type": "LeafRouter",
            "bgp_asn": "201",
            "docker_routing_config_mode": "split"
        }
    },
    "LOOPBACK_INTERFACE": {
        "Loopback0|10.0.0.20/32": {},
        "Loopback0|fc00:0:20::1/128": {}
    },

    "INTERFACE": {
        "Ethernet1": {
            "ipv6_use_link_local_only": "enable"
        },
        "Ethernet2": {},
        "Ethernet2|20.101.1.1/24": {},
        "Ethernet2|2002:0:101:1::1/64": {}
    },

    "PORT": {
        "Ethernet1": {
            "lanes": "25,26,27,28",
            "alias": "fortyGigE0/1",
            "index": "0",
            "speed": "40000",
            "admin_status": "up",
            "mtu": "9100"
        },
        "Ethernet2": {
            "lanes": "29,30,31,32",
            "alias": "GigE0/2",
            "index": "1",
            "speed": "40000",
            "admin_status": "up",
            "mtu": "9100"
        }
    }
}

p2l0 FRR configuration

admin@p2l0:~$ vtysh 

Hello, this is FRRouting (version 8.4-dev).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

p2l0# config
p2l0(config)#

no route-map RM_SET_SRC6 permit 10
no route-map RM_SET_SRC permit 10
no ip protocol bgp route-map RM_SET_SRC
no ipv6 protocol bgp route-map RM_SET_SRC6
!
route-map BGP-IPV6 permit 20
 set ipv6 next-hop prefer-global
!
ipv6 route fc00:0:20::/48 Loopback0
!
router bgp 201
 bgp router-id 10.0.0.20
 bgp log-neighbor-changes
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 bgp bestpath as-path multipath-relax
 neighbor Ethernet1 interface remote-as 200
 !
 segment-routing srv6
  locator MAIN
 exit
 !
 address-family ipv4 unicast
  network 10.0.0.20/32
  network 20.101.1.0/24
  neighbor Ethernet1 activate
  neighbor Ethernet1 route-map BGP-IPV6 in
 exit-address-family
 !
 address-family ipv6 unicast
  network 2002:0:101:1::/64
  network fc00:0:20::/48
  network fc00:0:20::1/128
  neighbor Ethernet1 activate
  neighbor Ethernet1 route-map BGP-IPV6 in
  maximum-paths 64
 exit-address-family
!
segment-routing
 srv6
  encapsulation
   source-address fc00:0:20::1
  locators
   locator MAIN
    behavior usid
    prefix fc00:0:20::/48 block-len 32 node-len 16
 !
 srv6
  explicit-sids
   sid fc00:0:20:: behavior uN
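The SRv6 stanza is identical on all five nodes apart from the node hextet of the locator, so it lends itself to templating. Below is a small Python sketch that renders the stanza per hostname. The node values are copied from the configurations above, while `NODES`, `TEMPLATE` and `render_srv6` are names invented for this example.

```python
# hostname -> node hextet used in fc00:0:<node>::/48 (taken from the configs above)
NODES = {"p1l0": "10", "p1l1": "11", "l2": "2", "p2l1": "21", "p2l0": "20"}

TEMPLATE = """\
segment-routing
 srv6
  encapsulation
   source-address fc00:0:{node}::1
  locators
   locator MAIN
    behavior usid
    prefix fc00:0:{node}::/48 block-len 32 node-len 16
 !
 srv6
  explicit-sids
   sid fc00:0:{node}:: behavior uN
"""


def render_srv6(hostname: str) -> str:
    """Render the segment-routing stanza for one of the lab nodes."""
    return TEMPLATE.format(node=NODES[hostname])


print(render_srv6("p2l0"))
```

The rendered text can then be pasted into vtysh; the sketch deliberately leaves out the BGP part, which also varies by AS number and neighbors.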

Server1 configuration

root@S1:~# ip addr add 10.101.1.2/24 dev ens5
root@S1:~# ip -6 addr add 2001:0:101:1::2/64 dev ens5
root@S1:~# ip link set ens5 up
root@S1:~# ip route add default via 10.101.1.1
root@S1:~# ip -6 route add default via 2001:0:101:1::1
root@S1:~# ping 10.101.1.1
PING 10.101.1.1 (10.101.1.1) 56(84) bytes of data.
64 bytes from 10.101.1.1: icmp_seq=1 ttl=64 time=0.861 ms
^C
--- 10.101.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.861/0.861/0.861/0.000 ms
root@S1:~# ping6 2001:0:101:1::1
PING 2001:0:101:1::1(2001:0:101:1::1) 56 data bytes
64 bytes from 2001:0:101:1::1: icmp_seq=1 ttl=64 time=1.62 ms
^C
--- 2001:0:101:1::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.615/1.615/1.615/0.000 ms

Server2 configuration

root@S2:~# ip addr add 20.101.1.2/24 dev ens3
root@S2:~# ip -6 addr add 2002:0:101:1::2/64 dev ens3
root@S2:~# ip link set ens3 up
root@S2:~# ip route add default via 20.101.1.1
root@S2:~# ip -6 route add default via 2002:0:101:1::1

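Testing and verification

My original plan was to configure a different uN on devices of each role in the dual-stack setup, and then have the servers stamp uSIDs onto the packets they send in order to steer their path. However, SONiC/FRR currently does not support steering for global IPv4/IPv6, and the VPN scenario is not very meaningful in today's DCNs, so I dropped the idea for now. This section only shows SONiC's dual-stack support.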
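For reference, this is roughly what such a uSID destination address would look like with the locators used in this lab (32-bit block fc00:0::/32, 16-bit node IDs). The Python sketch below is pure address arithmetic under those assumptions; it does not imply that SONiC/FRR would forward such a packet in the global table, and `usid_carrier` is a name made up for this example.

```python
import ipaddress


def usid_carrier(block: str, node_ids: list[int]) -> str:
    """Pack 16-bit uSIDs after a 32-bit block into one IPv6 address."""
    net = ipaddress.IPv6Network(block)
    if net.prefixlen != 32:
        raise ValueError("this sketch assumes block-len 32")
    if len(node_ids) > 6:
        raise ValueError("only (128 - 32) / 16 = 6 uSIDs fit in one carrier")
    value = int(net.network_address)
    for position, node in enumerate(node_ids):
        if not 0 < node <= 0xFFFF:
            raise ValueError(f"node id out of range: {node:#x}")
        # The first uSID sits right after the block; later ones follow to the right.
        shift = 128 - 32 - 16 * (position + 1)
        value |= node << shift
    return str(ipaddress.IPv6Address(value))


# uN values of p1l1, l2, p2l1, p2l0 as configured above (hex hextets 11, 2, 21, 20)
print(usid_carrier("fc00::/32", [0x11, 0x2, 0x21, 0x20]))  # fc00:0:11:2:21:20::
```

Each node that owns the first uSID after the block shifts the remaining ones left by 16 bits, which is why the carrier reads as the intended path from left to right.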

Server1 IPv4 scenario

Let's check whether S1 can ping S2:

root@S1:~# ip route
default via 10.101.1.1 dev ens5 
10.101.1.0/24 dev ens5 proto kernel scope link src 10.101.1.2 
root@S1:~# ping 20.101.1.2
PING 20.101.1.2 (20.101.1.2) 56(84) bytes of data.
64 bytes from 20.101.1.2: icmp_seq=1 ttl=59 time=6.00 ms
64 bytes from 20.101.1.2: icmp_seq=2 ttl=59 time=5.42 ms
64 bytes from 20.101.1.2: icmp_seq=3 ttl=59 time=5.74 ms
^C
--- 20.101.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 5.419/5.717/5.997/0.236 ms
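The ttl=59 in the replies is consistent with the topology. Assuming S2 sends its replies with an initial TTL of 64 (the usual Linux default, and an assumption here), five routers decremented it on the way back: p2l0, p2l1, l2, p1l1 and p1l0. A tiny Python sketch of that arithmetic:

```python
def routed_hops(received_ttl: int, initial_ttl: int = 64) -> int:
    """Number of routers that decremented the TTL, given an assumed initial TTL."""
    if not 0 < received_ttl <= initial_ttl:
        raise ValueError("received TTL must be between 1 and the initial TTL")
    return initial_ttl - received_ttl


print(routed_hops(59))  # 5
```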

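Let's confirm the path with MTR. I found a problem here: with the configuration above, only the first and last hops respond, and none of the intermediate hops do. A packet capture shows that the replies are sourced from the docker0 address, which is identical on every SONiC node (a real white-box switch may behave differently from the virtual one), so the replies are not forwarded correctly, as shown below: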

root@S1:~# mtr -r 20.101.1.2
Start: 2023-08-08T12:42:46+0000
HOST: S1                          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- _gateway                   0.0%    10    0.8   0.8   0.6   1.3   0.2
  2.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
  3.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
  4.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
  5.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
  6.|-- 20.101.1.2                 0.0%    10    5.1   5.2   4.9   5.4   0.2
admin@p1l0:~$ show ip int 
Interface    Master    IPv4 address/mask    Admin/Oper    BGP Neighbor    Neighbor IP
-----------  --------  -------------------  ------------  --------------  -------------
Ethernet2              10.101.1.1/24        up/up         N/A             N/A
Loopback0              10.0.0.10/32         up/up         N/A             N/A
docker0                240.127.1.1/24       up/down       N/A             N/A
lo                     127.0.0.1/16         up/up         N/A             N/A
admin@l2:~$ show ip int
Interface    Master    IPv4 address/mask    Admin/Oper    BGP Neighbor    Neighbor IP
-----------  --------  -------------------  ------------  --------------  -------------
Loopback0              10.0.0.2/32          up/up         N/A             N/A
docker0                240.127.1.1/24       up/down       N/A             N/A
lo                     127.0.0.1/16         up/up         N/A             N/A

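The following commands (which I had deliberately removed earlier -_-…) make the intermediate devices reply correctly. They should not matter much in a DCN interconnected with eBGP, yet they clearly take effect, and I do not fully understand why yet. The example is from p1l1; each node sets its own Loopback0 addresses as the source: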

route-map RM_SET_SRC permit 10
 set src 10.0.0.11
!
route-map RM_SET_SRC6 permit 10
 set src fc00:0:11::1
!
ip protocol bgp route-map RM_SET_SRC
ipv6 protocol bgp route-map RM_SET_SRC6
!
root@S1:~# mtr -r 20.101.1.2
Start: 2023-08-08T13:04:54+0000
HOST: S1                          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- _gateway                   0.0%    10    0.8   0.8   0.6   1.0   0.1
  2.|-- 10.0.0.11                  0.0%    10    1.6   1.6   1.5   1.7   0.1
  3.|-- 10.0.0.2                   0.0%    10    2.5   2.6   2.3   3.3   0.3
  4.|-- 10.0.0.21                  0.0%    10    3.4   3.4   2.9   3.7   0.2
  5.|-- 10.0.0.20                  0.0%    10    4.5   4.3   3.6   4.8   0.3
  6.|-- 20.101.1.2                 0.0%    10    4.9   5.1   4.6   5.5   0.3

Server1 IPv6 scenario

root@S1:~# ip -6 route
::1 dev lo proto kernel metric 256 pref medium
2001:0:101:1::/64 dev ens5 proto kernel metric 256 pref medium
fe80::/64 dev ens5 proto kernel metric 256 pref medium
default via 2001:0:101:1::1 dev ens5 metric 1024 pref medium
root@S1:~# ping -6 2002:0:101:1::2
PING 2002:0:101:1::2(2002:0:101:1::2) 56 data bytes
64 bytes from 2002:0:101:1::2: icmp_seq=1 ttl=59 time=8.28 ms
64 bytes from 2002:0:101:1::2: icmp_seq=2 ttl=59 time=5.89 ms
64 bytes from 2002:0:101:1::2: icmp_seq=3 ttl=59 time=5.60 ms
^C
--- 2002:0:101:1::2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 5.596/6.586/8.277/1.201 ms
root@S1:~# mtr -6 -r 2002:0:101:1::2
Start: 2023-08-08T13:24:10+0000
HOST: S1                          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- _gateway                   0.0%    10    0.9   0.8   0.6   0.9   0.1
  2.|-- fc00:0:11::1               0.0%    10    1.9   1.8   1.6   1.9   0.1
  3.|-- fc00:0:2::1                0.0%    10    2.9   2.8   2.5   3.4   0.2
  4.|-- fc00:0:21::1               0.0%    10    3.8   3.8   3.5   4.5   0.3
  5.|-- fc00:0:20::1               0.0%    10    4.7   4.6   4.2   5.1   0.3
  6.|-- 2002:0:101:1::2            0.0%    10    5.4   5.4   4.7   7.4   0.8

P1L0 IPv4/IPv6 information

The BGP routing information on P1L0 is shown below:

p1l0# show bgp sum

IPv4 Unicast Summary (VRF default):
BGP router identifier 10.0.0.10, local AS number 101 vrf-id 0
BGP table version 11
RIB entries 13, using 2496 bytes of memory
Peers 1, using 723 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
Ethernet1       4        100     36695     36683        0    0    0 03w4d10h            5        7 N/A

Total number of neighbors 1

IPv6 Unicast Summary (VRF default):
BGP router identifier 10.0.0.10, local AS number 101 vrf-id 0
BGP table version 12
RIB entries 14, using 2688 bytes of memory
Peers 1, using 723 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
Ethernet1       4        100     36695     36683        0    0    0 03w4d10h            5        8 N/A

Total number of neighbors 1
p1l0# show bgp ipv4 un
BGP table version is 11, local router ID is 10.0.0.10, vrf id 0
Default local pref 100, local AS 101
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.0.2/32      Ethernet1                              0 100 65001 i
*> 10.0.0.10/32     0.0.0.0                  0         32768 i
*> 10.0.0.11/32     Ethernet1                0             0 100 i
*> 10.0.0.20/32     Ethernet1                              0 100 65001 200 201 i
*> 10.0.0.21/32     Ethernet1                              0 100 65001 200 i
*> 10.101.1.0/24    0.0.0.0                  0         32768 i
*> 20.101.1.0/24    Ethernet1                              0 100 65001 200 201 i

Displayed  7 routes and 7 total paths
p1l0# 
p1l0# show bgp ipv6 un
BGP table version is 12, local router ID is 10.0.0.10, vrf id 0
Default local pref 100, local AS 101
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*> 2001:0:101:1::/64
                    ::                       0         32768 i
*> 2002:0:101:1::/64
                    Ethernet1                              0 100 65001 200 201 i
*> fc00:0:2::1/128  Ethernet1                              0 100 65001 i
*> fc00:0:10::/48   ::                       0         32768 i
*> fc00:0:10::1/128 ::                       0         32768 i
*> fc00:0:11::1/128 Ethernet1                0             0 100 i
*> fc00:0:20::1/128 Ethernet1                              0 100 65001 200 201 i
*> fc00:0:21::1/128 Ethernet1                              0 100 65001 200 i

Displayed  8 routes and 8 total paths

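Why can IPv4 routes be exchanged and IPv4 traffic forwarded in a topology whose links only have IPv6 link-local addresses? Mainly because the BGP next hop of the IPv4 routes is an IPv6 link-local address (the extended next hop encoding of RFC 8950) and is resolved directly on it. Since all eBGP sessions are between directly connected neighbors, a packet capture still shows native IPv4 packets; the link-local address only serves to resolve the next hop onto the corresponding interface, as shown below: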

address-family ipv4 unicast
  neighbor Ethernet1 route-map BGP-IPV6 in
!
route-map BGP-IPV6 permit 20
 set ipv6 next-hop prefer-global
p1l0# show bgp ipv4 20.101.1.0/24      
BGP routing table entry for 20.101.1.0/24, version 11
Paths: (1 available, best #1, table default)
  Advertised to non peer-group peers:
  Ethernet1
  100 65001 200 201
    fe80::5054:ff:fe74:c102 from Ethernet1 (10.0.0.11)
    (fe80::5054:ff:fe74:c102) (prefer-global)
      Origin IGP, valid, external, best (First path received)
      Last update: Fri Jul 14 08:57:22 2023
p1l0# show bgp ipv4 20.101.1.0/24 json 
{
  "prefix":"20.101.1.0/24",
  "version":11,
  "advertisedTo":{
    "Ethernet1":{
      "hostname":"sonic"
    }
  },
  "paths":[
    {
      "aspath":{
        "string":"100 65001 200 201",
        "segments":[
          {
            "type":"as-sequence",
            "list":[
              100,
              65001,
              200,
              201
            ]
          }
        ],
        "length":4
      },
      "origin":"IGP",
      "valid":true,
      "version":11,
      "bestpath":{
        "overall":true,
        "selectionReason":"First path received"
      },
      "lastUpdate":{
        "epoch":1689325042,
        "string":"Fri Jul 14 08:57:22 2023\n"
      },
      "nexthops":[
        {
          "ip":"fe80::5054:ff:fe74:c102",
          "hostname":"sonic",
          "afi":"ipv6",
          "scope":"global",
          "metric":0,
          "accessible":true,
          "used":true
        },
        {
          "ip":"fe80::5054:ff:fe74:c102",
          "hostname":"sonic",
          "afi":"ipv6",
          "scope":"link-local",
          "accessible":true
        }
      ],
      "peer":{
        "peerId":"fe80::5054:ff:fe74:c102",
        "routerId":"10.0.0.11",
        "hostname":"sonic",
        "interface":"Ethernet1",
        "type":"external"
      }
    }
  ]
}
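The JSON form is convenient for scripted checks. As a minimal Python sketch (field names taken from the output above, function name `summarize_paths` invented here), the following pulls out the AS path, the next hops marked as used, and the peer interface for each path. The embedded sample is a trimmed copy of the output above.

```python
import json

SAMPLE = """
{
  "prefix": "20.101.1.0/24",
  "paths": [
    {
      "aspath": {"string": "100 65001 200 201", "length": 4},
      "nexthops": [
        {"ip": "fe80::5054:ff:fe74:c102", "afi": "ipv6", "scope": "global", "used": true},
        {"ip": "fe80::5054:ff:fe74:c102", "afi": "ipv6", "scope": "link-local"}
      ],
      "peer": {"interface": "Ethernet1", "type": "external"}
    }
  ]
}
"""


def summarize_paths(raw: str) -> list[dict]:
    """Condense 'show bgp ipv4 <prefix> json' output into one dict per path."""
    doc = json.loads(raw)
    summary = []
    for path in doc.get("paths", []):
        used = [nh["ip"] for nh in path.get("nexthops", []) if nh.get("used")]
        summary.append(
            {
                "prefix": doc.get("prefix"),
                "as_path": path.get("aspath", {}).get("string", ""),
                "next_hops": used,
                "interface": path.get("peer", {}).get("interface"),
            }
        )
    return summary


print(summarize_paths(SAMPLE))
```

The result makes the point of this section visible at a glance: an IPv4 prefix whose only used next hop is an fe80:: address on Ethernet1.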

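Test conclusions

  • SONiC switches can be deployed plug-and-play using only IPv6 link-local addresses, which greatly simplifies dual-stack configuration in a DCN (some commercial devices now support this kind of deployment as well). This is likely the direction IPv6 deployment is heading, but it remains to be seen whether existing operations tooling works smoothly under this architecture;
  • SRv6 steering is only available in VPN scenarios and cannot be used in the default/global table. Since DCNs rarely use VPNs, this is a shortcoming that must be addressed before going further. Once it is supported, steering from the host could be tried as a replacement for part of the gateway's overlay functions;
  • Some of SRv6's better features, such as Flex-Algo slicing, require BGP plus an IGP. Whether the industry makes them work in BGP-only scenarios remains to be seen, and it ultimately depends on actual service requirements;
  • SRv6 functions can be user-defined; an open question is whether custom functions can be used to improve transport efficiency in future converged networks.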
This article is from Frank's Blog

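Copyright notice: this is an original article that reflects only the author's personal views. Copyright belongs to Frank Zhao; when reposting, please credit the source and include a link to the article.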