ASR9k, GSR VPLS *PVID_Inc issue
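I discussed the *PVID_Inc issue with a colleague and learned a lot. I suspect few people have worked out what really causes PVID_Inc in a VPLS.
To verify what we discussed, I ran the following experiment: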
Topology
The sky-blue part belongs to bridge-domain vplstest1000; the red part belongs to bridge-domain vplstest6002.
This article discusses only the sky-blue bridge-domain.
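Test objective
Verify why the PVID_Inc problem occurs, and back the explanation with packet captures as hard evidence.
Test procedure
First, reproduce the *PVID_Inc problem.
The basic configuration is as follows: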
7609-S:
interface GigabitEthernet9/5
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 1108
 switchport mode trunk
ASR9k:
RP/0/RSP0/CPU0:ASR9010-1#sh run l2vpn bridge group test bridge-domain vplstest1000
Thu Feb 13 08:25:50.813 UTC
l2vpn
 bridge group test
  bridge-domain vplstest1000
   mtu 9000
   interface GigabitEthernet0/2/0/0.1108
   !
   vfi vplstest1000
    neighbor 2.2.2.2 pw-id 1000
    !
   !
  !
 !
!
RP/0/RSP0/CPU0:ASR9010-1#sh run int g0/2/0/0.1108
Thu Feb 13 08:26:05.238 UTC
interface GigabitEthernet0/2/0/0.1108 l2transport
 encapsulation dot1q 1108
 rewrite ingress tag pop 1 symmetric
!
GSR:
l2 vfi vplstest1000 manual
 vpn id 1000
 bridge-domain 1000
 neighbor 1.1.1.1 encapsulation mpls
!
interface GigabitEthernet5/0/7.1100
 encapsulation dot1Q 1100
 no ip directed-broadcast
 bridge-domain 1000
!
interface GigabitEthernet5/0/5.1109
 encapsulation dot1Q 1109
 no ip directed-broadcast
 bridge-domain 1000
VPLS and MPLS state:
ASR9k:
RP/0/RSP0/CPU0:ASR9010-1#show l2vpn bridge-domain bd-name vplstest1000 br
Thu Feb 13 08:35:23.927 UTC
Legend: pp = Partially Programmed.
Bridge Group:Bridge-Domain Name  ID    State          Num ACs/up   Num PWs/up
-------------------------------- ----- -------------- ------------ -------------
test:vplstest1000                3     up             1/1          1/1
RP/0/RSP0/CPU0:ASR9010-1#show l2vpn bridge-domain bd-name vplstest1000 detail | b List of VFIs:
Thu Feb 13 08:36:03.581 UTC
  List of VFIs:
    VFI vplstest1000 (up)
      PW: neighbor 2.2.2.2, PW ID 1000, state is up ( established )
        PW class not set, XC ID 0xc0000003
        Encapsulation MPLS, protocol LDP
        Source address 1.1.1.1
        PW type Ethernet, control word disabled, interworking none
        Sequencing not set
          MPLS         Local                          Remote
          ------------ ------------------------------ -------------------------
          Label        16007                          45
          Group ID     0x3                            0x0
          Interface    vplstest1000                   unknown
          MTU          9000                           9000
          Control word disabled                       disabled
          PW type      Ethernet                       Ethernet
          VCCV CV type 0x2                            0x2
                       (LSP ping verification)        (LSP ping verification)
          VCCV CC type 0x6                            0x2
                       (router alert label)           (router alert label)
                       (TTL expiry)
          ------------ ------------------------------ -------------------------
        MIB cpwVcIndex: 3221225475
RP/0/RSP0/CPU0:ASR9010-1#sh mpls for
Thu Feb 13 08:36:26.813 UTC
Local  Outgoing    Prefix             Outgoing     Next Hop        Bytes
Label  Label       or ID              Interface                    Switched
------ ----------- ------------------ ------------ --------------- ------------
16003  Pop         2.2.2.2/32         Gi0/2/0/3    123.1.1.2       4230820
16007  Pop         PW(2.2.2.2:1000)   BD=3         point2point     5439860
GSR:
GSR-12816-1#sh mpls for
Local  Outgoing    Prefix            Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id      switched   interface
42     Pop tag     1.1.1.1/32        0          Gi5/0/6    123.1.1.1
45     Untagged    l2ckt(1000)       4168444    none       point2point
GSR-12816-1#sh mpls l2transport binding 1000
  Destination Address: 1.1.1.1,  VC ID: 1000
    Local Label:  45
        Cbit: 0,    VC Type: Ethernet,    GroupID: 0
        MTU: 9000,   Interface Desc: n/a
        VCCV: CC Type: RA [2]
              CV Type: LSPV [2]
    Remote Label: 16007
        Cbit: 0,    VC Type: Ethernet,    GroupID: 3
        MTU: 9000,   Interface Desc: vplstest1000
        VCCV: CC Type: RA [2], TTL [3]
              CV Type: LSPV [2]
GSR-12816-1#sh mpls l2transport vc 1000
Local intf     Local circuit              Dest address    VC ID      Status
-------------  -------------------------- --------------- ---------- ----------
VFI vplstest10 VFI                        1.1.1.1         1000       UP
Test results:
7609:
*Feb 13 00:28:24.718: %SPANTREE-SP-2-RECV_PVID_ERR: Received BPDU with inconsistent peer vlan id 1100 on GigabitEthernet9/5 VLAN1108.
*Feb 13 00:28:24.718: %SPANTREE-SP-2-BLOCK_PVID_LOCAL: Blocking GigabitEthernet9/5 on VLAN1108. Inconsistent local vlan.
7609-S#sh spanning-tree vlan 1108
VLAN1108
  Spanning tree enabled protocol ieee
  Root ID    Priority    33876
             Address     001b.0de6.f0c0
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
  Bridge ID  Priority    33876  (priority 32768 sys-id-ext 1108)
             Address     001b.0de6.f0c0
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300
Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi9/5               Desg BKN*4         128.2053 P2p *PVID_Inc
Switch:
*Mar 6 21:17:28.399: %SPANTREE-2-RECV_PVID_ERR: Received BPDU with inconsistent peer vlan id 1108 on GigabitEthernet1/0/1 VLAN1100.
*Mar 6 21:17:28.399: %SPANTREE-2-BLOCK_PVID_LOCAL: Blocking GigabitEthernet1/0/1 on VLAN1100. Inconsistent local vlan.
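Conclusion:
The PVID concept shows up in two places: the 802.1Q tag, and the PVID carried implicitly inside the BPDU (the technical term is the special TLV). According to the relevant documentation: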
The error “PVID-inconsistency” is generated if PVST+ BPDU is received on a VLAN different from the one it was generated from.
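So which VLAN does this actually refer to? On both the 9k and the GSR, a frame leaving the PE port is re-tagged with a dot1q tag, and that tag matches the VLAN of the downstream switch. The BPDU can therefore be viewed in two parts: the dot1q tag rewritten by the local PE, and the BPDU payload passed through transparently from the remote PE. A BPDU leaving the 9k thus carries 1108 in its 802.1Q tag and a PVID of 1100 inside the BPDU. The question is: does the switch check these two parts against each other, or does it check only the PVID in the BPDU, or only the VLAN in the dot1q tag?
Using Netdr and ELAM on the SP of the 76, I could only capture the BPDUs it sent. No received BPDU was captured, so the BPDUs were never punted to the Superman engine of the 76. The show output below confirms this. Also, because module 9 is a CFC line card, ELAM cannot capture on module 9.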
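The two-part view described above can be illustrated with a small model. This is only a sketch: the class and function names (`PvstBpdu`, `pe_ingress`, `pe_egress`) are hypothetical, and both PEs are modeled the same way (strip the tag on ingress, push the local attachment-circuit VLAN on egress), which is how the text above describes them rather than a statement about either platform's internals.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PvstBpdu:
    # Outer 802.1Q VLAN; None while the frame is untagged inside the pseudowire.
    dot1q_vlan: Optional[int]
    # Originating VLAN carried in the BPDU's trailing TLV (never rewritten by a PE).
    tlv_pvid: int


def pe_ingress(frame: PvstBpdu, ac_vlan: int) -> PvstBpdu:
    """Ingress side of the attachment circuit: pop the matching dot1q tag."""
    if frame.dot1q_vlan != ac_vlan:
        raise ValueError("frame does not match the attachment circuit VLAN")
    return replace(frame, dot1q_vlan=None)


def pe_egress(frame: PvstBpdu, ac_vlan: int) -> PvstBpdu:
    """Egress side: push the local attachment circuit's VLAN; payload untouched."""
    return replace(frame, dot1q_vlan=ac_vlan)


# BPDU generated by the downstream switch in VLAN 1100 (GSR side).
sent = PvstBpdu(dot1q_vlan=1100, tlv_pvid=1100)
in_pw = pe_ingress(sent, ac_vlan=1100)        # GSR strips tag 1100
received = pe_egress(in_pw, ac_vlan=1108)     # ASR9k pushes tag 1108
print(received)
```

In this model the frame arriving at the 7609 has tag 1108 but still carries PVID 1100 in the TLV, which is the combination reported by the RECV_PVID_ERR log on GigabitEthernet9/5.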
7609-S#sh spanning-tree vlan 1108 de
VLAN1108 is executing the ieee compatible Spanning Tree protocol
Bridge Identifier has priority 32768, sysid 1108, address 001b.0de6.f0c0
Configured hello time 2, max age 20, forward delay 15
We are the root of the spanning tree
Topology change flag not set, detected flag not set
Number of topology changes 2 last change occurred 00:29:51 ago
from GigabitEthernet9/5
Times: hold 1, topology change 35, notification 2
hello 2, max age 20, forward delay 15
Timers: hello 0, topology change 0, notification 0, aging 300
Port 2053 (GigabitEthernet9/5) of VLAN1108 is broken (Port VLAN ID Mismatch)
Port path cost 4, Port priority 128, Port Identifier 128.2053.
Designated root has priority 33876, address 001b.0de6.f0c0
Designated bridge has priority 33876, address 001b.0de6.f0c0
Designated port id is 128.2053, designated path cost 0
Timers: message age 0, forward delay 14, hold 0
Number of transitions to forwarding state: 1
Link type is point-to-point by default
BPDU: sent 56800, received 0 <<<
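OK, let's test it: remove the trunk allowed-VLAN restriction on the 76 so that both VLANs run spanning tree. The result is that both 1108 and 1100 show *PVID_Inc. That settles it: the two parts must match exactly. The root cause is that the VLANs on the two sides of the VPLS differ. Two different VLANs normally cannot reach each other, but the VPLS bridges them together, so this problem is inevitable. It has nothing to do with Type4/5.
Next, let's look at a capture taken on the Spirent. The GSR port is configured with dot1q 1109:
The MAC ending in ac:fc is the port MAC of the 76, and the one ending in 63:81 is the port MAC of the SWITCH. The part highlighted in red is the special TLV that only PVST/PVST+ switches understand. Its last 2 bytes are the PVID, which works out to 1108 here.
To double-confirm this, I changed the VLAN on the 7609 to VLAN 1234 and captured again: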
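The two observations (only VLAN 1108 blocked when the trunk allows only 1108, both 1108 and 1100 blocked when all VLANs are allowed) fit a simple rule, sketched below. The function `blocked_vlans` is hypothetical and is a simplified model inferred from these test results, not the switch's actual implementation.

```python
def blocked_vlans(rx_vlan: int, bpdu_pvid: int, trunk_vlans: set) -> set:
    """Return the VLANs put into *PVID_Inc on the receiving trunk port.

    rx_vlan     -- VLAN of the 802.1Q tag the BPDU arrived with
    bpdu_pvid   -- originating VLAN carried inside the BPDU's TLV
    trunk_vlans -- VLANs active on the trunk (assumed known to the caller)
    """
    if rx_vlan == bpdu_pvid:
        return set()  # both parts agree: no inconsistency
    # Mismatch: every involved VLAN that exists on the trunk is blocked.
    return {rx_vlan, bpdu_pvid} & trunk_vlans


# Trunk restricted to VLAN 1108, as in the first test.
print(blocked_vlans(1108, 1100, {1108}))
# Trunk allowing both VLANs, as in the second test.
print(blocked_vlans(1108, 1100, {1100, 1108}))
```

With matching VLANs on both sides of the VPLS the function returns an empty set, which corresponds to the case where no *PVID_Inc appears.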
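To show how the PVID is worked out from the last 2 bytes, here is a minimal decoding sketch. It assumes the TLV layout commonly shown by packet dissectors for PVST+ (2-byte type 0x0000, 2-byte length 0x0002, 2-byte VLAN ID). The sample bytes are rebuilt from the value 1108 for illustration and are not copied from the capture; `parse_pvid_tlv` is a hypothetical helper name.

```python
import struct


def parse_pvid_tlv(tlv: bytes) -> int:
    """Decode the 6-byte originating-VLAN TLV at the end of a PVST+ BPDU."""
    if len(tlv) != 6:
        raise ValueError("expected a 6-byte TLV")
    # Network byte order: type (2 bytes), length (2 bytes), VLAN ID (2 bytes).
    tlv_type, length, vlan = struct.unpack("!HHH", tlv)
    if tlv_type != 0x0000 or length != 2:
        raise ValueError("not an originating-VLAN TLV")
    return vlan


# Illustrative bytes (assumption): 0x0454 = 4 * 256 + 84 = 1108.
sample = bytes.fromhex("000000020454")
print(parse_pvid_tlv(sample))  # 1108
```

The arithmetic is just a big-endian 16-bit read: the high byte 0x04 contributes 1024 and the low byte 0x54 contributes 84.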
7609-S(config)#vlan 1234
7609-S(config-vlan)#exit
7609-S(config)#int g9/5
7609-S(config-if)#sw tr all vlan 1234
7609-S#sh run int g9/5
Building configuration...
Current configuration : 145 bytes
!
interface GigabitEthernet9/5
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 1234
 switchport mode trunk
RP/0/RSP0/CPU0:ASR9010-1#sh run int g0/2/0/0.1234
Thu Feb 13 09:30:55.483 UTC
interface GigabitEthernet0/2/0/0.1234 l2transport
 encapsulation dot1q 1234
 rewrite ingress tag pop 1 symmetric
!
RP/0/RSP0/CPU0:ASR9010-1#sh run l2vpn bridge group test bridge-domain vplstest1000
Thu Feb 13 09:38:07.120 UTC
l2vpn
 bridge group test
  bridge-domain vplstest1000
   mtu 9000
   interface GigabitEthernet0/2/0/0.1234
   !
   vfi vplstest1000
    neighbor 2.2.2.2 pw-id 1000
    !
   !
  !
 !
!