1 December 2017
After installing an ISSU SMU, the customer landed on the host OS when connecting to the XRVM console, although the XRVM itself was confirmed to be working normally.
- All RPs and LCs run both a SAVM and an XRVM; FCs run only a SAVM. You can check this with “show vm” in the admin VM. The SAVM and XRVM map to consoles 0 and 1, as follows:
- Besides the SAVM and XRVM, each RP and LC has another key component: the host system. You can check the host with the following steps: run “chvrf 0 bash”, then log in with “ssh ”.
23 November 2017
The customer found many ingressq ASIC errors on 0/4/CPU0. After checking, this appears to match a known DDTS: CSCuu86430. The issue may be triggered when CRS-3 MSCs (140G) interact with a CRS-X (400G) fabric. Once it is triggered, the S1RX fabric links on the CRS-X start flapping. A reload SMU is available for 5.1.4.
This article shows how to troubleshoot fabric link flapping.
1. The customer saw the following alarms:
LC/0/4/CPU0:Nov 8 00:30:26.752 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/1/CPU0:Nov 8 00:37:59.734 : fabricq_mgr[178]: %FABRIC-FABRICQ-3-PCL_PKT : Minor error in PCL of fabricq asic 0. PCL UC Lost Packet: CAOPCI: 0x18 (0/4, UC, LO):Lost Packet count= 1
LC/0/4/CPU0:Nov 8 00:37:59.734 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/4/CPU0:Nov 8 10:27:27.265 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/4/CPU0:Nov 8 11:06:08.181 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/1/CPU0:Nov 8 11:08:46.132 : fabricq_mgr[178]: %FABRIC-FABRICQ-3-PCL_PKT : Minor error in PCL of fabricq asic 0. PCL UC Partial Packet: CAOPCI: 0x18 (0/4, UC, LO)
LC/0/4/CPU0:Nov 8 11:18:34.733 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/4/CPU0:Nov 8 11:28:44.350 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
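To see how often each link flaps, the alarms above can be tallied per node, ASIC instance, and link. The following is a minimal Python sketch and not part of the original troubleshooting procedure; the regular expression and the function name `count_link_down` are assumptions based only on the log lines shown here, so other message formats may need a different pattern.

```python
import re
from collections import Counter

# Alarm lines copied from the log excerpt above.
SAMPLE = """\
LC/0/4/CPU0:Nov 8 00:30:26.752 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/1/CPU0:Nov 8 00:37:59.734 : fabricq_mgr[178]: %FABRIC-FABRICQ-3-PCL_PKT : Minor error in PCL of fabricq asic 0. PCL UC Lost Packet: CAOPCI: 0x18 (0/4, UC, LO):Lost Packet count= 1
LC/0/4/CPU0:Nov 8 00:37:59.734 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/4/CPU0:Nov 8 10:27:27.265 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/4/CPU0:Nov 8 11:06:08.181 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/1/CPU0:Nov 8 11:08:46.132 : fabricq_mgr[178]: %FABRIC-FABRICQ-3-PCL_PKT : Minor error in PCL of fabricq asic 0. PCL UC Partial Packet: CAOPCI: 0x18 (0/4, UC, LO)
LC/0/4/CPU0:Nov 8 11:18:34.733 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/4/CPU0:Nov 8 11:28:44.350 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
"""

# Assumed pattern, derived only from the sample lines above:
# <node>:<timestamp> : ingressq[pid]: %FABRIC-INGRESSQ-6-LINK_DOWN : ... Link N of Asic Instance M ...
LINK_DOWN = re.compile(
    r"^(?P<node>[^:\s]+):(?P<ts>\w{3}\s+\d+\s+[\d:.]+)\s+:\s+ingressq\[\d+\]:\s+"
    r"%FABRIC-INGRESSQ-6-LINK_DOWN\s+:.*?"
    r"Link (?P<link>\d+) of Asic Instance (?P<asic>\d+)"
)


def count_link_down(text):
    """Count LINK_DOWN alarms per (node, asic instance, link)."""
    counts = Counter()
    for line in text.splitlines():
        match = LINK_DOWN.match(line.strip())
        if match:
            key = (match.group("node"), int(match.group("asic")), int(match.group("link")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Print the noisiest links first.
    for (node, asic, link), hits in count_link_down(SAMPLE).most_common():
        print(f"{node} asic {asic} link {link}: {hits} LINK_DOWN events")
```

A single link dominating the tally, as link 26 on 0/4 does in this excerpt, suggests a problem with one specific fabric link rather than with the whole card.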
27 October 2017
This article explains what “ACK-DPM-WAIT” is and how to troubleshoot similar scenarios. In my case, the available information was too limited to narrow the problem down, so I will update the article if the issue happens again and an RCA is found.
Problem Description
Version: 5.1.3 + individual SMUs
Platform: 9010 + Mod80 + A9K-MPA-4X10GE
BNG: IPoE, DHCP proxy, 28k sessions
My customer found that some BNG sessions were failing. The trigger was a power supply problem on the customer side that power-cycled the ASR9k. After the 9k reloaded, dhcpd and arp raised many alarms. dhcpd recovered after the process was restarted several times, but arp kept raising SPIO alarms even after process restarts. The customer had enabled arp local disable on the BNG port.
The affected sessions obtained addresses correctly from DHCP, but they were deleted after 15 minutes. On the ASR9k, we found the affected sessions pending in the ACK_DPM_WAIT state. The issue recovered on its own at approximately 19:00-19:30, and the arp alarms disappeared in the same time window.
22 March 2017
In some cases, customers ask why BGP takes so much memory, how to optimize BGP, why memory is not released after optimization, and so on. These questions are hard to answer without more information. This article shows how to check and analyze BGP memory; that information will help you and the customer carry out targeted optimization.
1. Default scenario: no BGP routes
17 June 2016
The TZ database has several good guides on troubleshooting the ASR9k fabric, but none walks through the analysis of a fabric issue in a real scenario or case. “Very luckily”, I picked up a hot case in which a fabric issue caused a go-live to fail. So I summarized the whole analysis process to help CSEs narrow down similar issues.
Problem Description
- Platform: 9922 + 4 36x10G + 2 8x100G
- Version: 5.3.2 + SMU
My customer brought a new 9922 online to replace old devices. After it went online, they found traffic drops affecting their services. Based on the information collected while it was online, I found no notable NP drops, and the service traffic was very low (at most 5G on a 100G port; all ports were bundle ports during the online troubleshooting).