Troubleshooting “%FABRIC-INGRESSQ-6-LINK_DOWN” on CRS
Introduction
The customer saw many ingressq ASIC errors on 0/4/CPU0. After checking, the symptoms match a known DDTS: CSCuu86430. The issue may be triggered when CRS-3 MSCs (140G) interoperate with a CRS-X (400G) fabric. Once the issue is triggered, the s1rx fabric link on the CRS-X fabric card flaps. A reload SMU is available for release 5.1.4.
This article shows how to troubleshoot the fabric link flapping.
Troubleshooting
1. The customer saw the following alarms:
LC/0/4/CPU0:Nov 8 00:30:26.752 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/1/CPU0:Nov 8 00:37:59.734 : fabricq_mgr[178]: %FABRIC-FABRICQ-3-PCL_PKT : Minor error in PCL of fabricq asic 0. PCL UC Lost Packet: CAOPCI: 0x18 (0/4, UC, LO):Lost Packet count= 1
LC/0/4/CPU0:Nov 8 00:37:59.734 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/4/CPU0:Nov 8 10:27:27.265 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/4/CPU0:Nov 8 11:06:08.181 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/1/CPU0:Nov 8 11:08:46.132 : fabricq_mgr[178]: %FABRIC-FABRICQ-3-PCL_PKT : Minor error in PCL of fabricq asic 0. PCL UC Partial Packet: CAOPCI: 0x18 (0/4, UC, LO)
LC/0/4/CPU0:Nov 8 11:18:34.733 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/4/CPU0:Nov 8 11:28:44.350 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
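When the log is long, it helps to count how often each ingressq link is reported down. The following is a minimal Python sketch and not part of the original procedure: the function name count_link_down and the regular expression are assumptions written against the message format shown in the alarms above, and the sample text is the first three alarm lines.

```python
import re
from collections import Counter

# First three alarm lines from step 1, used as sample input.
SAMPLE_LOG = """\
LC/0/4/CPU0:Nov 8 00:30:26.752 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
LC/0/1/CPU0:Nov 8 00:37:59.734 : fabricq_mgr[178]: %FABRIC-FABRICQ-3-PCL_PKT : Minor error in PCL of fabricq asic 0. PCL UC Lost Packet: CAOPCI: 0x18 (0/4, UC, LO):Lost Packet count= 1
LC/0/4/CPU0:Nov 8 00:37:59.734 : ingressq[235]: %FABRIC-INGRESSQ-6-LINK_DOWN : Ingressq: Link 26 of Asic Instance 0 has been administratively shut down.
"""

# Assumed pattern: "<type>/<node>:<timestamp> ... LINK_DOWN : Ingressq: Link N of Asic Instance M"
LINK_DOWN_RE = re.compile(
    r"^\w+/(?P<node>[^:]+):.*%FABRIC-INGRESSQ-6-LINK_DOWN : "
    r"Ingressq: Link (?P<link>\d+) of Asic Instance (?P<asic>\d+)"
)


def count_link_down(log_text):
    """Return a Counter keyed by (node, asic instance, link id)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINK_DOWN_RE.match(line.strip())
        if match:
            key = (match["node"], int(match["asic"]), int(match["link"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (node, asic, link), n in count_link_down(SAMPLE_LOG).most_common():
        print(f"{node} asic {asic} link {link}: {n} LINK_DOWN events")
```

A single (node, asic, link) key dominating the count, as link 26 on 0/4/CPU0 does here, points at one fabric link rather than a card-wide problem.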
2. FPD and Platform info:
--------------------------------------------------------------------------------
0/4/CPU0     140G-MSC       0.7    lc  rommonA  0   2.07   Yes
                                   lc  rommon   0   2.07   Yes
                                   lc  fpga1    0   0.08   No
                                   lc  fpga2    0   0.36   No
--------------------------------------------------------------------------------
0/4/CPU0     14-10GBE       0.81   lc  fpga3    1   42.00  No
--------------------------------------------------------------------------------

0/SM1/SP     FC-400G/S(SP)  N/A    IOS XR RUN   PWR,NSHUT,MON
0/SM2/SP     FC-400G/S(SP)  N/A    IOS XR RUN   PWR,NSHUT,MON
0/SM3/SP     FC-400G/S(SP)  N/A    IOS XR RUN   PWR,NSHUT,MON
0/SM4/SP     FC-400G/S(SP)  N/A    IOS XR RUN   PWR,NSHUT,MON
0/SM5/SP     FC-400G/S(SP)  N/A    IOS XR RUN   PWR,NSHUT,MON
0/SM6/SP     FC-400G/S(SP)  N/A    IOS XR RUN   PWR,NSHUT,MON
0/SM7/SP     FC-400G/S(SP)  N/A    IOS XR RUN   PWR,NSHUT,MON
3. Checking ingressq link status:
#admin show controllers ingressq fabric links location 0/4/CPU0
Sat Nov 11 15:50:18.867 Beijing

Ingressq ASIC instance 0
----------------------------------------------------.
Ingressq link state
plane-id  link-id  ADMIN-STATE  OPER-STATE  AVAIL-STATE  UP-COUNT
----------------------------------------------------.
0         0        UP           UP          UP           1
0         8        UP           UP          UP           1
0         16       UP           UP          UP           1
0         24       UP           UP          UP           1
0         32       UP           UP          UP           1
0         40       UP           UP          UP           1
1         1        UP           UP          UP           1
1         9        UP           UP          UP           1
1         17       UP           UP          UP           1
1         25       UP           UP          UP           1
1         33       UP           UP          UP           1
1         41       UP           UP          UP           1
2         2        UP           UP          UP           2
2         10       UP           UP          UP           2
2         18       UP           UP          UP           2
2         26       UP           UP          UP           435  <<<
2         34       UP           UP          UP           2
2         42       UP           UP          UP           2
3         3        UP           UP          UP           1
3         11       UP           UP          UP           1
3         19       UP           UP          UP           1
3         27       UP           UP          UP           1
3         35       UP           UP          UP           1
3         43       UP           UP          UP           1
4         4        UP           UP          UP           1
4         12       UP           UP          UP           1
4         20       UP           UP          UP           1
4         28       UP           UP          UP           1
4         36       UP           UP          UP           1
4         44       UP           UP          UP           1
5         5        UP           UP          UP           1
5         13       UP           UP          UP           1
5         21       UP           UP          UP           1
5         29       UP           UP          UP           1
5         37       UP           UP          UP           1
5         45       UP           UP          UP           1
6         6        UP           UP          UP           1
6         14       UP           UP          UP           1
6         22       UP           UP          UP           1
6         30       UP           UP          UP           1
6         38       UP           UP          UP           1
6         46       UP           UP          UP           1
7         7        UP           UP          UP           1
7         15       UP           UP          UP           1
7         23       UP           UP          UP           1
7         31       UP           UP          UP           1
7         39       UP           UP          UP           1
7         47       UP           UP          UP           1
----------------------------------------------------.
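In the output above, link 26 on plane 2 stands out with an UP-COUNT of 435 while every other link shows 1 or 2. Scanning the table for such an outlier can be scripted. The following is a minimal Python sketch under stated assumptions: the function names parse_links and flapping_links and the threshold of 10 are hypothetical choices for illustration, and the sample rows are copied from the table.

```python
# A few rows copied from the ingressq link table, including the marked one.
SAMPLE_ROWS = """\
2 18 UP UP UP 2
2 26 UP UP UP 435 <<<
2 34 UP UP UP 2
3 3 UP UP UP 1
"""


def parse_links(text):
    """Parse rows shaped as: plane-id link-id ADMIN OPER AVAIL UP-COUNT [marker]."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separators and anything that is not a data row.
        if len(fields) < 6:
            continue
        if not (fields[0].isdigit() and fields[1].isdigit() and fields[5].isdigit()):
            continue
        rows.append({
            "plane": int(fields[0]),
            "link": int(fields[1]),
            "admin": fields[2],
            "oper": fields[3],
            "avail": fields[4],
            "up_count": int(fields[5]),
        })
    return rows


def flapping_links(rows, max_up_count=10):
    """Return rows whose UP-COUNT exceeds an assumed, arbitrary threshold."""
    return [row for row in rows if row["up_count"] > max_up_count]


if __name__ == "__main__":
    for row in flapping_links(parse_links(SAMPLE_ROWS)):
        print(f"plane {row['plane']} link {row['link']}: UP-COUNT {row['up_count']}")
```

The threshold should be tuned to the router's uptime and maintenance history, since every link comes up at least once after a card or fabric reload.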
4. Checking s1rx link
#admin show controllers fabric link port s1rx brief
--------------------------------------------------------------------------------
0/SM2/SP/1/56/V3    UP/UP    0/4/CPU0/0/26/V1
5. Checking s1rx link stats
#admin show controllers fabric link port s1rx statistics brief
Sat Nov 11 15:50:55.460 Beijing
Total racks: 1
Rack 0:
Flags: E-D - Exceeded display width. Check detail option.

SFE Port          In             In              CE      UCE     PE
R/S/M/A/P         Data Cells     Idle Cells      Cells   Cells   Cells
--------------------------------------------------------------------------------
0/SM2/SP/1/56     316659395355   3994054892193   0       0       0
6. Checking s1rx link detail status
#admin show controllers fabric link port s1rx 0/SM2/SP/1/56 detail
Sat Nov 11 16:18:53.829 Beijing

Sfe Port           Admin  Oper   Avail  Down   Sfe BP  Port BP  Other
R/S/M/A/P          State  State  State  Flags  Role    Role     End
-------------------------------------------------------------------------
0/SM2/SP/1/56/V3   UP     UP     UP                             0/4/CPU0/0/26

---------------------------------------------------
Link Type   Pin1 Name   Pin2 Name
---------------------------------------------------
CHASSIS     C14         G24

+-----------------------------------------------------------------------+
| Timestamp                  Flags   Event       Direction              |
+-----------------------------------------------------------------------+
2017 Nov 11 14:19:51.742     l       ADMIN_UP    INTERNAL
2017 Nov 11 14:19:51.749     l       ADMIN_UP    FSDB->DRIVER
2017 Nov 11 14:19:51.752     l       DOWN        DRIVER->FSDB
2017 Nov 11 14:19:51.809     l       UP          DRIVER->FSDB
2017 Nov 11 14:19:51.809             ADMIN_UP    INTERNAL
2017 Nov 11 14:19:51.814             ADMIN_UP    FSDB->DRIVER
2017 Nov 11 14:43:15.363             DOWN        DRIVER->FSDB
2017 Nov 11 14:43:15.363     l       ADMIN_UP    INTERNAL
2017 Nov 11 14:43:15.367     l       ADMIN_UP    FSDB->DRIVER
2017 Nov 11 14:43:15.417     l       DOWN        DRIVER->FSDB
2017 Nov 11 14:43:15.494     l       UP          DRIVER->FSDB
2017 Nov 11 14:43:15.494             ADMIN_UP    INTERNAL
2017 Nov 11 14:43:15.499             ADMIN_UP    FSDB->DRIVER
2017 Nov 11 15:52:23.291             DOWN        DRIVER->FSDB
2017 Nov 11 15:52:23.291     l       ADMIN_UP    INTERNAL
2017 Nov 11 15:52:23.296     l       ADMIN_UP    FSDB->DRIVER
2017 Nov 11 15:52:23.345     l       DOWN        DRIVER->FSDB
2017 Nov 11 15:52:23.420     l       UP          DRIVER->FSDB
2017 Nov 11 15:52:23.420             ADMIN_UP    INTERNAL
2017 Nov 11 15:52:23.421             ADMIN_UP    FSDB->DRIVER
-------------------------------------------------------------------------
Neighbors
-------------------------------------------------------------------------
s1rx/0/SM2/SP/1/58    ingressqtx/0/4/CPU0/0/10
s1rx/0/SM2/SP/1/50    ingressqtx/0/6/CPU0/0/10
-------------------------------------------------------------------------
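The event history above shows the link going down and recovering three times within about 90 minutes. To see how far apart the flaps are, the UP events reported in the DRIVER->FSDB direction can be extracted and the gaps between them computed. The following is a minimal Python sketch, not part of the original procedure: the function names up_event_times and flap_intervals_seconds and the regular expression are assumptions based on the line layout shown, and the sample lines are a subset of the history.

```python
import re
from datetime import datetime

# Subset of the event history from the detail output.
SAMPLE_HISTORY = """\
2017 Nov 11 14:19:51.752 l DOWN DRIVER->FSDB
2017 Nov 11 14:19:51.809 l UP DRIVER->FSDB
2017 Nov 11 14:19:51.809 ADMIN_UP INTERNAL
2017 Nov 11 14:43:15.363 DOWN DRIVER->FSDB
2017 Nov 11 14:43:15.494 l UP DRIVER->FSDB
2017 Nov 11 15:52:23.420 l UP DRIVER->FSDB
"""

# Assumed layout: timestamp, optional one-letter flag, event, direction.
EVENT_RE = re.compile(
    r"(?P<ts>\d{4} \w{3} \d{1,2} [\d:.]+)\s+"
    r"(?:(?P<flag>[a-z])\s+)?"
    r"(?P<event>[A-Z_]+)\s+(?P<direction>\S+)$"
)


def up_event_times(text):
    """Return the timestamps of UP events reported as DRIVER->FSDB."""
    times = []
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match and match["event"] == "UP" and match["direction"] == "DRIVER->FSDB":
            times.append(datetime.strptime(match["ts"], "%Y %b %d %H:%M:%S.%f"))
    return times


def flap_intervals_seconds(times):
    """Return the gap in seconds between consecutive recoveries."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for seconds in flap_intervals_seconds(up_event_times(SAMPLE_HISTORY)):
        print(f"{seconds / 60:.1f} minutes between recoveries")
```

Irregular gaps like these suggest an intermittent loss of the link rather than a periodic event.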
7. Checking s1rx flapping status
#admin show controllers sfe link-info rx 0 127 flap instance 1 location 0/sm2/sp
Sat Nov 11 16:19:01.596 Beijing
-------------------------------------------------------------------------
Node ID:0/SM2/SP
Link ID           Oper     Link     Admin
                  Status   Errors   Shuts   Bringdowns
-------------------------------------------------------------------------
0/SM2/SP/1/56     UP       327      0       0
8. Checking asic error:
#admin show asic-errors all summary location 0/sm2/sp
Sat Nov 11 16:19:19.718 Beijing
************************************************************
* Superstar ASIC Error Summary *
************************************************************
Instance              : 0
Number of nodes       : 0
SBE error count       : 0
MBE error count       : 0
Parity error count    : 0
Generic error count   : 0
Reset error count     : 0
Barrier error count   : 0
Unexpected error count: 0
Link error count      : 0
OOR Threshold count   : 0
BP error count        : 0
IO error count        : 0
Ucode error count     : 0
Config error count    : 0
Indirect error count  : 0
--------------------
Instance              : 1
Number of nodes       : 2
SBE error count       : 0
MBE error count       : 0
Parity error count    : 0
Generic error count   : 0
Reset error count     : 0
Barrier error count   : 0
Unexpected error count: 0
Link error count      : 327 <<<
OOR Threshold count   : 0
BP error count        : 0
IO error count        : 0
Ucode error count     : 0
Config error count    : 0
Indirect error count  : 0
--------------------
9. Checking the detailed ASIC errors:
************************************************************
* Instance : 1 *
************************************************************
************************************************************
* Single Bit Errors *
************************************************************
************************************************************
* Multiple Bit Errors *
************************************************************
************************************************************
* Parity Errors *
************************************************************
************************************************************
* Barrier Errors *
************************************************************
************************************************************
* Unexpected Errors *
************************************************************
************************************************************
* Link Errors *
************************************************************
FULLQ_B, FC-400G/S, 0/SM2/SP, sfe[1]
Name            : DORM3.orl_b_csrs.orl_err_hier_int.orl_HW_LINK_SHUTDOWN_leaf_int.int_LOL_EVENT_LINK0
Leaf ID         : 0x160600ca
Thresh/period(s): 20/day
Error count     : 327
Last clearing   : Sun Oct 29 05:50:15 2017
Last N errors   : 50
--------------------------------------------------------------
......
Last N errors. @Time, Error-Data
------------------------------------------
Nov 11 00:14:22.213982: Error description: DORM: 3 ORL: 1 Stage: 1 Data link: s1rx/0/SM2/SP/1/56
Nov 11 00:47:37.787079: Error description: DORM: 3 ORL: 1 Stage: 1 Data link: s1rx/0/SM2/SP/1/56
Nov 11 01:01:51.484478: Error description: DORM: 3 ORL: 1 Stage: 1 Data link: s1rx/0/SM2/SP/1/56
Nov 11 01:21:48.864781: Error description: DORM: 3 ORL: 1 Stage: 1 Data link: s1rx/0/SM2/SP/1/56
Nov 11 02:09:06.313837: Error description: DORM: 3 ORL: 1 Stage: 1 Data link: s1rx/0/SM2/SP/1/56
Nov 11 02:42:42.938339: Error description: DORM: 3 ORL: 1 Stage: 1 Data link: s1rx/0/SM2/SP/1/56
......
--------------------------------------------------------------
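The detailed output reports 327 link errors since the last clearing on Sun Oct 29 05:50:15 2017, against a threshold of 20 per day. A rough average rate can be derived from those two values. The following is a minimal Python sketch under stated assumptions: the function name errors_per_day is hypothetical, the observation time is taken from the timestamp of the step 8 command (the detailed output carries no timestamp of its own), and a long-run average is only an indicator, not how the platform itself evaluates the threshold.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400


def errors_per_day(error_count, last_clearing, observed_at):
    """Average number of errors per day between two points in time."""
    elapsed_days = (observed_at - last_clearing).total_seconds() / SECONDS_PER_DAY
    if elapsed_days <= 0:
        raise ValueError("observed_at must be later than last_clearing")
    return error_count / elapsed_days


if __name__ == "__main__":
    # "Last clearing" value copied from the detailed asic-errors output.
    last_clearing = datetime.strptime("Sun Oct 29 05:50:15 2017", "%a %b %d %H:%M:%S %Y")
    # Assumption: the data was collected at the time of the step 8 command.
    observed_at = datetime(2017, 11, 11, 16, 19, 19)
    threshold_per_day = 20  # from "Thresh/period(s): 20/day"

    rate = errors_per_day(327, last_clearing, observed_at)
    print(f"average link errors per day: {rate:.1f}")
    print("above threshold" if rate > threshold_per_day else "within threshold")
```

With these inputs the average comes to roughly 24 errors per day, so even the long-run average sits above the 20/day threshold.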
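Copyright notice:
Article link: Troubleshooting “%FABRIC-INGRESSQ-6-LINK_DOWN” on CRS
This is an original article and represents only the author's personal views. Copyright belongs to Frank Zhao. When reposting, please credit the source and include the article link.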