In some cases, customers ask why BGP consumes so much memory, how to optimize BGP, why memory is not released after optimization, and so on. These questions are hard to answer without more information. This article shows how to check BGP memory and how to analyze it; that information will help you and the customer carry out targeted optimization.
The TZ database has several good troubleshooting guides for the ASR9k fabric, but none of them walks through the analysis of a fabric issue in a real scenario or case. As luck would have it, I picked up a hot case in which a fabric issue caused a go-live to fail. I have summarized the whole analysis process to help CSEs narrow down similar issues.
Problem Description
Platform: 9922 + 4 36x10G + 2 8x100G
Version: 5.3.2 + SMU
My customer brought a new 9922 online to replace old devices. After it went online, they found traffic drops affecting their business. Based on the information collected at the time, I found no further NP drops, and the business traffic was very low (at most 5G on a 100G port; all ports were bundle ports during the online troubleshooting).
In this issue, Huawei is the head end, and the ASR9k should be the mid and tail node. Huawei found that they were sending the traffic to Canada with an LDP label, not a TE label. So the traffic arrives at the HK ASR9k with an LDP label and is then forwarded, also by LDP, from the HK ASR9k to Canada. That does not match the normal scenario.
In the normal scenario, a TE label should be used from head to tail. After Huawei moved the traffic onto the TE LSP, the issue recovered. Why Huawei sent the traffic by LDP is their issue, which they will fix in future, but even so, the 9k should not drop packets. We first need to find out whether packets are dropped at the 9k and whether that is due to the label issue. A test environment is now being set up at the customer site; waiting for an update.
2. Change language
gedit /etc/default/locale –> change to the locale you want
reboot
3. Script won’t run on Linux: “bad interpreter: No such file or directory”
the file is probably in DOS format and needs to be changed to Unix format
vi xxx
check format by “:set ff” or “:set fileformat”
change format by “:set ff=unix” or “:set fileformat=unix”
:wq
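The same conversion can be done without opening vi. A minimal sketch, assuming GNU sed; the demo file is a hypothetical stand-in for the script that fails to run:

```shell
#!/bin/sh
# Hypothetical demo file with DOS (CRLF) line endings.
script=$(mktemp)
printf '#!/bin/sh\r\necho hello\r\n' > "$script"

# Count the lines that still end in a carriage return.
cr=$(printf '\r')
before=$(grep -c "$cr\$" "$script" || true)

# Strip the trailing carriage return from every line, in place (GNU sed).
sed -i 's/\r$//' "$script"

after=$(grep -c "$cr\$" "$script" || true)
echo "CRLF lines before: $before, after: $after"
```

If the dos2unix package happens to be installed, "dos2unix xxx" does the same job.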
4. Enable ssh service
sudo apt-get install openssh-server
gedit /etc/ssh/sshd_config –> “PermitRootLogin yes”
restart by “/etc/init.d/ssh restart”
check by “ps -ef|grep ssh”
reboot
5. Disable firewall
check whether it is enabled by “ufw status”
disable by “ufw disable”
[root@bird-162 ~]# sed -i 's/lo:1/internet/g' route-internet
[root@bird-162 ~]# more route-internet
1.0.0.0/24 dev internet
1.0.4.0/22 dev internet
1.0.4.0/24 dev internet
1.0.5.0/24 dev internet
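Before running sed -i on a large route file, it may be safer to preview the substitution and keep a backup. A small sketch; the sample file is hypothetical and assumes the original lines used "lo:1" as the device name:

```shell
#!/bin/sh
# Hypothetical sample of the route file before the change.
routes=$(mktemp)
printf '1.0.0.0/24 dev lo:1\n1.0.4.0/22 dev lo:1\n' > "$routes"

# Preview the result on stdout; without -i the file is left untouched.
preview=$(sed 's/lo:1/internet/g' "$routes")
echo "$preview"

# Apply in place and keep the original as <file>.bak.
sed -i.bak 's/lo:1/internet/g' "$routes"

cat "$routes"
```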
tar -cvf test-tar.tar nvgen_traces >>> pack the nvgen_traces folder into a tar archive
tar -xvf test-tar.tar >>> extract tar file
zip -r 661-yang.zip yang
unzip 661-yang.zip
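A few related tar options that are handy next to the commands above: listing an archive before extracting, extracting into another directory, and compressing with gzip. This is only a sketch; the working directory and the file trace.txt are made up for the demo:

```shell
#!/bin/sh
# Hypothetical scratch directory with a small nvgen_traces folder.
workdir=$(mktemp -d)
cd "$workdir"
mkdir nvgen_traces
echo "sample" > nvgen_traces/trace.txt

# Create the archive (no compression).
tar -cf test-tar.tar nvgen_traces

# List the contents without extracting anything.
listing=$(tar -tf test-tar.tar)
echo "$listing"

# Extract into a separate directory instead of the current one.
mkdir restore
tar -xf test-tar.tar -C restore

# Same folder, gzip-compressed.
tar -czf test-tar.tar.gz nvgen_traces
```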
client:~ frank$ cd .ssh
client:.ssh frank$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/frank/.ssh/id_rsa):
Enter passphrase (empty for no passphrase): <<< a passphrase is recommended in a real environment
Enter same passphrase again:
Your identification has been saved in /Users/frank/.ssh/id_rsa.
Your public key has been saved in /Users/frank/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:3zinN8lMY79WaSRDzbaBLQsT4KSlb9MDwrQfFc9FK+o frank@client
The key's randomart image is:
+---[RSA 3072]----+
| . +.+o *o |
| o B .oo+.*.|
| * + +++.o|
| + + .+.o |
| S= + + .|
| ..oo= o.|
| +Eo+.. |
| +* o |
| .. o.. |
+----[SHA256]-----+
client:.ssh frank$ ls -l
total 120
-rw-r--r-- 1 frank staff 751 May 11 2020 config
-rw------- 1 frank staff 2610 Feb 24 11:59 id_rsa <<< private key
-rw-r--r-- 1 frank staff 578 Feb 24 11:59 id_rsa.pub <<< public key
-rw------- 1 frank staff 23022 Feb 23 19:25 known_hosts
-rw-r--r--@ 1 frank staff 23377 Feb 22 21:00 known_hosts.old
Copy the public key to the server
Frank@Yongs-MacBook-Pro ~ % ssh-copy-id -p 8080 -i ~/.ssh/id_rsa.pub [email protected]
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/Users/Frank/.ssh/id_rsa.pub"
The authenticity of host '[10.114.251.163]:8080 ([10.114.251.163]:8080)' can't be established.
ED25519 key fingerprint is SHA256:xxxxxxx
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
[email protected]'s password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh -p '8080' '[email protected]'"
and check to make sure that only the key(s) you wanted were added.
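For reference, what ssh-copy-id ends up doing on the server is roughly: make sure ~/.ssh exists with tight permissions and append the public key to authorized_keys if it is not already there. The sketch below imitates that locally; the function name install_key, the placeholder key and the demo paths are all hypothetical, and this is not the actual ssh-copy-id implementation:

```shell
#!/bin/sh
# install_key PUBKEY_FILE AUTHORIZED_KEYS_FILE  (hypothetical helper)
install_key() {
    pubkey_file=$1
    auth_file=$2
    auth_dir=$(dirname "$auth_file")

    # sshd normally expects the directory and file to be private to the user.
    mkdir -p "$auth_dir"
    chmod 700 "$auth_dir"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # Append the key only if the exact line is not already present.
    if ! grep -q -x -F -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}

# Demo against a scratch directory instead of a real server.
demo=$(mktemp -d)
echo "ssh-rsa AAAAB3placeholder frank@client" > "$demo/id_rsa.pub"
install_key "$demo/id_rsa.pub" "$demo/server/.ssh/authorized_keys"
install_key "$demo/id_rsa.pub" "$demo/server/.ssh/authorized_keys"   # second run adds nothing
key_count=$(wc -l < "$demo/server/.ssh/authorized_keys")
echo "keys installed: $key_count"
```

This is useful to remember when ssh-copy-id is not available on the client: the key can be appended by hand, as long as the permissions stay the same.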
[root@localhost ~]# more ~/.bashrc
# .bashrc
# User specific aliases and functions
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
alias TG26='telnet 172.16.211.154 32897'
alias PR11='telnet 172.16.211.154 32907'
alias PR12='telnet 172.16.211.154 32908'
# mv libpcap-1.10.4.tar.gz /usr/local/src/
# mv tcpdump-4.99.4.tar.gz /usr/local/src/
# cd /usr/local/src
# tar -xvf libpcap-1.10.4.tar.gz
# tar -xvf tcpdump-4.99.4.tar.gz
# cd libpcap-1.10.4
# ./configure
# make
# make install
# cd ../tcpdump-4.99.4
# ./configure
# make
# make install
# tcpdump --version
tcpdump version 4.99.4
libpcap version 1.10.4 (with TPACKET_V3)
root@f0-13:~# ftp
ftp> open x.x.x.x 11111
Connected to x.x.x.x.
220 frank-server FTP server ready.
Name (x.x.x.x:root): xxx
331 Password required for xxx.
Password:
230 User xxx logged in, access restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> pas
Passive mode on.
% sudo more /etc/sudoers
......
# root and users in group wheel can run anything on any machine as any user
root ALL = (ALL) ALL
%admin ALL = (ALL) ALL
xxx ALL = (ALL) ALL
[root@frank ~]# more /etc/cron.d/sysstat
# Run system activity accounting tool every 5 minutes
*/5 * * * * root /usr/lib64/sa/sa1 1 1
# 0 * * * * root /usr/lib64/sa/sa1 600 6 &
# Generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A
Viewing SAR
Only a few common commands are listed here; use -h to see more options:
[root@frank ~]# sar -u # the data shown here is read from the binary file
Linux 4.10.4-1.el7.elrepo.x86_64 (frank) 08/09/2023 _x86_64_ (1 CPU)
01:57:57 AM LINUX RESTART
02:00:01 AM CPU %user %nice %system %iowait %steal %idle
02:10:01 AM all 1.34 0.00 0.50 0.25 0.00 97.91
Average: all 1.34 0.00 0.50 0.25 0.00 97.91
[root@frank ~]# sar -f /var/log/sa/sa09 # read the binary file with sar
Linux 4.10.4-1.el7.elrepo.x86_64 (frank) 08/09/2023 _x86_64_ (1 CPU)
01:57:57 AM LINUX RESTART
02:00:01 AM CPU %user %nice %system %iowait %steal %idle
02:10:01 AM all 1.34 0.00 0.50 0.25 0.00 97.91
Average: all 1.34 0.00 0.50 0.25 0.00 97.91
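When only one number is needed, the sar output can be piped through awk. A small sketch that takes the "Average:" line and prints CPU busy as 100 minus %idle; it runs on a pasted sample here and assumes the English column layout shown above:

```shell
#!/bin/sh
# Sample copied from the sar -u output above.
sample='02:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
02:10:01 AM     all      1.34      0.00      0.50      0.25      0.00     97.91
Average:        all      1.34      0.00      0.50      0.25      0.00     97.91'

# %idle is the last column; busy = 100 - idle.
busy=$(printf '%s\n' "$sample" | awk '$1 == "Average:" { printf "%.2f", 100 - $NF }')
echo "average CPU busy: ${busy}%"

# On a live system the same filter would be fed from sar directly, e.g.:
#   sar -u -f /var/log/sa/sa09 | awk '$1 == "Average:" { printf "%.2f\n", 100 - $NF }'
```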
root@sonic:/home/admin# ip link show Ethernet13
104: Ethernet13: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
link/ether 00:90:fb:7f:e1:2c brd ff:ff:ff:ff:ff:ff
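In this output, the UP flag means the port is administratively up, NO-CARRIER means no physical link is seen, and "state DOWN" is the resulting operational state. A quick sketch to pull those two facts out in a script, run here on the line pasted from above:

```shell
#!/bin/sh
# First line of the "ip link show Ethernet13" output above.
sample='104: Ethernet13: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000'

# Operational state is the word following "state".
state=$(printf '%s\n' "$sample" | awk '{ for (i = 1; i < NF; i++) if ($i == "state") print $(i + 1) }')

# Flags are the comma-separated list between < and >.
flags=$(printf '%s\n' "$sample" | sed -n 's/.*<\(.*\)>.*/\1/p')
case ",$flags," in
    *,NO-CARRIER,*) carrier=no ;;
    *)              carrier=yes ;;
esac

echo "state=$state carrier=$carrier"
```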
Networkctl status xxx
This command gives more detailed output
root@tme91:/home/tmelab# networkctl status ens3np0
● 4: ens3np0
Link File: /usr/lib/systemd/network/99-default.link
Network File: /run/systemd/network/10-netplan-ens3np0.network
Type: ether
State: routable (configured)
Path: pci-0000:27:00.0
Driver: mlx5_core
Vendor: Mellanox Technologies
Model: MT2910 Family [ConnectX-7]
HW Address: 58:a2:e1:xx:xx:xx
MTU: 4200 (min: 68, max: 9978)
Queue Length (Tx/Rx): 768/63
Auto negotiation: yes
Speed: 400Gbps
Duplex: full
Address: 192.168.100.1
fe80::5aa2:e1ff:fe06:3014
Activation Policy: up
Required For Online: yes
Connected To: sonic on port Ethernet27 (Ethernet27)
Sep 12 10:14:25 tme91 systemd-networkd[2372]: ens3np0: Lost carrier
Sep 12 10:14:39 tme91 systemd-networkd[2372]: ens3np0: Gained carrier
Ethtool
This command is very powerful. I have used it for a long time, and I list it here so it is easy to review later.
root@tme91:/home/tmelab# ethtool ens3np0
Settings for ens3np0:
Supported ports: [ Backplane ]
Supported link modes: 1000baseT/Full
......
200000baseCR4/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: None BaseR RS
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: RS
Speed: 400000Mb/s
Duplex: Full
Port: Direct Attach Copper
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000004 (4)
link
Link detected: yes
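For a quick check across many hosts, the two fields that matter most (speed and link state) can be filtered out of the ethtool output. A sketch on a trimmed copy of the output above; on a real host the input would come from "ethtool ens3np0" instead of the sample variable:

```shell
#!/bin/sh
# Trimmed copy of the ethtool output above.
sample='Settings for ens3np0:
        Speed: 400000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Link detected: yes'

# Split each line on ": " and take the value part.
speed=$(printf '%s\n' "$sample" | awk -F': ' '/Speed:/ { print $2 }')
link=$(printf '%s\n' "$sample" | awk -F': ' '/Link detected:/ { print $2 }')

# ethtool reports Mb/s; strip the unit and convert to Gb/s.
speed_gbps=$(( ${speed%Mb/s} / 1000 ))

echo "link=$link speed=${speed_gbps}G"
```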
root@tme91:/home/tmelab# mlxlink -d mlx5_0
Operational Info
----------------
State : Active
Physical state : ETH_AN_FSM_ENABLE
Speed : 400G
Width : 4x
FEC : Standard_RS-FEC - (544,514)
Loopback Mode : No Loopback
Auto Negotiation : ON
Supported Info
--------------
Enabled Link Speed (Ext.) : 0x00013ff2 (400G_4X,200G_2X,200G_4X,100G_1X,100G_2X,100G_4X,50G_1X,50G_2X,40G,25G,10G,1G)
Supported Cable Speed (Ext.) : 0x00013ffe (400G_4X,200G_2X,200G_4X,100G_1X,100G_2X,100G_4X,50G_1X,50G_2X,40G,25G,10G,5G,2.5G,1G)
Troubleshooting Info
--------------------
Status Opcode : 0
Group Opcode : N/A
Recommendation : No issue was observed
Tool Information
----------------
Firmware Version : 28.39.3560
amBER Version : 2.22
MFT Version : mft 4.26.1-6