CISCO WAE: wae-db dead but pid file exists

Introduction

If you see a similar alarm, you can follow this guide to work around the issue.

Problem

A customer found that the WAE REST API was not working. On checking, wae-db turned out to have a problem and was reporting the alarm shown below.
Restarting the WAE service and rebooting the VM did not help.

[root@wae-auto wae-db]# service --status-all | grep wae
JAVA_EXECUTABLE or HSQLDB_JAR_PATH in '/etc/sysconfig/hsqldb' is set to a non-file.
wae-appenginecore is running OK, with PID=18606
wae-core is running OK, with PID=18163
wae-db dead but pid file exists  <<<<<<
wae-designapiserver is running OK, with PID=18847
wae-messaging is running OK, with PID=17874
wae-ni is running OK, with PID=17525
wae-osc is running OK, with PID=18369
wae-svcs-dashui is running OK, with PID=16652
wae-svcs-db is running OK, with PID=16277
wae-svcs-localrepo is not running
wae-svcs-log is running OK, with PID=16497
wae-svcs-logagent is running OK, with PID=16570
wae-svcs-metricsbkr is running OK, with PID=16716
wae-svcs-metricsd is running OK, with PID=16806
wae-svcs-mon is running OK, with PID=16138
wae-svcs-ui is running OK, with PID=16917
wae-system-server is running OK, with PID=16186
wae-web-server is running OK, with PID=17761

Solution

The issue was resolved after working with the business unit (BU).

wae-db is the Cassandra database that WAE uses for internal purposes. It is different from mld. According to the error message, the wae-db process stopped working for some reason, but the wae-db service is still marked as running.

wae-db dead but pid file exists

When a program runs as a service on Linux, a pid file containing the pid of its process is created. The file indicates that the service is running, so another instance of the same service cannot be started. The file should be deleted when the service is stopped. Sometimes, after a failure, the service stops or crashes without this being noticed, and the pid file is not deleted. That is the cause of this issue. If you search for the message online, you will find that many other services report similar alarms.
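
The stale pid file condition described above can be checked by hand. The following is a minimal shell sketch, not part of WAE. The function name pidfile_state is made up for this example, and the sketch assumes it is run as root, because kill -0 also fails for a live process owned by another user.

```shell
#!/bin/sh
# Hypothetical helper (not a WAE tool): classify a pid file.
# Prints "missing" if the file does not exist, "alive" if the pid
# recorded in it belongs to a running process, "stale" otherwise.
pidfile_state() {
    pidfile="$1"

    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 0
    fi

    pid=$(cat "$pidfile" 2>/dev/null)

    # Treat an empty or non-numeric value as stale.
    case "$pid" in
        ''|*[!0-9]*)
            echo "stale"
            return 0
            ;;
    esac

    # kill -0 sends no signal; it only tests whether the pid can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
        echo "alive"
    else
        echo "stale"
    fi
}
```

Called with the path of a pid file, the function prints stale when the file exists but the recorded process does not, which is the situation behind the "dead but pid file exists" message.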

To recover from this kind of issue, take the following actions:

  • kill the process
  • delete the pid file
  • restart the service
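
The three actions above can be sketched as a small shell function. This is only an illustration under assumptions, not a Cisco-provided script: the function name recover_service and its arguments are hypothetical, the pid of the leftover process has to be found by hand first (for example with ps), and the function must be run as root.

```shell
#!/bin/sh
# Sketch of the recovery steps; not a Cisco-provided script.
# Arguments (names are hypothetical, chosen for this example):
#   $1  service name, for example wae-db
#   $2  path of the pid file
#   $3  pid of the leftover process found with ps (may be empty)
recover_service() {
    svc="$1"
    pidfile="$2"
    leftover_pid="$3"

    # Ask the init script to stop the service first; ignore a failure here,
    # because the service is already in an inconsistent state.
    service "$svc" stop || true

    # Step 1: kill the leftover process if one was given and it still exists.
    if [ -n "$leftover_pid" ] && kill -0 "$leftover_pid" 2>/dev/null; then
        kill "$leftover_pid"
    fi

    # Step 2: delete the pid file if the stop action left it behind.
    rm -f "$pidfile"

    # Step 3: start the service again.
    service "$svc" start
}
```

The function sends a plain kill (SIGTERM) and does not wait for the process to exit, so it is worth confirming with ps that the old process is gone and checking the service status afterwards.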

The pid file of wae-db is located in the /opt/cariden/software/wae-db folder and is called cassandra.pid.

[root@wae-auto wae-db]# ll
total 336
drwxrwxr-x. 2 wae wae   4096 Mar 28  2017 bin
-rw-------. 1 wae wae      5 Jan 19 00:53 cassandra.pid

Workaround example:

[root@wae-auto ~]# service wae-db status
wae-db dead but pid file exists
[root@wae-auto ~]# ps -ef | grep wae-db
wae       5633     1  0 Jan17 ?        00:12:57 /usr/java/latest/bin/java -ea -javaagent:/opt/cariden/software/wae-db/lib/jamm-0.2.6.jar -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1G -Xmx1G -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:CompileCommandFile=/opt/cariden/software/wae-db/conf/hotspot_compiler -XX:CMSWaitDuration=10000 -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/opt/cariden/software/wae-db/logs -Dcassandra.storagedir=/opt/cariden/software/wae-db/data -Dcassandra-pidfile=/opt/cariden/software/wae-db/cassandra.pid -cp 
/opt/cariden/software/wae-db/conf:/opt/cariden/software/wae-db/build/classes/main:/opt/cariden/software/wae-db/build/classes/thrift:/opt/cariden/software/wae-db/lib/airline-0.6.jar:/opt/cariden/software/wae-db/lib/antlr-runtime-3.5.2.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-2.1.1.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-clientutil-2.1.1.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-thrift-2.1.1.jar:/opt/cariden/software/wae-db/lib/avro-1.7.7.jar:/opt/cariden/software/wae-db/lib/avro-ipc-1.7.7.jar:/opt/cariden/software/wae-db/lib/commons-cli-1.1.jar:/opt/cariden/software/wae-db/lib/commons-codec-1.2.jar:/opt/cariden/software/wae-db/lib/commons-lang3-3.1.jar:/opt/cariden/software/wae-db/lib/commons-math3-3.2.jar:/opt/cariden/software/wae-db/lib/compress-lzf-0.8.4.jar:/opt/cariden/software/wae-db/lib/concurrentlinkedhashmap-lru-1.4.jar:/opt/cariden/software/wae-db/lib/disruptor-3.0.1.jar:/opt/cariden/software/wae-db/lib/flume-file-channel-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-configuration-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-core-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-node-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-sdk-1.5.2.jar:/opt/cariden/software/wae-db/lib/guava-16.0.jar:/opt/cariden/software/wae-db/lib/high-scale-lib-1.0.6.jar:/opt/cariden/software/wae-db/lib/jackson-core-asl-1.9.2.jar:/opt/cariden/software/wae-db/lib/jackson-mapper-asl-1.9.2.jar:/opt/cariden/software/wae-db/lib/jamm-0.2.6.jar:/opt/cariden/software/wae-db/lib/javax.inject.jar:/opt/cariden/software/wae-db/lib/jbcrypt-0.3m.jar:/opt/cariden/software/wae-db/lib/jline-1.0.jar:/opt/cariden/software/wae-db/lib/jna-4.0.0.jar:/opt/cariden/software/wae-db/lib/json-simple-1.1.jar:/opt/cariden/software/wae-db/lib/libthrift-0.9.1.jar:/opt/cariden/software/wae-db/lib/logback-classic-1.1.2.jar:/opt/cariden/software/wae-db/lib/logback-core-1.1.2.jar:/opt/cariden/software/wae-db/lib/logback-flume-1.0.0-non-osgi.jar:/opt/cariden/
software/wae-db/lib/lz4-1.2.0.jar:/opt/cariden/software/wae-db/lib/metrics-core-2.2.0.jar:/opt/cariden/software/wae-db/lib/netty-3.4.0.Final.jar:/opt/cariden/software/wae-db/lib/netty-all-4.0.23.Final.jar:/opt/cariden/software/wae-db/lib/reporter-config-2.1.0.jar:/opt/cariden/software/wae-db/lib/slf4j-api-1.7.2.jar:/opt/cariden/software/wae-db/lib/snakeyaml-1.11.jar:/opt/cariden/software/wae-db/lib/snappy-java-1.0.5.2.jar:/opt/cariden/software/wae-db/lib/stream-2.5.2.jar:/opt/cariden/software/wae-db/lib/stringtemplate-4.0.2.jar:/opt/cariden/software/wae-db/lib/super-csv-2.1.0.jar:/opt/cariden/software/wae-db/lib/thrift-server-0.3.7.jar org.apache.cassandra.service.CassandraDaemon
wae       6283 16138  0 Jan18 ?        00:00:00 [wae-db] 
root     17980  4403  0 00:49 pts/5    00:00:00 grep wae-db
[root@wae-auto ~]# cd /opt/cariden/software/wae-db
[root@wae-auto wae-db]# ll
total 336
drwxrwxr-x. 2 wae wae   4096 Mar 28  2017 bin
-rw-------. 1 wae wae      5 Jan 19 00:53 cassandra.pid
-rw-r--r--. 1 wae wae 225860 Oct 22  2014 CHANGES.txt
drwxrwxr-x. 3 wae wae   4096 Mar 28  2017 conf
drwxrwxr-x. 2 wae wae   4096 Mar 28  2017 interface
drwxrwxr-x. 4 wae wae   4096 Mar 28  2017 javadoc
drwxrwxr-x. 3 wae wae   4096 Mar 28  2017 lib
-rw-r--r--. 1 wae wae  11609 Oct 22  2014 LICENSE.txt
-rw-r--r--. 1 wae wae  63584 Oct 22  2014 NEWS.txt
-rw-r--r--. 1 wae wae   2117 Oct 22  2014 NOTICE.txt
drwxrwxr-x. 3 wae wae   4096 Mar 28  2017 pylib
drwxrwxr-x. 4 wae wae   4096 Mar 28  2017 tools
[root@wae-auto wae-db]# cat cassandra.pid
27757
[root@wae-auto wae-db]# /etc/init.d/wae-db stop
Shutting down wae-db:                                      [  OK  ]
[root@wae-auto wae-db]# ps -ef | grep wae-db
wae       5633     1  0 Jan17 ?        00:12:59 /usr/java/latest/bin/java -ea -javaagent:/opt/cariden/software/wae-db/lib/jamm-0.2.6.jar -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1G -Xmx1G -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:CompileCommandFile=/opt/cariden/software/wae-db/conf/hotspot_compiler -XX:CMSWaitDuration=10000 -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/opt/cariden/software/wae-db/logs -Dcassandra.storagedir=/opt/cariden/software/wae-db/data -Dcassandra-pidfile=/opt/cariden/software/wae-db/cassandra.pid -cp 
/opt/cariden/software/wae-db/conf:/opt/cariden/software/wae-db/build/classes/main:/opt/cariden/software/wae-db/build/classes/thrift:/opt/cariden/software/wae-db/lib/airline-0.6.jar:/opt/cariden/software/wae-db/lib/antlr-runtime-3.5.2.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-2.1.1.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-clientutil-2.1.1.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-thrift-2.1.1.jar:/opt/cariden/software/wae-db/lib/avro-1.7.7.jar:/opt/cariden/software/wae-db/lib/avro-ipc-1.7.7.jar:/opt/cariden/software/wae-db/lib/commons-cli-1.1.jar:/opt/cariden/software/wae-db/lib/commons-codec-1.2.jar:/opt/cariden/software/wae-db/lib/commons-lang3-3.1.jar:/opt/cariden/software/wae-db/lib/commons-math3-3.2.jar:/opt/cariden/software/wae-db/lib/compress-lzf-0.8.4.jar:/opt/cariden/software/wae-db/lib/concurrentlinkedhashmap-lru-1.4.jar:/opt/cariden/software/wae-db/lib/disruptor-3.0.1.jar:/opt/cariden/software/wae-db/lib/flume-file-channel-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-configuration-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-core-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-node-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-sdk-1.5.2.jar:/opt/cariden/software/wae-db/lib/guava-16.0.jar:/opt/cariden/software/wae-db/lib/high-scale-lib-1.0.6.jar:/opt/cariden/software/wae-db/lib/jackson-core-asl-1.9.2.jar:/opt/cariden/software/wae-db/lib/jackson-mapper-asl-1.9.2.jar:/opt/cariden/software/wae-db/lib/jamm-0.2.6.jar:/opt/cariden/software/wae-db/lib/javax.inject.jar:/opt/cariden/software/wae-db/lib/jbcrypt-0.3m.jar:/opt/cariden/software/wae-db/lib/jline-1.0.jar:/opt/cariden/software/wae-db/lib/jna-4.0.0.jar:/opt/cariden/software/wae-db/lib/json-simple-1.1.jar:/opt/cariden/software/wae-db/lib/libthrift-0.9.1.jar:/opt/cariden/software/wae-db/lib/logback-classic-1.1.2.jar:/opt/cariden/software/wae-db/lib/logback-core-1.1.2.jar:/opt/cariden/software/wae-db/lib/logback-flume-1.0.0-non-osgi.jar:/opt/cariden/
software/wae-db/lib/lz4-1.2.0.jar:/opt/cariden/software/wae-db/lib/metrics-core-2.2.0.jar:/opt/cariden/software/wae-db/lib/netty-3.4.0.Final.jar:/opt/cariden/software/wae-db/lib/netty-all-4.0.23.Final.jar:/opt/cariden/software/wae-db/lib/reporter-config-2.1.0.jar:/opt/cariden/software/wae-db/lib/slf4j-api-1.7.2.jar:/opt/cariden/software/wae-db/lib/snakeyaml-1.11.jar:/opt/cariden/software/wae-db/lib/snappy-java-1.0.5.2.jar:/opt/cariden/software/wae-db/lib/stream-2.5.2.jar:/opt/cariden/software/wae-db/lib/stringtemplate-4.0.2.jar:/opt/cariden/software/wae-db/lib/super-csv-2.1.0.jar:/opt/cariden/software/wae-db/lib/thrift-server-0.3.7.jar org.apache.cassandra.service.CassandraDaemon
wae       6283 16138  0 Jan18 ?        00:00:00 [wae-db] 
root     32425  4403  0 00:56 pts/5    00:00:00 grep wae-db
[root@wae-auto wae-db]# kill 5633
[root@wae-auto wae-db]# ps -ef | grep wae-db
wae       6283 16138  0 Jan18 ?        00:00:00 [wae-db] 
root     32451  4403  0 00:57 pts/5    00:00:00 grep wae-db
[root@wae-auto wae-db]# service wae-db status
wae-db is not running
[root@wae-auto wae-db]# /etc/init.d/wae-db start
Starting wae-db:                                           [  OK  ]
[root@wae-auto wae-db]# service wae-db status
wae-db is running OK, with PID=32634