CISCO WAE: wae-db dead but pid file exists
Introduction
If you see a similar alarm, you can follow this guide to work around the issue.
Problem
My customer found that the WAE REST API was not working. On checking, we found that wae-db had a problem and was reporting the alarm shown below.
Restarting the WAE service and reloading the VM did not help.
[[email protected] wae-db]# service --status-all | grep wae
JAVA_EXECUTABLE or HSQLDB_JAR_PATH in '/etc/sysconfig/hsqldb' is set to a non-file.
wae-appenginecore is running OK, with PID=18606
wae-core is running OK, with PID=18163
wae-db dead but pid file exists <<<<<<
wae-designapiserver is running OK, with PID=18847
wae-messaging is running OK, with PID=17874
wae-ni is running OK, with PID=17525
wae-osc is running OK, with PID=18369
wae-svcs-dashui is running OK, with PID=16652
wae-svcs-db is running OK, with PID=16277
wae-svcs-localrepo is not running
wae-svcs-log is running OK, with PID=16497
wae-svcs-logagent is running OK, with PID=16570
wae-svcs-metricsbkr is running OK, with PID=16716
wae-svcs-metricsd is running OK, with PID=16806
wae-svcs-mon is running OK, with PID=16138
wae-svcs-ui is running OK, with PID=16917
wae-system-server is running OK, with PID=16186
wae-web-server is running OK, with PID=17761
Solution
The issue was resolved after working with the BU.
wae-db is the Cassandra database that WAE uses for internal purposes. It is different from mld. Based on the error message, the wae-db process stopped working for some reason, but Linux still considers the wae-db service to be running.
wae-db dead but pid file exists
When a program runs as a service on Linux, a pid file containing the process ID of that service is created. The file tells the system that the service is running, so another instance of the same service cannot be started. The file should be deleted when the service is stopped. Sometimes, after a failure, the service stops or crashes without the system noticing, and the pid file is left behind. That is what causes this alarm. If you search the web for the message, you will find that many other services report similar alarms.
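A minimal sketch of how a stale pid file can be detected, assuming a POSIX shell. The helper name pidfile_state is hypothetical and is not part of WAE. It relies on kill -0, which sends no signal and only checks whether the process ID exists and may be signalled, so it should be run as root to avoid false "stale" results for processes owned by another user.

```shell
#!/bin/sh
# pidfile_state PIDFILE
# Hypothetical helper: prints "missing", "stale" or "running" for a pid file.
pidfile_state() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 0
    fi
    pid=$(cat "$pidfile")
    # kill -0 delivers no signal; it succeeds only if the pid exists
    # and the caller is allowed to signal it.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}
```

For wae-db this could be called as: pidfile_state /opt/cariden/software/wae-db/cassandra.pid. A result of "stale" corresponds to the "dead but pid file exists" state.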
To recover from this kind of issue, take the following actions:
- kill the process
- delete the pid file
- restart the service
The pid file of wae-db is located in the /opt/cariden/software/wae-db folder and is called cassandra.pid.
[[email protected] wae-db]# ll
total 336
drwxrwxr-x. 2 wae wae 4096 Mar 28 2017 bin
-rw-------. 1 wae wae 5 Jan 19 00:53 cassandra.pid
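The first two recovery steps can be sketched as a small shell function. This is an illustration under assumptions, not a Cisco-provided script: the function name stop_leftover and the 10-second wait are hypothetical choices, and the process ID must be found by hand first (for example with ps -ef | grep wae-db). The function sends a plain kill (SIGTERM) and removes the pid file only after the process has really gone.

```shell
#!/bin/sh
# stop_leftover PID PIDFILE
# Hypothetical helper: terminate a leftover process, then remove its pid file.
stop_leftover() {
    pid=$1
    pidfile=$2
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"    # plain SIGTERM, no -9
        # Wait up to 10 seconds (arbitrary limit) for the process to exit.
        i=0
        while kill -0 "$pid" 2>/dev/null && [ "$i" -lt 10 ]; do
            sleep 1
            i=$((i + 1))
        done
    fi
    if kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still alive; leaving $pidfile in place" >&2
        return 1
    fi
    rm -f "$pidfile"
}
```

On the WAE host, run as root, the third step would follow only if the function succeeds, for example: stop_leftover 5633 /opt/cariden/software/wae-db/cassandra.pid && service wae-db start. The process ID 5633 is the one from the session below and will differ on other systems.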
Workaround example:
[[email protected] ~]# service wae-db status
wae-db dead but pid file exists
[[email protected] ~]# ps -ef | grep wae-db
wae 5633 1 0 Jan17 ? 00:12:57 /usr/java/latest/bin/java -ea -javaagent:/opt/cariden/software/wae-db/lib/jamm-0.2.6.jar -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1G -Xmx1G -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:CompileCommandFile=/opt/cariden/software/wae-db/conf/hotspot_compiler -XX:CMSWaitDuration=10000 -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/opt/cariden/software/wae-db/logs -Dcassandra.storagedir=/opt/cariden/software/wae-db/data -Dcassandra-pidfile=/opt/cariden/software/wae-db/cassandra.pid -cp /opt/cariden/software/wae-db/conf:/opt/cariden/software/wae-db/build/classes/main:/opt/cariden/software/wae-db/build/classes/thrift:/opt/cariden/software/wae-db/lib/airline-0.6.jar:/opt/cariden/software/wae-db/lib/antlr-runtime-3.5.2.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-2.1.1.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-clientutil-2.1.1.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-thrift-2.1.1.jar:/opt/cariden/software/wae-db/lib/avro-1.7.7.jar:/opt/cariden/software/wae-db/lib/avro-ipc-1.7.7.jar:/opt/cariden/software/wae-db/lib/commons-cli-1.1.jar:/opt/cariden/software/wae-db/lib/commons-codec-1.2.jar:/opt/cariden/software/wae-db/lib/commons-lang3-3.1.jar:/opt/cariden/software/wae-db/lib/commons-math3-3.2.jar:/opt/cariden/software/wae-db/lib/compress-lzf-0.8.4.jar:/opt/cariden/software/wae-db/lib/concurrentlinkedhashmap-lru-1.4.jar:/opt/cariden/software/wae-db/lib/disruptor-3.0.1.jar:/opt/cariden/software/wae-db/lib/flume-file-channel-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-configuration-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-core-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-node-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-sdk-1.5.2.jar:/opt/cariden/software/wae-db/lib/guava-16.0.jar:/opt/cariden/software/wae-db/lib/high-scale-lib-1.0.6.jar:/opt/cariden/software/wae-db/lib/jackson-core-asl-1.9.2.jar:/opt/cariden/software/wae-db/lib/jackson-mapper-asl-1.9.2.jar:/opt/cariden/software/wae-db/lib/jamm-0.2.6.jar:/opt/cariden/software/wae-db/lib/javax.inject.jar:/opt/cariden/software/wae-db/lib/jbcrypt-0.3m.jar:/opt/cariden/software/wae-db/lib/jline-1.0.jar:/opt/cariden/software/wae-db/lib/jna-4.0.0.jar:/opt/cariden/software/wae-db/lib/json-simple-1.1.jar:/opt/cariden/software/wae-db/lib/libthrift-0.9.1.jar:/opt/cariden/software/wae-db/lib/logback-classic-1.1.2.jar:/opt/cariden/software/wae-db/lib/logback-core-1.1.2.jar:/opt/cariden/software/wae-db/lib/logback-flume-1.0.0-non-osgi.jar:/opt/cariden/software/wae-db/lib/lz4-1.2.0.jar:/opt/cariden/software/wae-db/lib/metrics-core-2.2.0.jar:/opt/cariden/software/wae-db/lib/netty-3.4.0.Final.jar:/opt/cariden/software/wae-db/lib/netty-all-4.0.23.Final.jar:/opt/cariden/software/wae-db/lib/reporter-config-2.1.0.jar:/opt/cariden/software/wae-db/lib/slf4j-api-1.7.2.jar:/opt/cariden/software/wae-db/lib/snakeyaml-1.11.jar:/opt/cariden/software/wae-db/lib/snappy-java-1.0.5.2.jar:/opt/cariden/software/wae-db/lib/stream-2.5.2.jar:/opt/cariden/software/wae-db/lib/stringtemplate-4.0.2.jar:/opt/cariden/software/wae-db/lib/super-csv-2.1.0.jar:/opt/cariden/software/wae-db/lib/thrift-server-0.3.7.jar org.apache.cassandra.service.CassandraDaemon
wae 6283 16138 0 Jan18 ? 00:00:00 [wae-db]
root 17980 4403 0 00:49 pts/5 00:00:00 grep wae-db
[[email protected] ~]# cd /opt/cariden/software/wae-db
[[email protected] wae-db]# ll
total 336
drwxrwxr-x. 2 wae wae 4096 Mar 28 2017 bin
-rw-------. 1 wae wae 5 Jan 19 00:53 cassandra.pid
-rw-r--r--. 1 wae wae 225860 Oct 22 2014 CHANGES.txt
drwxrwxr-x. 3 wae wae 4096 Mar 28 2017 conf
drwxrwxr-x. 2 wae wae 4096 Mar 28 2017 interface
drwxrwxr-x. 4 wae wae 4096 Mar 28 2017 javadoc
drwxrwxr-x. 3 wae wae 4096 Mar 28 2017 lib
-rw-r--r--. 1 wae wae 11609 Oct 22 2014 LICENSE.txt
-rw-r--r--. 1 wae wae 63584 Oct 22 2014 NEWS.txt
-rw-r--r--. 1 wae wae 2117 Oct 22 2014 NOTICE.txt
drwxrwxr-x. 3 wae wae 4096 Mar 28 2017 pylib
drwxrwxr-x. 4 wae wae 4096 Mar 28 2017 tools
[[email protected] wae-db]# cat cassandra.pid
27757
[[email protected] wae-db]# /etc/init.d/wae-db stop
Shutting down wae-db: [ OK ]
[[email protected] wae-db]# ps -ef | grep wae-db
wae 5633 1 0 Jan17 ? 00:12:59 /usr/java/latest/bin/java -ea -javaagent:/opt/cariden/software/wae-db/lib/jamm-0.2.6.jar -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1G -Xmx1G -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:CompileCommandFile=/opt/cariden/software/wae-db/conf/hotspot_compiler -XX:CMSWaitDuration=10000 -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/opt/cariden/software/wae-db/logs -Dcassandra.storagedir=/opt/cariden/software/wae-db/data -Dcassandra-pidfile=/opt/cariden/software/wae-db/cassandra.pid -cp /opt/cariden/software/wae-db/conf:/opt/cariden/software/wae-db/build/classes/main:/opt/cariden/software/wae-db/build/classes/thrift:/opt/cariden/software/wae-db/lib/airline-0.6.jar:/opt/cariden/software/wae-db/lib/antlr-runtime-3.5.2.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-2.1.1.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-clientutil-2.1.1.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-thrift-2.1.1.jar:/opt/cariden/software/wae-db/lib/avro-1.7.7.jar:/opt/cariden/software/wae-db/lib/avro-ipc-1.7.7.jar:/opt/cariden/software/wae-db/lib/commons-cli-1.1.jar:/opt/cariden/software/wae-db/lib/commons-codec-1.2.jar:/opt/cariden/software/wae-db/lib/commons-lang3-3.1.jar:/opt/cariden/software/wae-db/lib/commons-math3-3.2.jar:/opt/cariden/software/wae-db/lib/compress-lzf-0.8.4.jar:/opt/cariden/software/wae-db/lib/concurrentlinkedhashmap-lru-1.4.jar:/opt/cariden/software/wae-db/lib/disruptor-3.0.1.jar:/opt/cariden/software/wae-db/lib/flume-file-channel-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-configuration-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-core-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-node-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-sdk-1.5.2.jar:/opt/cariden/software/wae-db/lib/guava-16.0.jar:/opt/cariden/software/wae-db/lib/high-scale-lib-1.0.6.jar:/opt/cariden/software/wae-db/lib/jackson-core-asl-1.9.2.jar:/opt/cariden/software/wae-db/lib/jackson-mapper-asl-1.9.2.jar:/opt/cariden/software/wae-db/lib/jamm-0.2.6.jar:/opt/cariden/software/wae-db/lib/javax.inject.jar:/opt/cariden/software/wae-db/lib/jbcrypt-0.3m.jar:/opt/cariden/software/wae-db/lib/jline-1.0.jar:/opt/cariden/software/wae-db/lib/jna-4.0.0.jar:/opt/cariden/software/wae-db/lib/json-simple-1.1.jar:/opt/cariden/software/wae-db/lib/libthrift-0.9.1.jar:/opt/cariden/software/wae-db/lib/logback-classic-1.1.2.jar:/opt/cariden/software/wae-db/lib/logback-core-1.1.2.jar:/opt/cariden/software/wae-db/lib/logback-flume-1.0.0-non-osgi.jar:/opt/cariden/software/wae-db/lib/lz4-1.2.0.jar:/opt/cariden/software/wae-db/lib/metrics-core-2.2.0.jar:/opt/cariden/software/wae-db/lib/netty-3.4.0.Final.jar:/opt/cariden/software/wae-db/lib/netty-all-4.0.23.Final.jar:/opt/cariden/software/wae-db/lib/reporter-config-2.1.0.jar:/opt/cariden/software/wae-db/lib/slf4j-api-1.7.2.jar:/opt/cariden/software/wae-db/lib/snakeyaml-1.11.jar:/opt/cariden/software/wae-db/lib/snappy-java-1.0.5.2.jar:/opt/cariden/software/wae-db/lib/stream-2.5.2.jar:/opt/cariden/software/wae-db/lib/stringtemplate-4.0.2.jar:/opt/cariden/software/wae-db/lib/super-csv-2.1.0.jar:/opt/cariden/software/wae-db/lib/thrift-server-0.3.7.jar org.apache.cassandra.service.CassandraDaemon
wae 6283 16138 0 Jan18 ? 00:00:00 [wae-db]
root 32425 4403 0 00:56 pts/5 00:00:00 grep wae-db
[[email protected] wae-db]# kill 5633
[[email protected] wae-db]# ps -ef | grep wae-db
wae 6283 16138 0 Jan18 ? 00:00:00 [wae-db]
root 32451 4403 0 00:57 pts/5 00:00:00 grep wae-db
[[email protected] wae-db]# service wae-db status
wae-db is not running
[[email protected] wae-db]# /etc/init.d/wae-db start
Starting wae-db: [ OK ]
[[email protected] wae-db]# service wae-db status
wae-db is running OK, with PID=32634
Copyright notice: This is an original article and represents only the author's personal views. Copyright belongs to Frank Zhao. When reposting, please credit the source and include a link to the article.