CISCO WAE: wae-db dead but pid file exists
Introduction
If you see a similar alarm, you can follow this guide to work around the issue.
Problem
A customer found that the WAE REST API was not working. On checking, we found that wae-db had a problem and reported the alarm shown below.
Restarting the WAE service and reloading the VM did not help.
[root@wae-auto wae-db]# service --status-all | grep wae
JAVA_EXECUTABLE or HSQLDB_JAR_PATH in '/etc/sysconfig/hsqldb' is set to a non-file.
wae-appenginecore is running OK, with PID=18606
wae-core is running OK, with PID=18163
wae-db dead but pid file exists <<<<<<
wae-designapiserver is running OK, with PID=18847
wae-messaging is running OK, with PID=17874
wae-ni is running OK, with PID=17525
wae-osc is running OK, with PID=18369
wae-svcs-dashui is running OK, with PID=16652
wae-svcs-db is running OK, with PID=16277
wae-svcs-localrepo is not running
wae-svcs-log is running OK, with PID=16497
wae-svcs-logagent is running OK, with PID=16570
wae-svcs-metricsbkr is running OK, with PID=16716
wae-svcs-metricsd is running OK, with PID=16806
wae-svcs-mon is running OK, with PID=16138
wae-svcs-ui is running OK, with PID=16917
wae-system-server is running OK, with PID=16186
wae-web-server is running OK, with PID=17761
Solution
After working with the BU, we resolved the issue.
wae-db is the Cassandra database that WAE uses for internal purposes. It is different from mld. Based on the error message, the wae-db process stopped working for some reason, but Linux still treats the wae-db service as running.
wae-db dead but pid file exists
When you run something as a service on Linux, a pid file is created that contains the PID of that process. It tells Linux that the service is running, so you cannot start another instance of the same service. The file should be deleted when the service stops. Sometimes, after a failure, the service stops or crashes without Linux noticing, and the pid file is not deleted. That is the cause of this issue. If you search for the error message, you will find that many other services report similar alarms.
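As a rough illustration of the check behind this kind of alarm, the sketch below classifies a pid file as missing, stale, or running. The function name pidfile_state is hypothetical and is not part of WAE or of its init scripts. It relies only on the standard kill -0 probe, which tests whether a process exists without sending it a signal, and it assumes you run it as root or as the owner of the process, because kill -0 also fails when you are not allowed to signal the process.

```shell
#!/bin/sh
# pidfile_state is a hypothetical helper, not part of WAE.
# Prints "missing", "stale", or "running" for the given pid file.
pidfile_state() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}
```

For example, pidfile_state /opt/cariden/software/wae-db/cassandra.pid could be run on the affected host. A result of "stale" roughly corresponds to the "dead but pid file exists" state described here.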
To recover from this kind of issue, do the following:
- kill the process
- delete the pid file
- restart the service
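The steps above can be sketched as a small script. This is only an outline under stated assumptions: the helper names run and recover_service are hypothetical, the PID of the leftover process has to be found by hand first (for example with ps -ef), and the script stops the service before killing the process, as the workaround example in this article does. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Hypothetical helpers, not part of WAE.
# run prints the command when DRY_RUN is 1 (the default), otherwise executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recover_service SERVICE PIDFILE [LEFTOVER_PID]
recover_service() {
    svc=$1
    pidfile=$2
    leftover_pid=$3
    run service "$svc" stop
    # Kill the leftover process only if a PID was supplied.
    if [ -n "$leftover_pid" ]; then
        run kill "$leftover_pid"
    fi
    # Remove the pid file in case the stop action left it behind.
    run rm -f "$pidfile"
    run service "$svc" start
    run service "$svc" status
}
```

A dry run for this case would be: recover_service wae-db /opt/cariden/software/wae-db/cassandra.pid 5633, where 5633 is the PID of the leftover Cassandra process seen in the ps output of the example. Review the printed commands before repeating the call with DRY_RUN=0.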
The pid file of wae-db is located in the /opt/cariden/software/wae-db folder and is called cassandra.pid.
[root@wae-auto wae-db]# ll
total 336
drwxrwxr-x. 2 wae wae 4096 Mar 28 2017 bin
-rw-------. 1 wae wae 5 Jan 19 00:53 cassandra.pid
Workaround example:
[root@wae-auto ~]# service wae-db status
wae-db dead but pid file exists
[root@wae-auto ~]# ps -ef | grep wae-db
wae 5633 1 0 Jan17 ? 00:12:57 /usr/java/latest/bin/java -ea -javaagent:/opt/cariden/software/wae-db/lib/jamm-0.2.6.jar -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1G -Xmx1G -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:CompileCommandFile=/opt/cariden/software/wae-db/conf/hotspot_compiler -XX:CMSWaitDuration=10000 -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/opt/cariden/software/wae-db/logs -Dcassandra.storagedir=/opt/cariden/software/wae-db/data -Dcassandra-pidfile=/opt/cariden/software/wae-db/cassandra.pid -cp /opt/cariden/software/wae-db/conf:/opt/cariden/software/wae-db/build/classes/main:/opt/cariden/software/wae-db/build/classes/thrift:/opt/cariden/software/wae-db/lib/airline-0.6.jar:/opt/cariden/software/wae-db/lib/antlr-runtime-3.5.2.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-2.1.1.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-clientutil-2.1.1.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-thrift-2.1.1.jar:/opt/cariden/software/wae-db/lib/avro-1.7.7.jar:/opt/cariden/software/wae-db/lib/avro-ipc-1.7.7.jar:/opt/cariden/software/wae-db/lib/commons-cli-1.1.jar:/opt/cariden/software/wae-db/lib/commons-codec-1.2.jar:/opt/cariden/software/wae-db/lib/commons-lang3-3.1.jar:/opt/cariden/software/wae-db/lib/commons-math3-3.2.jar:/opt/cariden/software/wae-db/lib/compress-lzf-0.8.4.jar:/opt/cariden/software/wae-db/lib/concurrentlinkedhashmap-lru-1.4.jar:/opt/cariden/software/wae-db/lib/disruptor-3.0.1.jar:/opt/cariden/software/wae-db/lib/flume-file-channel-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-configuration-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-core-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-node-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-sdk-1.5.2.jar:/opt/cariden/software/wae-db/lib/guava-16.0.jar:/opt/cariden/software/wae-db/lib/high-scale-lib-1.0.6.jar:/opt/cariden/software/wae-db/lib/jackson-core-asl-1.9.2.jar:/opt/cariden/software/wae-db/lib/jackson-mapper-asl-1.9.2.jar:/opt/cariden/software/wae-db/lib/jamm-0.2.6.jar:/opt/cariden/software/wae-db/lib/javax.inject.jar:/opt/cariden/software/wae-db/lib/jbcrypt-0.3m.jar:/opt/cariden/software/wae-db/lib/jline-1.0.jar:/opt/cariden/software/wae-db/lib/jna-4.0.0.jar:/opt/cariden/software/wae-db/lib/json-simple-1.1.jar:/opt/cariden/software/wae-db/lib/libthrift-0.9.1.jar:/opt/cariden/software/wae-db/lib/logback-classic-1.1.2.jar:/opt/cariden/software/wae-db/lib/logback-core-1.1.2.jar:/opt/cariden/software/wae-db/lib/logback-flume-1.0.0-non-osgi.jar:/opt/cariden/software/wae-db/lib/lz4-1.2.0.jar:/opt/cariden/software/wae-db/lib/metrics-core-2.2.0.jar:/opt/cariden/software/wae-db/lib/netty-3.4.0.Final.jar:/opt/cariden/software/wae-db/lib/netty-all-4.0.23.Final.jar:/opt/cariden/software/wae-db/lib/reporter-config-2.1.0.jar:/opt/cariden/software/wae-db/lib/slf4j-api-1.7.2.jar:/opt/cariden/software/wae-db/lib/snakeyaml-1.11.jar:/opt/cariden/software/wae-db/lib/snappy-java-1.0.5.2.jar:/opt/cariden/software/wae-db/lib/stream-2.5.2.jar:/opt/cariden/software/wae-db/lib/stringtemplate-4.0.2.jar:/opt/cariden/software/wae-db/lib/super-csv-2.1.0.jar:/opt/cariden/software/wae-db/lib/thrift-server-0.3.7.jar org.apache.cassandra.service.CassandraDaemon
wae 6283 16138 0 Jan18 ? 00:00:00 [wae-db]
root 17980 4403 0 00:49 pts/5 00:00:00 grep wae-db
[root@wae-auto ~]# cd /opt/cariden/software/wae-db
[root@wae-auto wae-db]# ll
total 336
drwxrwxr-x. 2 wae wae 4096 Mar 28 2017 bin
-rw-------. 1 wae wae 5 Jan 19 00:53 cassandra.pid
-rw-r--r--. 1 wae wae 225860 Oct 22 2014 CHANGES.txt
drwxrwxr-x. 3 wae wae 4096 Mar 28 2017 conf
drwxrwxr-x. 2 wae wae 4096 Mar 28 2017 interface
drwxrwxr-x. 4 wae wae 4096 Mar 28 2017 javadoc
drwxrwxr-x. 3 wae wae 4096 Mar 28 2017 lib
-rw-r--r--. 1 wae wae 11609 Oct 22 2014 LICENSE.txt
-rw-r--r--. 1 wae wae 63584 Oct 22 2014 NEWS.txt
-rw-r--r--. 1 wae wae 2117 Oct 22 2014 NOTICE.txt
drwxrwxr-x. 3 wae wae 4096 Mar 28 2017 pylib
drwxrwxr-x. 4 wae wae 4096 Mar 28 2017 tools
[root@wae-auto wae-db]# cat cassandra.pid
27757
[root@wae-auto wae-db]# /etc/init.d/wae-db stop
Shutting down wae-db: [ OK ]
[root@wae-auto wae-db]# ps -ef | grep wae-db
wae 5633 1 0 Jan17 ? 00:12:59 /usr/java/latest/bin/java -ea -javaagent:/opt/cariden/software/wae-db/lib/jamm-0.2.6.jar -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1G -Xmx1G -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:CompileCommandFile=/opt/cariden/software/wae-db/conf/hotspot_compiler -XX:CMSWaitDuration=10000 -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.rmi.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/opt/cariden/software/wae-db/logs -Dcassandra.storagedir=/opt/cariden/software/wae-db/data -Dcassandra-pidfile=/opt/cariden/software/wae-db/cassandra.pid -cp /opt/cariden/software/wae-db/conf:/opt/cariden/software/wae-db/build/classes/main:/opt/cariden/software/wae-db/build/classes/thrift:/opt/cariden/software/wae-db/lib/airline-0.6.jar:/opt/cariden/software/wae-db/lib/antlr-runtime-3.5.2.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-2.1.1.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-clientutil-2.1.1.jar:/opt/cariden/software/wae-db/lib/apache-cassandra-thrift-2.1.1.jar:/opt/cariden/software/wae-db/lib/avro-1.7.7.jar:/opt/cariden/software/wae-db/lib/avro-ipc-1.7.7.jar:/opt/cariden/software/wae-db/lib/commons-cli-1.1.jar:/opt/cariden/software/wae-db/lib/commons-codec-1.2.jar:/opt/cariden/software/wae-db/lib/commons-lang3-3.1.jar:/opt/cariden/software/wae-db/lib/commons-math3-3.2.jar:/opt/cariden/software/wae-db/lib/compress-lzf-0.8.4.jar:/opt/cariden/software/wae-db/lib/concurrentlinkedhashmap-lru-1.4.jar:/opt/cariden/software/wae-db/lib/disruptor-3.0.1.jar:/opt/cariden/software/wae-db/lib/flume-file-channel-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-configuration-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-core-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-node-1.5.2.jar:/opt/cariden/software/wae-db/lib/flume-ng-sdk-1.5.2.jar:/opt/cariden/software/wae-db/lib/guava-16.0.jar:/opt/cariden/software/wae-db/lib/high-scale-lib-1.0.6.jar:/opt/cariden/software/wae-db/lib/jackson-core-asl-1.9.2.jar:/opt/cariden/software/wae-db/lib/jackson-mapper-asl-1.9.2.jar:/opt/cariden/software/wae-db/lib/jamm-0.2.6.jar:/opt/cariden/software/wae-db/lib/javax.inject.jar:/opt/cariden/software/wae-db/lib/jbcrypt-0.3m.jar:/opt/cariden/software/wae-db/lib/jline-1.0.jar:/opt/cariden/software/wae-db/lib/jna-4.0.0.jar:/opt/cariden/software/wae-db/lib/json-simple-1.1.jar:/opt/cariden/software/wae-db/lib/libthrift-0.9.1.jar:/opt/cariden/software/wae-db/lib/logback-classic-1.1.2.jar:/opt/cariden/software/wae-db/lib/logback-core-1.1.2.jar:/opt/cariden/software/wae-db/lib/logback-flume-1.0.0-non-osgi.jar:/opt/cariden/software/wae-db/lib/lz4-1.2.0.jar:/opt/cariden/software/wae-db/lib/metrics-core-2.2.0.jar:/opt/cariden/software/wae-db/lib/netty-3.4.0.Final.jar:/opt/cariden/software/wae-db/lib/netty-all-4.0.23.Final.jar:/opt/cariden/software/wae-db/lib/reporter-config-2.1.0.jar:/opt/cariden/software/wae-db/lib/slf4j-api-1.7.2.jar:/opt/cariden/software/wae-db/lib/snakeyaml-1.11.jar:/opt/cariden/software/wae-db/lib/snappy-java-1.0.5.2.jar:/opt/cariden/software/wae-db/lib/stream-2.5.2.jar:/opt/cariden/software/wae-db/lib/stringtemplate-4.0.2.jar:/opt/cariden/software/wae-db/lib/super-csv-2.1.0.jar:/opt/cariden/software/wae-db/lib/thrift-server-0.3.7.jar org.apache.cassandra.service.CassandraDaemon
wae 6283 16138 0 Jan18 ? 00:00:00 [wae-db]
root 32425 4403 0 00:56 pts/5 00:00:00 grep wae-db
[root@wae-auto wae-db]# kill 5633
[root@wae-auto wae-db]# ps -ef | grep wae-db
wae 6283 16138 0 Jan18 ? 00:00:00 [wae-db]
root 32451 4403 0 00:57 pts/5 00:00:00 grep wae-db
[root@wae-auto wae-db]# service wae-db status
wae-db is not running
[root@wae-auto wae-db]# /etc/init.d/wae-db start
Starting wae-db: [ OK ]
[root@wae-auto wae-db]# service wae-db status
wae-db is running OK, with PID=32634
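In the session above, cassandra.pid recorded 27757 while the Cassandra JVM that was actually running had PID 5633, so the process to kill had to be picked out of the ps output by eye. A minimal sketch of automating that comparison follows. The function name orphan_pids is hypothetical, and matching on the string CassandraDaemon is an assumption based on the command line shown in this example.

```shell
#!/bin/sh
# orphan_pids is a hypothetical helper, not part of WAE.
# Reads `ps -ef` output on stdin and prints the PID (second column) of every
# CassandraDaemon process whose PID differs from the one recorded in the pid file.
orphan_pids() {
    recorded=$1
    # The [C] bracket keeps this awk process from matching its own command line.
    awk -v rec="$recorded" '/[C]assandraDaemon/ && $2 != rec { print $2 }'
}
```

On the affected host this could be called as: ps -ef | orphan_pids "$(cat /opt/cariden/software/wae-db/cassandra.pid)". Any PID it prints is a candidate for the kill step, and it is worth confirming by hand before killing anything.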
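Copyright notice: This is an original article and represents only the author's personal views. Copyright belongs to Frank Zhao. When reposting, please cite the source and include a link to this article.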