"maprcli service list" Has Incorrect Output for Service State

Document created by mufeed Employee on Feb 7, 2016

Author: Mufeed Usman

 

Original Publication Date: July 14, 2015

Environment:

All MapR versions.

 

Symptom:

Services configured on a node that are not intended to run in active-standby mode for HA remain in state '5' (standby) and never reach the running state.

 

The following services are configured on the 'problem' node.

hivemeta     5     /tmp/mapr HiveMetastore 
nfs          5     /opt/mapr/logs/nfsserver.log NFS Gateway
hoststats     5     /opt/mapr/logs/hoststats.log HostStats
hs2          5     /tmp/mapr HiveServer2
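On a node with many services, the standby entries can be picked out with a small filter. The following is a minimal sketch, not part of the original diagnosis. It assumes the column layout shown in the listing above (service name first, state second), and the function name standby_services is hypothetical.

```shell
# Read "maprcli service list" style rows on stdin and print the names of
# services whose state column equals 5 (standby).
# Assumption: the service name is column 1 and the state is column 2,
# as in the listing above; a header row, if present, is skipped because
# its second field is not the number 5.
standby_services() {
    awk '$2 == 5 { print $1 }'
}
```

Assuming the listing was produced for the node in question, it could be used as: maprcli service list -node problem-node.com | standby_services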

 

As seen in the service listing above, all four services are in standby state, as indicated by state '5'. In this scenario, querying ZooKeeper for the corresponding znodes shows the following.

 

In ZooKeeper, the node appears as registered under /servers.

[zk: g4t7541.com:5181(CONNECTED) 0] ls /servers 
[<SNIPPED>..., problem-node.com, ...<SNIPPED>]

 

However, when the znodes under /services are queried, the 'problem' node is not listed for any of the services mentioned above, which is why the maprcli command reports the wrong state.

 

[zk: g4t7541.com:5181(CONNECTED) 1] ls /services/hivemeta
[g4t7523.com, master]

 

[zk: g4t7541.com:5181(CONNECTED) 2] ls /services/nfs
[g4t7557.com, g4t7544.com, g4t7560.com,
g4t7464.com, g4t7558.com, g4t7480.com,
g4t7545.com, g4t7465.com, g4t7466.com,
g4t7523.com, g4t7561.com, g4t7496.com,
g4t7497.com, g4t7541.com, g4t7513.com,
g4t7559.com, g4t7527.com, master, g4t7543.com,
g4t7556.com, g4t7539.com, g4t7528.com,
g4t7529.com, g4t7542.com, g4t7482.com,
g4t7481.com, g4t7512.com]

 

[zk: g4t7541.com:5181(CONNECTED) 3] ls /services/hoststats
[g4t7557.com, g4t7544.com, g4t7560.com,
g4t7464.com, g4t7558.com, g4t7480.com,
g4t7545.com, g4t7465.com, g4t7466.com,
g4t7523.com, g4t7561.com, g4t7496.com,
g4t7497.com, g4t7541.com, g4t7513.com,
g4t7559.com, g4t7527.com, master, g4t7543.com,
g4t7556.com, g4t7539.com, g4t7528.com,
g4t7529.com, g4t7542.com, g4t7482.com,
g4t7481.com, g4t7512.com]

 

[zk: g4t7541.com:5181(CONNECTED) 4] ls /services/hs2
[g4t7523.com, master]
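With long child lists such as those for nfs and hoststats, checking by eye whether a host is present is error-prone. The following sketch tests membership in a pasted "ls" result. It assumes the bracketed, comma-separated format shown above, and the function name znode_lists_host is hypothetical.

```shell
# Succeed (exit status 0) if the host given as $1 appears in a ZooKeeper
# "ls" child list such as "[a.com, b.com, master]" read from stdin.
# The list may span several lines, as in the output above.
znode_lists_host() {
    sed 's/[][ ]//g' |      # drop brackets and spaces
        tr ',' '\n' |       # one child name per line
        grep -Fxq -- "$1"   # exact, whole-line match on the host name
}
```

For example, echo '[g4t7523.com, master]' | znode_lists_host problem-node.com fails, which matches the observation that the 'problem' node is not registered under /services/hs2.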

The Warden log shows that, after initialization, Warden only parses the configuration and never starts any services. An entry similar to the following was observed in /opt/mapr/logs/warden.log:

2015-07-04 21:15:11,778 INFO  com.mapr.warden.WardenManager [main]: Configured services: [hs2:1, hivemeta:1, hs2:1, nfs:all:cldb, hoststats:all:nfs]

 

Note that there are two entries for the service 'hs2'.
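Spotting a repeated name in a long "Configured services" entry can be automated. The sketch below assumes the log line format shown above (a bracketed, comma-separated list of name:instances entries); the function name duplicate_configured_services is hypothetical.

```shell
# Read Warden log lines on stdin and print each service name that occurs
# more than once in the most recent "Configured services: [...]" entry.
duplicate_configured_services() {
    grep 'Configured services: \[' |
        tail -n 1 |                                          # latest entry only
        sed 's/.*Configured services: \[\(.*\)].*/\1/' |     # keep the list body
        tr ',' '\n' |                                        # one entry per line
        sed 's/^ *//; s/:.*$//' |                            # "hs2:1" -> "hs2"
        sort |
        uniq -d                                              # names seen 2+ times
}
```

It could be run as: duplicate_configured_services < /opt/mapr/logs/warden.log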

 

Root Cause:

The cause is a duplicate definition of the 'hs2' service under /opt/mapr/conf/conf.d/.

# ls /opt/mapr/conf/conf.d/
warden.hivemeta.conf  warden.hs2.conf  warden.hs3.conf

 

Although the file names warden.hs2.conf and warden.hs3.conf suggest distinct services, both files declare the same underlying service name.

# grep services /opt/mapr/conf/conf.d/warden.hs*.conf 
/opt/mapr/conf/conf.d/warden.hs2.conf:services=hs2:1
/opt/mapr/conf/conf.d/warden.hs3.conf:services=hs2:1
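The same check can be generalized to every file in the directory. The following is a minimal sketch, assuming each file carries a single line of the form services=<name>:<instances> as in the grep output above; the function name find_duplicate_services is hypothetical, and paths containing spaces are not handled.

```shell
# Print every service name declared by more than one warden.*.conf file
# in the given directory, followed by the files that declare it.
find_duplicate_services() {
    conf_dir="${1:-/opt/mapr/conf/conf.d}"
    grep -H '^services=' "$conf_dir"/warden.*.conf 2>/dev/null |
        # "<file>:services=<name>:<n>" -> "<name> <file>"
        sed 's/^\(.*\):services=\([^:]*\).*$/\2 \1/' |
        sort |
        awk '{ count[$1]++; files[$1] = files[$1] " " $2 }
             END { for (s in count) if (count[s] > 1) print s ":" files[s] }'
}
```

Run without arguments, it inspects /opt/mapr/conf/conf.d; with the files shown above it would report hs2 together with warden.hs2.conf and warden.hs3.conf.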

 

Solution:

The 'services' line in warden.hs3.conf should name the service 'hs3' rather than 'hs2'. For example:

services=hs3:1

 

After the duplicate service definition is corrected and Warden is restarted, the services start as expected.
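The edit can also be scripted. The sketch below changes only the service name on the 'services' line and leaves every other setting in the file untouched. The function name rename_warden_service is hypothetical, and it deliberately leaves no backup copy in conf.d, on the assumption that an extra file there might itself be read as a service definition.

```shell
# Rewrite the service name on the "services=" line of one conf file.
# Usage: rename_warden_service <conf_file> <old_name> <new_name>
rename_warden_service() {
    conf_file="$1"; old_name="$2"; new_name="$3"
    tmp_file=$(mktemp) || return 1
    sed "s/^services=${old_name}:/services=${new_name}:/" "$conf_file" > "$tmp_file" &&
        cat "$tmp_file" > "$conf_file"   # overwrite in place so ownership and mode are kept
    status=$?
    rm -f "$tmp_file"
    return $status
}
```

For the case described here, this would be rename_warden_service /opt/mapr/conf/conf.d/warden.hs3.conf hs2 hs3, followed by a Warden restart on the node (for example, service mapr-warden restart).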
