
Couldn't connect to the CLDB service and MCS

Question asked by Saha_Chupong on Jun 27, 2018
Latest reply on Jun 28, 2018 by Saha_Chupong

Hi,

I'm new to MapR. This is a MapR 5.2.2 single-node cluster on Red Hat 7.3.

First we got this error: FAILED: RuntimeException Error while making MR scratch directory.

Now I cannot access MCS after stopping all services and restarting the OS.

Any advice would be truly appreciated.

When I try:

# maprcli node services -nodes dwmapr -webserver restart
ERROR (10006) - Unable to obtain the ZooKeeper connection string from the CLDB. Make sure that the CLDB is running and accessible.

# maprcli node cldbmaster

ERROR (10009) - Couldn't connect to the CLDB service

# service mapr-cldb status
2.7.0
/opt/mapr/pid/cldb.pid exists with pid 21142 but no CLDB.
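
To double-check that status message, here is a minimal sketch, assuming bash and a root shell as in the session above. It tests whether the PID stored in the pid file still belongs to a live process. The function name `check_pidfile` is made up for this post and is not a MapR tool; the default path is the one printed by the status command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MapR): report whether the PID recorded
# in a pid file belongs to a running process.
check_pidfile() {
  local pidfile="${1:-/opt/mapr/pid/cldb.pid}"
  if [ ! -f "$pidfile" ]; then
    echo "no pid file: $pidfile"
    return 2
  fi
  local pid
  pid="$(tr -d '[:space:]' < "$pidfile")"
  # kill -0 delivers no signal; it only checks that the process exists.
  # Run as root, otherwise it also fails for processes owned by other users.
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo "pid $pid is running"
    return 0
  fi
  echo "stale pid file: $pidfile (pid ${pid:-empty} not running)"
  return 1
}
```

Called with no arguments, `check_pidfile` looks at /opt/mapr/pid/cldb.pid. A "stale pid file" result would match the "exists with pid 21142 but no CLDB" message.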

 

Here are the logs I could find:

/opt/mapr/logs/cldb.log

2018-06-26 17:33:03,022 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services/nfs. Event state: SyncConnected. Event type: NodeChildrenChanged
2018-06-26 17:33:03,023 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services/nfs/master. Event state: SyncConnected. Event type: NodeCreated
2018-06-26 17:33:03,023 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services/nfs. Event state: SyncConnected. Event type: NodeChildrenChanged
2018-06-26 17:33:03,025 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services_config/nfs/dwmapr. Event state: SyncConnected. Event type: NodeDataChanged
2018-06-26 17:33:03,026 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services_config/nfs/dwmapr. Event state: SyncConnected. Event type: NodeDataChanged
2018-06-26 17:33:04,408 WARN Alarms [RPC-8]: NODE_ALARM_SERVICE_NFS_DOWN:dwmapr:NODE_ALARM cleared,
2018-06-26 17:33:05,179 INFO FileServerHandler [RPC-6]: NFSRegister: Request FSID: 2205848235922068170 mode : server isDCA: false NFSHost:Port: 10.0.0.198:2049-192.168.122.1:2049- NFSHostName: dwmapr isLoopbackNFS: false $
2018-06-26 17:33:05,179 INFO LicenseManager [RPC-6]: No license to run NFS server in servermode
2018-06-26 17:33:05,179 WARN NFSHandler [RPC-6]: NFS server registration denied (requesting it to shutdown): dwmapr FSID: 2205848235922068170: No license to run NFS server in servermode
2018-06-26 17:33:15,007 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services/nfs/dwmapr. Event state: SyncConnected. Event type: NodeDataChanged
2018-06-26 17:33:15,008 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services/nfs/master. Event state: SyncConnected. Event type: NodeDataChanged
2018-06-26 17:33:15,442 INFO FileServerHandler [RPC-2]: NFSRegister: Request FSID: 2205848235922068170 mode : server isDCA: false NFSHost:Port: 10.0.0.198:2049-192.168.122.1:2049- NFSHostName: dwmapr isLoopbackNFS: false $
2018-06-26 17:33:15,442 INFO LicenseManager [RPC-2]: No license to run NFS server in servermode
2018-06-26 17:33:25,640 INFO FileServerHandler [RPC-4]: NFSRegister: Request FSID: 2205848235922068170 mode : server isDCA: false NFSHost:Port: 10.0.0.198:2049-192.168.122.1:2049- NFSHostName: dwmapr isLoopbackNFS: false $
2018-06-26 17:33:25,640 INFO LicenseManager [RPC-4]: No license to run NFS server in servermode
2018-06-26 17:33:35,677 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services/nfs/master. Event state: SyncConnected. Event type: NodeDeleted
2018-06-26 17:33:35,677 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services/nfs. Event state: SyncConnected. Event type: NodeChildrenChanged
2018-06-26 17:33:35,678 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services/nfs/dwmapr. Event state: SyncConnected. Event type: NodeDeleted
2018-06-26 17:33:35,679 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services_config/nfs/dwmapr. Event state: SyncConnected. Event type: NodeDataChanged
2018-06-26 17:33:37,019 WARN Alarms [RPC-7]: Alarm raised: NODE_ALARM_SERVICE_NFS_DOWN:dwmapr:NODE_ALARM; Cluster: dwmaprdev.exim.go.th; Can not determine if service: nfs is running. Check logs at: /opt/mapr/logs/nfsserver.$
2018-06-26 17:33:56,006 INFO ReplicationHandlerThread [Repl]: <PRIORITY_REPLICATION> P=216; F=216; QS=27;
2018-06-26 17:33:56,006 INFO ReplicationHandlerThread [Repl]: <UNDER_REPLICATION> P=64; F=64; QS=8;
2018-06-26 17:35:56,020 INFO ReplicationHandlerThread [Repl]: <PRIORITY_REPLICATION> P=216; F=216; QS=27;
2018-06-26 17:35:56,020 INFO ReplicationHandlerThread [Repl]: <UNDER_REPLICATION> P=64; F=64; QS=8;
2018-06-26 17:37:56,035 INFO ReplicationHandlerThread [Repl]: <PRIORITY_REPLICATION> P=216; F=216; QS=27;
2018-06-26 17:37:56,035 INFO ReplicationHandlerThread [Repl]: <UNDER_REPLICATION> P=64; F=64; QS=8;
2018-06-26 17:39:56,054 INFO ReplicationHandlerThread [Repl]: <PRIORITY_REPLICATION> P=216; F=216; QS=27;
2018-06-26 17:39:56,054 INFO ReplicationHandlerThread [Repl]: <UNDER_REPLICATION> P=64; F=64; QS=8;
2018-06-26 17:41:56,069 INFO ReplicationHandlerThread [Repl]: <PRIORITY_REPLICATION> P=216; F=216; QS=27;
2018-06-26 17:41:56,069 INFO ReplicationHandlerThread [Repl]: <UNDER_REPLICATION> P=64; F=64; QS=8;
2018-06-26 17:43:56,084 INFO ReplicationHandlerThread [Repl]: <PRIORITY_REPLICATION> P=216; F=216; QS=27;
2018-06-26 17:43:56,084 INFO ReplicationHandlerThread [Repl]: <UNDER_REPLICATION> P=64; F=64; QS=8;
2018-06-26 17:45:56,098 INFO ReplicationHandlerThread [Repl]: <PRIORITY_REPLICATION> P=216; F=216; QS=27;
2018-06-26 17:45:56,098 INFO ReplicationHandlerThread [Repl]: <UNDER_REPLICATION> P=64; F=64; QS=8;
2018-06-26 17:47:56,116 INFO ReplicationHandlerThread [Repl]: <PRIORITY_REPLICATION> P=216; F=216; QS=27;
2018-06-26 17:47:56,116 INFO ReplicationHandlerThread [Repl]: <UNDER_REPLICATION> P=64; F=64; QS=8;
2018-06-26 17:49:56,131 INFO ReplicationHandlerThread [Repl]: <PRIORITY_REPLICATION> P=216; F=216; QS=27;
2018-06-26 17:49:56,131 INFO ReplicationHandlerThread [Repl]: <UNDER_REPLICATION> P=64; F=64; QS=8;
2018-06-26 17:51:56,146 INFO ReplicationHandlerThread [Repl]: <PRIORITY_REPLICATION> P=216; F=216; QS=27;
2018-06-26 17:51:56,146 INFO ReplicationHandlerThread [Repl]: <UNDER_REPLICATION> P=64; F=64; QS=8;
2018-06-26 17:53:06,625 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services_config/nodemanager/dwmapr. Event state: SyncConnected. Event type: NodeDataChanged
2018-06-26 17:53:06,629 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services_config/resourcemanager/dwmapr. Event state: SyncConnected. Event type: NodeDataChanged
2018-06-26 17:53:06,630 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services_config/historyserver/dwmapr. Event state: SyncConnected. Event type: NodeDataChanged
2018-06-26 17:53:06,633 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services_config/cldb/dwmapr. Event state: SyncConnected. Event type: NodeDataChanged
2018-06-26 17:53:06,642 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /services_config/nfs/dwmapr. Event state: SyncConnected. Event type: NodeDataChanged
2018-06-26 17:53:06,735 INFO CLDB [Thread-13]: CLDB ShutDown Hook called
2018-06-26 17:53:06,735 INFO ZooKeeperClient [Thread-13]: Setting the clean cldbshutdown flag to true
2018-06-26 17:53:06,736 INFO ZooKeeperClient [Thread-13]: Zookeeper Client: Closing client connection:
2018-06-26 17:53:06,739 INFO ZKDataRetrieval [Thread-2-EventThread]: Process path: /datacenter/controlnodes/cldb/active/CLDBMaster. Event state: SyncConnected. Event type: NodeDeleted
2018-06-26 17:53:06,740 INFO CLDBServer [main-EventThread]: The CLDB received notification that a ZooKeeper event of type NodeDeleted occurred on path /datacenter/controlnodes/cldb/active/CLDBMaster
2018-06-26 17:53:06,740 FATAL CLDB [main-EventThread]: CLDBShutdown: This CLDB will shutdown now because it was holding the master CLDB lock and received notification from the ZooKeeper ensemble that the lock was deleted
2018-06-26 17:53:06,742 INFO ZooKeeper [Thread-13]: Session: 0x164362bd2230014 closed
2018-06-26 17:53:06,742 INFO CLDB [Thread-13]: CLDB shutdown
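
Most of the excerpt above is routine INFO output. A rough sketch like the following, assuming bash and GNU grep, pulls out only the FATAL/ERROR entries and the shutdown-hook line. The function name `cldb_shutdown_lines` is hypothetical, and the pattern is based only on the log format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show only FATAL/ERROR entries and the shutdown-hook
# message from a CLDB log. The level is expected as the third field,
# surrounded by spaces, as in the excerpt above.
cldb_shutdown_lines() {
  local log="${1:-/opt/mapr/logs/cldb.log}"
  grep -E ' (FATAL|ERROR) |ShutDown Hook' "$log"
}
```

On the excerpt above this should leave just the "CLDB ShutDown Hook called" line and the FATAL "CLDBShutdown" line, both at 17:53:06 on 2018-06-26.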

 

/opt/mapr/logs/maprcli-mapr-5000.log

Header: hostName: dwmapr, Time Zone: Indochina Time, processName: MapRCLI, processId: 31493, MapR Build Version: 5.2.2.44680.GA
2018-06-27 15:59:03,924 INFO com.mapr.cliframework.driver.CLIMainDriver [main]: [node, services, -cldb, start]
2018-06-27 15:59:03,946 ERROR com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils getDataForParticularCLDB [main]: No data returned in RPC: CLDB Ips: 127.0.0.1-, Port: 7222. Continue searching for correct CLDB
2018-06-27 15:59:03,948 INFO com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils [main]: Bad CLDB credentials removed: CLDB Ips: 127.0.0.1-, Port: 7222
2018-06-27 15:59:03,948 INFO com.mapr.cli.common.AuthManager [main]: Authorization Not Supported at CLDB: Reverting to Obtaining ACLs
2018-06-27 15:59:04,014 INFO com.mapr.cliframework.driver.CLIMainDriver [main]: ERROR (10009) - RPC Rejected by CLDB Server

 

/opt/mapr/logs/maprcli-root-0.log

Header: hostName: dwmapr, Time Zone: Indochina Time, processName: MapRCLI, processId: 29198, MapR Build Version: 5.2.2.44680.GA
2018-06-28 08:29:41,315 INFO com.mapr.cliframework.driver.CLIMainDriver [main]: [node, cldbmaster]
2018-06-28 08:29:41,338 ERROR com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils getDataForParticularCLDB [main]: No data returned in RPC: CLDB Ips: 10.0.0.198-, Port: 7222. Continue searching for correct CLDB
2018-06-28 08:29:41,340 INFO com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils [main]: Bad CLDB credentials removed: CLDB Ips: 10.0.0.198-, Port: 7222
2018-06-28 08:29:41,340 ERROR com.mapr.cli.common.PluggableAlarmUtil getAlarms [main]: Exception: Failed to connect to CLDB
2018-06-28 08:29:41,342 ERROR com.mapr.cli.common.PluggableAlarmUtil getAlarmsInternally [main]: Failed due to exception
2018-06-28 08:29:41,343 ERROR com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils getDataForParticularCLDB [main]: No data returned in RPC: CLDB Ips: 10.0.0.198-, Port: 7222. Continue searching for correct CLDB
2018-06-28 08:29:41,343 INFO com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils [main]: Bad CLDB credentials removed: CLDB Ips: 10.0.0.198-, Port: 7222
2018-06-28 08:29:41,343 ERROR com.mapr.cli.common.PluggableAlarmUtil getAlarms [main]: Exception: Failed to connect to CLDB
2018-06-28 08:29:41,343 ERROR com.mapr.cli.common.PluggableAlarmUtil getAlarmsInternally [main]: Failed due to exception
2018-06-28 08:29:41,346 ERROR com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils getDataForParticularCLDB [main]: No data returned in RPC: CLDB Ips: 10.0.0.198-, Port: 7222. Continue searching for correct CLDB
2018-06-28 08:29:41,347 INFO com.mapr.baseutils.cldbutils.CLDBRpcCommonUtils [main]: Bad CLDB credentials removed: CLDB Ips: 10.0.0.198-, Port: 7222
2018-06-28 08:29:41,403 INFO com.mapr.cliframework.driver.CLIMainDriver [main]: ERROR (10009) - Couldn't connect to the CLDB service
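
Since maprcli reports that it cannot reach the CLDB at 10.0.0.198 on port 7222, one thing that could be checked is whether anything is listening on that port at all. This is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils `timeout` command. `port_open` is a made-up name, not a MapR command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "open" if a TCP connection to HOST:PORT
# succeeds within 3 seconds, otherwise "closed".
port_open() {
  local host="$1" port="$2"
  # bash-only feature: redirecting to /dev/tcp/HOST/PORT attempts a TCP connect.
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}
```

For example, `port_open 10.0.0.198 7222` and `port_open 127.0.0.1 7222` cover the two CLDB addresses that appear in the maprcli logs. A "closed" result would only confirm that no CLDB process is accepting connections; it would not explain why.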

 

Thanks

Chupong
