
No CLDB entries, cannot run Error

Question asked by sjgx on Mar 2, 2017
Latest reply on Apr 18, 2017 by sjgx

I'm getting the following error when I try to do anything on my cluster:

> hadoop fs -ls /localdata
>>> 2017-03-01 15:20:31,1332 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:2191 Thread: 13874 MoveToNextCldb: No CLDB entries, cannot run, sleeping 5 seconds!
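
Before digging further, a quick sanity check I can think of is to confirm which CLDB hosts the client is pointed at and whether anything is listening on the CLDB RPC port (7222 by default). This assumes a default MapR install layout; <cldb-host> is just a placeholder for whichever host mapr-clusters.conf lists:

> # CLDB entries the client library reads (default config location)
> cat /opt/mapr/conf/mapr-clusters.conf
> # check whether the CLDB RPC port answers on that host
> nc -zv <cldb-host> 7222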

 

I'm not sure what happened; it was running fine yesterday, and I'm the only person using the cluster. I also can't log into the web dashboard. I tried running through the steps in this post:

 

> sudo service mapr-warden stop
>>> stopping WARDEN
>>> looking to stop mapr-core processes not started by warden
> sudo service mapr-zookeeper stop
>>> JMX enabled by default
>>> Using config: /opt/mapr/zookeeper/zookeeper-3.4.5/conf/zoo.cfg
>>> Stopping zookeeper ... STOPPED
> ps -fu mapr
>>> UID PID PPID C STIME TTY TIME CMD
> sudo service mapr-zookeeper start
>>> JMX enabled by default
>>> Using config: /opt/mapr/zookeeper/zookeeper-3.4.5/conf/zoo.cfg
>>> Starting zookeeper ... STARTED
> sudo service mapr-zookeeper qstatus
>>> JMX enabled by default
>>> Using config: /opt/mapr/zookeeper/zookeeper-3.4.5/conf/zoo.cfg
>>> Mode: standalone
> sudo maprcli node cldbmaster
>>> ERROR (10009) - Couldn't connect to the CLDB service
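
Given the "Mode: standalone" and the cldbmaster error above, one comparison that might help (assuming the default MapR paths shown in the output) is the ZooKeeper ensemble that warden/CLDB are configured with versus the server list ZooKeeper itself was started with:

> # ensemble warden/CLDB expect (default warden config location)
> grep zookeeper.servers /opt/mapr/conf/warden.conf
> # server.N entries ZooKeeper was started with; no entries would explain standalone mode
> grep '^server\.' /opt/mapr/zookeeper/zookeeper-3.4.5/conf/zoo.cfg
> # quorum status, run on each node that should be part of the ensemble
> sudo service mapr-zookeeper qstatus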

 

I'm not sure why ZooKeeper reports standalone mode when this is a 10-node cluster. I'm running Ubuntu 14.04 and Hadoop 2.7.0-mapr-1607. Here is the output from the two logs mentioned in the linked post.

 

/opt/mapr/logs/warden.log:

2017-03-02 09:54:05,412 INFO com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [Timer-0]: [e_SERV_CONF, hostName, ma_host, ma_process]
2017-03-02 09:54:05,412 INFO com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [Timer-0]: []
2017-03-02 09:58:35,362 INFO com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: Launching a separate process to execute /opt/mapr/server/pullcentralconfig
2017-03-02 10:01:32,344 ERROR com.mapr.warden.service.baseservice.Service$ServiceMonitorRun run [cldb_monitor]: Monitor command: [/opt/mapr/initscripts/mapr-cldb, status]can not determine if service: cldb is running. Retrying. Retrial #1. Total retries count is: 3
2017-03-02 10:01:32,348 ERROR com.mapr.warden.service.baseservice.Service$ServiceMonitorRun run [cldb_monitor]: /opt/mapr/pid/cldb.pid exists with pid 5553 but no CLDB.
2017-03-02 10:01:32,349 INFO com.mapr.warden.service.baseservice.Service$ServiceRun [cldb_monitor]: Command: [/opt/mapr/initscripts/mapr-cldb, start], Directory: /opt/mapr/initscripts
2017-03-02 10:01:33,648 INFO com.mapr.warden.service.baseservice.Service$ServiceRun [cldb_monitor]: /opt/mapr/initscripts/mapr-cldb: line 149: ulimit: max user processes: cannot modify limit: Operation not permitted
Starting CLDB, logging to /opt/mapr/logs/cldb.log
2017-03-02 10:01:43,649 INFO com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [cldb_monitor]: [e_SERV_RUN, hostName, ma_host, ma_process]
2017-03-02 10:01:43,649 INFO com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [cldb_monitor]: []
2017-03-02 10:05:44,118 ERROR com.mapr.job.mngmnt.hadoop.metrics.MaprRPCContext run [Thread-5]: Response is null. Most likely hoststats is not accepting requests or it is down.
2017-03-02 10:09:15,574 ERROR com.mapr.warden.service.baseservice.Service$ServiceMonitorRun run [cldb_monitor]: Monitor command: [/opt/mapr/initscripts/mapr-cldb, status]can not determine if service: cldb is running. Retrying. Retrial #2. Total retries count is: 3
2017-03-02 10:09:15,574 ERROR com.mapr.warden.service.baseservice.Service$ServiceMonitorRun run [cldb_monitor]: /opt/mapr/pid/cldb.pid exists with pid 12111 but no CLDB.
2017-03-02 10:09:15,575 INFO com.mapr.warden.service.baseservice.Service$ServiceRun [cldb_monitor]: Command: [/opt/mapr/initscripts/mapr-cldb, start], Directory: /opt/mapr/initscripts
2017-03-02 10:09:16,860 INFO com.mapr.warden.service.baseservice.Service$ServiceRun [cldb_monitor]: /opt/mapr/initscripts/mapr-cldb: line 149: ulimit: max user processes: cannot modify limit: Operation not permitted
Starting CLDB, logging to /opt/mapr/logs/cldb.log
2017-03-02 10:09:26,860 INFO com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [cldb_monitor]: [e_SERV_RUN, hostName, ma_host, ma_process]
2017-03-02 10:09:26,861 INFO com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [cldb_monitor]: []
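
Two things in the warden log I still need to chase down: the stale-looking cldb.pid ("exists with pid ... but no CLDB") and the ulimit failure from the CLDB init script. Rough checks I have in mind, assuming limits on this box are managed through /etc/security/limits.conf and limits.d (which I haven't verified):

> # is the pid recorded in cldb.pid actually a running CLDB process?
> sudo cat /opt/mapr/pid/cldb.pid
> ps -fp $(sudo cat /opt/mapr/pid/cldb.pid)
> # limits the mapr user actually gets (the init script could not raise "max user processes")
> sudo su - mapr -c 'ulimit -u; ulimit -n'
> # any nproc/nofile overrides that might cause "Operation not permitted"
> grep -rE 'nproc|nofile' /etc/security/limits.conf /etc/security/limits.d/ 2>/dev/null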

 

/opt/mapr/logs/cldb.log:

2017-03-02 10:09:17,329 INFO ZooKeeper [main]: Client environment:java.library.path=/opt/mapr/lib
2017-03-02 10:09:17,329 INFO ZooKeeper [main]: Client environment:java.io.tmpdir=/tmp
2017-03-02 10:09:17,329 INFO ZooKeeper [main]: Client environment:java.compiler=<NA>
2017-03-02 10:09:17,329 INFO ZooKeeper [main]: Client environment:os.name=Linux
2017-03-02 10:09:17,329 INFO ZooKeeper [main]: Client environment:os.arch=amd64
2017-03-02 10:09:17,329 INFO ZooKeeper [main]: Client environment:os.version=3.19.0-66-generic
2017-03-02 10:09:17,329 INFO ZooKeeper [main]: Client environment:user.name=sgiorgi
2017-03-02 10:09:17,329 INFO ZooKeeper [main]: Client environment:user.home=/home/sgiorgi
2017-03-02 10:09:17,329 INFO ZooKeeper [main]: Client environment:user.dir=/opt/mapr/initscripts
2017-03-02 10:09:17,330 INFO ZooKeeper [main]: Initiating client connection, connectString=192.168.30.22:5181 sessionTimeout=30000 watcher=com.mapr.fs.cldb.CLDBServer@1b1d9c01
2017-03-02 10:09:17,352 INFO CLDBServer [main]: CLDB configured with ZooKeeper ensemble with connection string 192.168.30.22:5181
2017-03-02 10:09:17,392 INFO ActiveContainersMap [main]: Caching a max of 489980 containers in cache
2017-03-02 10:09:17,436 INFO Login [main-SendThread(hadoop-n2:5181)]: successfully logged in.
2017-03-02 10:09:17,439 INFO ZooKeeperSaslClient [main-SendThread(hadoop-n2:5181)]: Client will use SIMPLE-SECURITY as SASL mechanism.
2017-03-02 10:09:17,446 INFO ClientCnxn [main-SendThread(hadoop-n2:5181)]: Opening socket connection to server hadoop-n2/192.168.30.22:5181. Will attempt to SASL-authenticate using Login Context section 'Client_simple'
2017-03-02 10:09:17,448 INFO VolumeMirror [main]: Initializing volume mirror thread ...
2017-03-02 10:09:17,450 INFO VolumeMirror [main]: Spawned 1 VolumeMirror Threads
2017-03-02 10:09:17,455 INFO ClientCnxn [main-SendThread(hadoop-n2:5181)]: Socket connection established to hadoop-n2/192.168.30.22:5181, initiating session
2017-03-02 10:09:17,477 INFO ClientCnxn [main-SendThread(hadoop-n2:5181)]: Session establishment complete on server hadoop-n2/192.168.30.22:5181, sessionid = 0x15a8f81f20c0011, negotiated timeout = 30000
2017-03-02 10:09:17,480 INFO CLDBServer [main-EventThread]: The CLDB received notification that a ZooKeeper event of type None occurred on path null
2017-03-02 10:09:17,498 INFO CLDBServer [main-EventThread]: onZKConnect: The CLDB has successfully connected to the ZooKeeper server State:CONNECTED Timeout:30000 sessionid:0x15a8f81f20c0011 local:/192.168.30.22:41281 remoteserver:hadoop-n2/192.168.30.22:5181 lastZxid:0 xid:2 sent:1 recv:2 queuedpkts:0 pendingresp:0 queuedevents:1 in the ZooKeeper ensemble with connection string 192.168.30.22:5181
2017-03-02 10:09:17,543 INFO HttpServer [main]: Creating http listener for 0.0.0.0
2017-03-02 10:09:17,559 INFO log [main]: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2017-03-02 10:09:17,598 INFO CLDB [main]: CLDBState: CLDB State change : WAIT_FOR_FILESERVERS
2017-03-02 10:09:17,599 INFO CLDB [main]: CLDBInit: Starting RPCServer on port 7222 with num thread 10, heap size of 616(MB) and with startup options -Xms382m -Xmx637m -XX:ErrorFile=/opt/cores/hs_err_pid%p.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/cores -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly -XX:ThreadStackSize=256
2017-03-02 10:09:17,602 INFO CLDB [main]: CLDBInit: Starting HTTP Server
2017-03-02 10:09:17,602 INFO HttpServer [main]: WebServer: Starting WebServer
2017-03-02 10:09:17,603 INFO HttpServer [main]: Listener started on SelectChannelConnector@0.0.0.0:7221 port 7221
2017-03-02 10:09:17,603 INFO HttpServer [main]: Starting Jetty WebServer
2017-03-02 10:09:17,603 INFO log [main]: jetty-6.1.26
2017-03-02 10:09:17,632 INFO CLDBServer [main-EventThread]: The CLDB received notification that a ZooKeeper event of type None occurred on path null
2017-03-02 10:09:17,766 INFO ZooKeeperClient [ZK-Connect]: ZooKeeperClient: KvStore is of latest epoch CLDB trying to become Master
2017-03-02 10:09:17,799 INFO ZooKeeperClient [ZK-Connect]: ZooKeeperClient: CLDB is current Master
2017-03-02 10:09:17,799 INFO ZooKeeperClient [ZK-Connect]: CLDB became master. Initializing KvStoreContainer for cid: 1
2017-03-02 10:09:17,802 INFO ZooKeeperClient [ZK-Connect]: Storing KvStoreContainerInfo to ZooKeeper Container ID:1 Servers: 192.168.30.22-5(3838957175195388833) Inactive: 192.168.30.21-5(6719190855939160145) 192.168.30.17-5(3985165134879924127) Unused: Epoch:5 SizeMB:0
2017-03-02 10:09:17,816 INFO CLDBServer [ZK-Connect]: Starting thread to monitor waiting for local kvstore to become master
2017-03-02 10:09:18,146 INFO FileServerHandler [RPC-3]: FSRegister: Request FSID: 3838957175195388833 Build: 5.2.0.39122.GA PatchVersion: $Id: mapr-version: 5.2.0.39122.GA 39122:49f386ea0304 $ FSNetworkLocation: FSHost:Port: 192.168.30.22- FSHost: Secondary Ports 5692- FSHostName: hadoop-n2 StoragePools Capacity: 0 Available: 0 Used: 0 Role: 0 isDCA: false uniq: f58698cefceeeff4-58b831f30070ba Received registration request
2017-03-02 10:09:18,384 INFO Topology [RPC-3]: fsid:3838957175195388833 became reachable, removing from persist-store
2017-03-02 10:09:18,384 ERROR Topology [RPC-3]: Unable to remove entry from unreachableFSIdTable for fsId 3838957175195388833 current CLDB mode:MASTER_REGISTER_READY
2017-03-02 10:09:18,388 INFO FileServerHandler [RPC-3]: FSRegister: Registered FileServer: 192.168.30.22- at topology /default-rack/hadoop-n2/5660
2017-03-02 10:09:18,389 INFO FileServerHandler [RPC-3]: FileServer Registration Request: Node Configuration
2017-03-02 10:09:18,389 INFO FileServerHandler [RPC-3]: NumCpus: 16 Avail Memory: 2789 Num Sps: 0 Num Instances: 1
2017-03-02 10:09:18,596 INFO CLDBServer [RPC-9]: Rejecting RPC 2345.211 from 192.168.30.22:5660 with status 3 as CLDB is waiting for local kvstore to become master.
2017-03-02 10:09:18,663 INFO log [main]: Started SelectChannelConnector@0.0.0.0:7221
2017-03-02 10:10:18,638 INFO CLDBServer [RPC-2]: Rejecting RPC 2345.211 from 192.168.30.22:5660 with status 3 as CLDB is waiting for local kvstore to become master.
2017-03-02 10:10:45,130 INFO ACRProcessor [FCR-1]: FileServer 192.168.30.22 did not report volume 1 as part of FCR. Requesting node to confirm missing containers
2017-03-02 10:11:18,648 INFO CLDBServer [RPC-4]: Rejecting RPC 2345.211 from 192.168.30.22:5660 with status 3 as CLDB is waiting for local kvstore to become master.
2017-03-02 10:12:18,653 INFO CLDBServer [RPC-7]: Rejecting RPC 2345.211 from 192.168.30.22:5660 with status 3 as CLDB is waiting for local kvstore to become master.
2017-03-02 10:16:17,817 FATAL CLDB [WaitForLocalKvstore Thread]: CLDBShutdown: CLDB had master lock and was waiting for its local mfs to become Master.Waited for 7 (minutes) but mfs did not become Master. Shutting down CLDB to release master lock.
2017-03-02 10:16:17,820 ERROR CLDB [WaitForLocalKvstore Thread]: Thread: Signal Dispatcher ID: 5
2017-03-02 10:16:17,820 ERROR CLDB [WaitForLocalKvstore Thread]: Thread: HB-3 ID: 45
2017-03-02 10:16:17,820 ERROR CLDB [WaitForLocalKvstore Thread]: sun.misc.Unsafe.park(Native Method)
2017-03-02 10:16:17,820 ERROR CLDB [WaitForLocalKvstore Thread]: java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
2017-03-02 10:16:17,820 ERROR CLDB [WaitForLocalKvstore Thread]: java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
2017-03-02 10:16:17,820 ERROR CLDB [WaitForLocalKvstore Thread]: java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
2017-03-02 10:16:17,821 ERROR CLDB [WaitForLocalKvstore Thread]: java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
2017-03-02 10:16:17,821 ERROR CLDB [WaitForLocalKvstore Thread]: java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
2017-03-02 10:16:17,821 ERROR CLDB [WaitForLocalKvstore Thread]: java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2017-03-02 10:16:17,821 ERROR CLDB [WaitForLocalKvstore Thread]: java.lang.Thread.run(Thread.java:745)
2017-03-02 10:16:17,821 ERROR CLDB [WaitForLocalKvstore Thread]: Thread: RPC-6 ID: 36
.....
2017-03-02 10:16:17,831 INFO CLDBServer [WaitForLocalKvstore Thread]: Shutdown: Stopping CLDB
2017-03-02 10:16:17,833 INFO CLDB [Thread-14]: CLDB ShutDown Hook called
2017-03-02 10:16:17,836 INFO ZooKeeperClient [Thread-14]: Setting the clean cldbshutdown flag to true
2017-03-02 10:16:17,855 INFO ZooKeeperClient [Thread-14]: Zookeeper Client: Closing client connection:
2017-03-02 10:16:17,863 INFO ZooKeeper [Thread-14]: Session: 0x15a8f81f20c0011 closed
2017-03-02 10:16:17,863 INFO CLDB [Thread-14]: CLDB shutdown
Loading /opt/mapr/server/permissions/libmapr_roles_refimpl.so
Resolving function 'getSecurityMembership()'
Resolving function 'cleanup()'
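
What stands out to me in the CLDB log is that the fileserver registration reports "Num Sps: 0" (no storage pools) and that the CLDB then gives up waiting for its local kvstore (container 1) to become master and shuts down. On the CLDB node I plan to look at the local storage pools and at container 1 with mrconfig; this assumes the default install path, and the exact output format may differ by version:

> # storage pools known to the local MFS on the CLDB node
> sudo /opt/mapr/server/mrconfig sp list
> # local replica state of containers held by this node (looking for cid 1, the CLDB/KvStore container)
> sudo /opt/mapr/server/mrconfig info dumpcontainers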
