
3 Node AWS installation not working due to CLDB

Question asked by olu_jolly on Jul 25, 2016
Latest reply on Jul 27, 2016 by namato

I just installed a 3-node MapR cluster on AWS. Right from the installation I realised that the CLDB on the nodes other than the master node kept going down, and I had to restart them through MCS.

 

I have now also realised that ZooKeeper on the master node is not stable either; I have to keep restarting it.

 

When I stopped the nodes and restarted them the next day, the cluster wouldn't come up. I have battled this for 2 days now and it is still the same. This is my first time on MapR, although I am familiar with other Hadoop platforms, and I want to do my certification on this platform. Here are the errors I have gotten while trying to troubleshoot.

 

Even when at least one CLDB is running, I get the error below:

[root@ip-172-31-25-156 conf]# maprcli node list -filter "[rp==/*]and[svc==cldb]" -columns id,h,hn,svc,rp

ERROR (10009) -  Couldn't connect to the CLDB service. Check if at least one CLDB is running.

 

When I try to run a Hadoop process:

[root@ip-172-31-25-158 server]# hadoop fs -ls /

2016-07-25 05:57:46,3763 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1313 Thread: 11943 Lookup of volume mapr.cluster.root failed, error Connection reset by peer(104), CLDB: 172.31.25.157:7222 backing off ...

2016-07-25 05:57:47,3771 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1313 Thread: 11943 Lookup of volume mapr.cluster.root failed, error Connection reset by peer(104), CLDB: 172.31.25.156:7222 backing off ...

2016-07-25 05:57:48,3783 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1611 Thread: 11943 MoveToNextCldb: No CLDB entries, cannot run, sleeping 5 seconds!

2016-07-25 05:58:01,3792 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1313 Thread: 11943 Lookup of volume mapr.cluster.root failed, error Connection reset by peer(104), CLDB: 172.31.25.157:7222 backing off ...

2016-07-25 05:58:01,3792 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1611 Thread: 11943 MoveToNextCldb: No CLDB entries, cannot run, sleeping 5 seconds!
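One thing that may be worth ruling out is whether port 7222 on the CLDB nodes is reachable at all from the client node (for example, if an AWS security group were blocking it). The following is only a minimal sketch using the Python standard library; the function name `can_connect` is made up for illustration, and the addresses are the ones shown in the errors above. Note that "Connection reset by peer" usually means the TCP connection was established and then dropped, so a successful connect here would not rule out a problem inside CLDB itself.

```python
import socket

# CLDB addresses and port exactly as they appear in the client errors above.
CLDB_NODES = [("172.31.25.156", 7222), ("172.31.25.157", 7222)]


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds.

    This only tests network reachability. It does not speak the CLDB
    RPC protocol and cannot tell whether CLDB is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `can_connect(host, port)` for each entry in `CLDB_NODES` from the node that runs `hadoop fs -ls /` would show whether the failure is at the network level or further up.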

 

When I check the config

[root@ip-172-31-25-156 ~]# /opt/mapr/server/mrconfig info containers rw

2016-07-25 04:43:41,3247 ERROR Global mrconfig.cc:4562 Failed to get master cldbIP and port from conf/mapr-clusters.conf file. Trying localhosted cldb

 

----------------------

|From Instance 5660::|

----------------------

2016-07-25 04:43:41,3328 ERROR Global mrconfig.cc:1698 CLDB returned an error while trying to obtain list of volumes Read-only file system (30)

 

When I check CLDB logs

 

2016-07-25 05:57:13,737 INFO log [main]: Started SelectChannelConnector@0.0.0.0:7221

2016-07-25 05:57:22,781 INFO CLDBServer [Lookup-1]: Rejecting RPC 2345.5 from 172.31.25.158:49873 with status 3 as CLDB is waiting for local kvstore to become master.

2016-07-25 05:58:23,950 INFO CLDBServer [Lookup-3]: Rejecting RPC 2345.5 from 172.31.25.158:60864 with status 3 as CLDB is waiting for local kvstore to become master.

2016-07-25 05:59:25,475 INFO CLDBServer [Lookup-6]: Rejecting RPC 2345.5 from 172.31.25.157:47204 with status 3 as CLDB is waiting for local kvstore to become master.

2016-07-25 06:00:26,374 INFO CLDBServer [Lookup-1]: Rejecting RPC 2345.5 from 172.31.25.157:36952 with status 3 as CLDB is waiting for local kvstore to become master.

2016-07-25 06:01:32,618 INFO CLDBServer [Lookup-5]: Rejecting RPC 2345.5 from 172.31.25.158:50940 with status 3 as CLDB is waiting for local kvstore to become master.

2016-07-25 06:02:32,620 INFO CLDBServer [Lookup-1]: Rejecting RPC 2345.5 from 172.31.25.158:50940 with status 3 as CLDB is waiting for local kvstore to become master.

2016-07-25 06:03:33,434 INFO CLDBServer [Lookup-6]: Rejecting RPC 2345.5 from 172.31.25.157:55954 with status 3 as CLDB is waiting for local kvstore to become master.

2016-07-25 06:04:13,222 FATAL CLDB [WaitForLocalKvstore Thread]: CLDBShutdown: CLDB had master lock and was waiting for its local mfs to become Master.Waited for 7 (minutes) but mfs did not become Master. Shutting down CLDB to release master lock.
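To make the pattern in the CLDB log excerpt easier to see, the lines can be summarised with a small script. This is a rough sketch only: it assumes the log lines have exactly the shape shown above, and the names `REJECT_RE` and `summarize_cldb_log` are hypothetical helpers, not part of any MapR tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... Rejecting RPC 2345.5 from 172.31.25.158:49873 with status 3 ...
# and captures the source IP address of the rejected request.
REJECT_RE = re.compile(r"Rejecting RPC \S+ from (\d+\.\d+\.\d+\.\d+):\d+")


def summarize_cldb_log(lines):
    """Count rejected RPCs per source IP and collect any FATAL lines."""
    rejects = Counter()
    fatals = []
    for line in lines:
        match = REJECT_RE.search(line)
        if match:
            rejects[match.group(1)] += 1
        elif " FATAL " in line:
            fatals.append(line.strip())
    return rejects, fatals
```

Passing the lines of the CLDB log file to `summarize_cldb_log` returns a per-node count of rejected lookups together with any FATAL messages, such as the shutdown message above.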

 

 

Any ideas, guys? I would sincerely appreciate it.
