
Recovering MapR cluster

Question asked by maprNewbie on Jul 13, 2016
Latest reply on Aug 17, 2016 by Ted Dunning

Hello!

I was working with a 10-machine cluster with a replication factor of 5. Machines 4 and 5 were the ZooKeeper, CLDB, and webserver nodes; the rest were data nodes.
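For context on the layout above: ZooKeeper ran on only two machines (4 and 5). A ZooKeeper ensemble stays available only while a strict majority of its members is up, so an ensemble of n members tolerates n - (n // 2 + 1) failures. A small Python sketch of that arithmetic (`quorum_size` and `tolerated_failures` are hypothetical helper names, not part of any MapR or ZooKeeper tool):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble of this size."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one member")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Members that can be lost while a strict majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 5):
    print(f"{n} ZooKeeper node(s): quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

A two-node ensemble therefore tolerates no failures, the same as a single node, which is why odd sizes such as 3 or 5 are the usual choice.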

Machines 4 and 5 have since died, and I am now trying to bring the cluster back up. Here is what I have done so far:

1) Installed ZooKeeper on machines 6 and 7. Both are online and running.

2) Installed CLDB on machine 1.

This is as far as I have gotten. When I restart machine 1, the CLDB service runs for a second and then goes away. I noticed this when I ran jps, which shows:

988 WardenMain
23862 Jps
3591 FsShell
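The jps check above can be scripted so it is easy to repeat after each restart. A minimal Python sketch; it assumes the CLDB JVM appears in jps under the name `CLDB` (an assumption about this install, so compare against jps on a healthy node), and `parse_jps`, `missing_processes`, and `run_jps` are hypothetical helper names:

```python
import subprocess

# Assumed jps display names for the JVMs machine 1 should be running;
# check them against `jps` on a healthy node and adjust if they differ.
EXPECTED = {"WardenMain", "CLDB"}


def parse_jps(output: str) -> dict[str, list[int]]:
    """Map each JVM name in `jps` output to the PIDs running it."""
    procs: dict[str, list[int]] = {}
    for line in output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        procs.setdefault(parts[1].strip(), []).append(int(parts[0]))
    return procs


def missing_processes(output: str, expected: set[str] = EXPECTED) -> set[str]:
    """Names in `expected` that do not appear in the jps output."""
    return set(expected) - set(parse_jps(output))


def run_jps() -> str:
    """Run `jps` (needs a JDK on PATH) and return its stdout."""
    result = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return result.stdout
```

On machine 1, `missing_processes(run_jps())` would report `{'CLDB'}` for the listing above, since only WardenMain, Jps, and FsShell are running.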

I looked at /opt/mapr/logs/mfs with cat, and these are the last lines:

2016-07-13 23:19:01,9012 INFO  fs/server/container/containerreport.h:71 x.x.0.0:0 ID : 228012258
2016-07-13 23:19:01,9012 INFO  fs/server/container/containerreport.h:71 x.x.0.0:0 ID : 29322979
2016-07-13 23:19:01,9012 INFO  fs/server/container/containerreport.h:71 x.x.0.0:0 ID : 124810100
2016-07-13 23:25:43,8437 ERROR  cldbha.cc:965 x.x.0.0:0 Failed to reach CLDB node due to error Read-only file system (30) for operation 2345.33 at 10.252.101.70:7222. Will retry after finding CLDB master.
2016-07-13 23:25:43,8443 ERROR  cldbha.cc:698 x.x.0.0:0 Got error Read-only file system (30) while trying to register with CLDB 10.252.101.70:7222
2016-07-13 23:25:46,8457 ERROR  cldbha.cc:698 x.x.0.0:0 Got error Connection reset by peer (104) while trying to register with CLDB 10.252.101.70:7222
2016-07-13 23:26:55,9686 INFO  fileserver.cc:9508 x.x.0.0:0 CLDB asked me to accept StoragePool 61b80d536dda768b005605e8c9005b40
2016-07-13 23:26:55,9686 INFO  fileserver.cc:9518 x.x.0.0:0 SP with id 61b80d536dda768b005605e8c9005b40, already accepted.
2016-07-13 23:26:55,9686 INFO  cldbha.cc:732 x.x.0.0:0 Re-established communication link with CLDB master at 10.252.101.70:7222.
2016-07-13 23:26:55,9901 ERROR  fileserver.cc:8090 x.x.0.0:0 heartbeat thread didn't get response for 72147 msec
2016-07-13 23:26:55,9901 INFO  fileserver.cc:9048 x.x.0.0:0 recieved updated no-compress list from cldb: bz2,gz,tgz,tbz2,zip,z,Z,mp3,jpg,jpeg,mpg,mpeg,avi,gif,png,lzo,j
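When a log like this gets long, it can help to pull out only the ERROR entries and the errno values in parentheses. On Linux, errno 30 is EROFS (read-only file system) and 104 is ECONNRESET (connection reset by peer), matching the text printed next to them. A rough Python sketch, assuming every entry follows the `date time,fraction LEVEL source:line address message` layout seen in the excerpt; `error_entries` and `errno_counts` are hypothetical helper names:

```python
import re
from collections import Counter

# Entry layout assumed from the excerpt above, e.g.
# "2016-07-13 23:25:43,8437 ERROR  cldbha.cc:965 x.x.0.0:0 <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<src>\S+:\d+)\s+"
    r"(?P<addr>\S+)\s+"
    r"(?P<msg>.*)$"
)
# Error texts in this log carry the errno in parentheses, e.g. "(30)".
ERRNO_RE = re.compile(r"\((\d+)\)")


def error_entries(lines):
    """Yield (timestamp, source, errno or None, message) for ERROR lines."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("level") != "ERROR":
            continue  # not a log entry, or not an error
        e = ERRNO_RE.search(m.group("msg"))
        code = int(e.group(1)) if e else None
        yield m.group("ts"), m.group("src"), code, m.group("msg")


def errno_counts(lines) -> Counter:
    """How often each errno appears across ERROR entries."""
    return Counter(code for _, _, code, _ in error_entries(lines) if code is not None)
```

Both functions accept any iterable of lines, such as an open file object, so the whole log can be summarized without loading it into memory.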

Is there something I can do to make this work?
