
Cannot connect to ZooKeeper when adding a node to the cluster

Question asked by seamus on Jul 6, 2017
Latest reply on Jul 12, 2017 by mufeed

I have a cluster with 3 nodes: d0, d1, and d2. Now I want to add a fourth node, d3.

 

I executed:

   ...

   yum install mapr-fileserver mapr-nodemanager

   /opt/mapr/server/configure.sh -C d2 -Z d0,d1,d2 -RM d0,d1 -HS d2 -N mapr.earlydata.com

   /opt/mapr/server/disksetup -F /opt/mapr/conf/disks.txt

   ...

 

Finally, I started mapr-warden and got these errors in /opt/mapr/logs/warden.log:

Header: hostName: localhost, Time Zone: 中国标准时间, processName: warden, processId: 25979, MapR Build Version: 5.2.0.39122.GA
2017-07-07 01:02:22,962 INFO com.mapr.warden.WardenMain [main]: Log dir: /opt/mapr/hadoop/hadoop-2.7.0/logs
2017-07-07 01:02:22,963 INFO com.mapr.warden.WardenMain [main]: Log dir: /opt/mapr/hadoop/hadoop-0.20.2/logs
2017-07-07 01:02:22,990 INFO com.mapr.job.mngmnt.hadoop.metrics.MaprRPCContext [main]: init MAPRContext
2017-07-07 01:02:22,991 INFO com.mapr.job.mngmnt.hadoop.metrics.MaprRPCContext [main]: init MAPRContextHS
2017-07-07 01:02:23,000 INFO com.mapr.warden.WardenMain [main]: My pid: 26359
Warden started
Warden started
In sysVol
2017-07-07 01:02:23,273 INFO com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main]: Connected to ZK: d0:5181,d1:5181,d2:5181With State: State:CONNECTED Timeout:30000 sessionid:0x15d0d9fc003013e local:/192.168.0.56:41760 remoteserver:d1.infopower.com/192.168.0.21:5181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2017-07-07 01:02:23,274 INFO com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x15d0d9fc003013e local:/192.168.0.56:41760 remoteserver:d1.infopower.com/192.168.0.21:5181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2017-07-07 01:02:23,286 INFO com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: SaslAuthenticated. Event type: None
2017-07-07 01:02:23,390 INFO com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: Disconnected. Event type: None
2017-07-07 01:02:23,395 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [main]: Lost connection to ZK while trying to checkZKNodeForExistence of: /nodes/localhost/stop. Retrying...
2017-07-07 01:02:23,727 INFO com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x15d0d9fc003013e local:/192.168.0.56:51309 remoteserver:d0.infopower.com/192.168.0.20:5181 lastZxid:0 xid:3 sent:2 recv:2 queuedpkts:0 pendingresp:0 queuedevents:0
2017-07-07 01:02:23,727 INFO com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: SaslAuthenticated. Event type: None
2017-07-07 01:02:23,828 INFO com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: Disconnected. Event type: None
2017-07-07 01:02:24,256 INFO com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x15d0d9fc003013e local:/192.168.0.56:44205 remoteserver:d2.infopower.com/192.168.0.22:5181 lastZxid:0 xid:3 sent:3 recv:3 queuedpkts:1 pendingresp:0 queuedevents:0
2017-07-07 01:02:24,256 INFO com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: SaslAuthenticated. Event type: None
2017-07-07 01:02:24,357 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [main]: Lost connection to ZK while trying to checkZKNodeForExistence of: /nodes/localhost/stop. Retrying...
2017-07-07 01:02:24,358 INFO com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: Disconnected. Event type: None
2017-07-07 01:02:24,586 INFO com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x15d0d9fc003013e local:/192.168.0.56:41763 remoteserver:d1.infopower.com/192.168.0.21:5181 lastZxid:0 xid:5 sent:4 recv:4 queuedpkts:0 pendingresp:0 queuedevents:0
2017-07-07 01:02:24,586 INFO com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: SaslAuthenticated. Event type: None
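
The header line of this log reports hostName: localhost, and the warden is checking /nodes/localhost/stop rather than a path that names d3. Whether that is related to the disconnects is not established here, but it is cheap to check what name the new node reports for itself. A minimal sketch in bash; is_loopback_name is a hypothetical helper, not part of MapR:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if the given name is empty or
# looks like a loopback alias such as "localhost" or "localhost.localdomain".
is_loopback_name() {
  case "$1" in
    ""|localhost|localhost.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask the OS for the fully qualified name; fall back to $HOSTNAME if the
# hostname command is unavailable or fails.
name="$(hostname -f 2>/dev/null || echo "${HOSTNAME:-}")"

if is_loopback_name "$name"; then
  echo "WARNING: this node reports its name as '$name'"
else
  echo "OK: this node reports its name as '$name'"
fi
```

If this prints the warning on d3, the hostname configuration and /etc/hosts on that node would be the first things to look at.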

 

The new node cannot connect to ZooKeeper, but the ZooKeeper instances on d0, d1, and d2 are running.
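
To rule out basic network problems from d3, one can test whether each ZooKeeper endpoint accepts TCP connections. The sketch below assumes bash (for its /dev/tcp redirection) and the coreutils timeout command; zk_endpoints and port_open are hypothetical helper names, and 5181 is the port shown in the warden log. The warden log already shows connections being established before each disconnect, so this only confirms reachability and does not explain the disconnects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a comma-separated host list (the same form
# passed to configure.sh -Z) into one "host:port" line per host.
zk_endpoints() {
  local hosts="$1" port="$2" host
  local IFS=','
  for host in $hosts; do
    printf '%s:%s\n' "$host" "$port"
  done
}

# Hypothetical helper: succeeds if a TCP connection to host:port can be
# opened within 3 seconds. Uses bash's /dev/tcp pseudo-device.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Usage on the new node (not run automatically here):
#   for ep in $(zk_endpoints "d0,d1,d2" 5181); do
#     if port_open "${ep%:*}" "${ep#*:}"; then
#       echo "$ep reachable"
#     else
#       echo "$ep NOT reachable"
#     fi
#   done
```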

 

When I execute maprcli node services -name nodemanager -action start -nodes d3, I get this error:

ERROR (10006) - Unable to obtain the ZooKeeper connection string from the CLDB. Make sure that the CLDB is running and accessible.

 

Here is /opt/mapr/logs/cldb.log on d2:

2017-07-06 17:19:05,168 WARN Alarms [RPC-5]: Alarm raised: NODE_ALARM_SERVICE_IMPALACATALOG_DOWN:d2.infopower.com:NODE_ALARM; Cluster: mapr.earlydata.com; Can not determine if service: impalacatalog is running. Check logs at: /opt/mapr/impala/impala-2.5.0/logs/impalacatalog.log
2017-07-06 17:19:48,754 INFO ReplicationHandlerThread [Repl]: <PRIORITY_REPLICATION> P=1184; F=1184; QS=148;
2017-07-06 17:19:48,754 INFO ReplicationHandlerThread [Repl]: <UNDER_REPLICATION> P=1104; F=1104; QS=138;
2017-07-06 17:19:33,9853lengths all wrong total=77, ticket=10270, hdr=82, msg=67830, received from 192.168.0.56:41772
2017-07-06 17:19:33,9859lengths all wrong total=77, ticket=10270, hdr=83, msg=67830, received from 192.168.0.56:60159
2017-07-06 17:19:33,9874lengths all wrong total=77, ticket=10270, hdr=84, msg=67830, received from 192.168.0.56:40346
2017-07-06 17:19:43,0215lengths all wrong total=37, ticket=30, hdr=132, msg=67584, received from 192.168.0.56:60837
2017-07-06 17:19:45,2326lengths all wrong total=77, ticket=10270, hdr=34, msg=67830, received from 192.168.0.56:46950
2017-07-06 17:19:45,2330lengths all wrong total=77, ticket=10270, hdr=33, msg=67830, received from 192.168.0.56:54724
2017-07-06 17:19:45,2334lengths all wrong total=77, ticket=10270, hdr=32, msg=67830, received from 192.168.0.56:58686
2017-07-06 17:19:54,2698hdrlen bad, 236, received from 192.168.0.56:58042
2017-07-06 17:19:56,2437hdrlen bad, 168, received from 192.168.0.56:46783
2017-07-06 17:19:56,2441hdrlen bad, 169, received from 192.168.0.56:45705
2017-07-06 17:19:56,2445hdrlen bad, 170, received from 192.168.0.56:45615
2017-07-06 17:20:05,2818hdrlen bad, 160, received from 192.168.0.56:56393
2017-07-06 17:20:07,6559hdrlen bad, 160, received from 192.168.0.56:50409
2017-07-06 17:20:07,6564hdrlen bad, 161, received from 192.168.0.56:43418
2017-07-06 17:20:07,6569hdrlen bad, 162, received from 192.168.0.56:45666
2017-07-06 17:20:16,6927hdrlen bad, 254, received from 192.168.0.56:35005
2017-07-06 17:20:19,0110hdrlen bad, 214, received from 192.168.0.56:59031
2017-07-06 17:20:19,0114hdrlen bad, 215, received from 192.168.0.56:41394
2017-07-06 17:20:19,0117hdrlen bad, 208, received from 192.168.0.56:41331
2017-07-06 17:20:28,0463lengths all wrong total=37, ticket=30, hdr=10, msg=67584, received from 192.168.0.56:40163
2017-07-06 17:20:30,1996hdrlen bad, 255, received from 192.168.0.56:40918
2017-07-06 17:20:30,2001hdrlen bad, 254, received from 192.168.0.56:49859
2017-07-06 17:20:30,2004hdrlen bad, 249, received from 192.168.0.56:39119
2017-07-06 17:20:39,2372hdrlen bad, 170, received from 192.168.0.56:44569
2017-07-06 17:20:41,0686hdrlen bad, 198, received from 192.168.0.56:51526
2017-07-06 17:20:41,0690hdrlen bad, 199, received from 192.168.0.56:36291
2017-07-06 17:20:41,0695hdrlen bad, 196, received from 192.168.0.56:57065
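
All of the "lengths all wrong" and "hdrlen bad" messages above come from 192.168.0.56, which is the local address of the new node in its warden.log. To confirm that no other host is producing them, the messages can be counted per source address. A minimal sketch using standard grep, sed, sort, uniq, and awk; count_bad_rpc_sources is a hypothetical helper name, and the two message patterns are taken from the log excerpt above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<source-ip> <count>" for every host that sent
# a message the CLDB logged as "lengths all wrong" or "hdrlen bad".
# $1 is the path to a cldb.log file.
count_bad_rpc_sources() {
  grep -E 'lengths all wrong|hdrlen bad' "$1" \
    | sed -E 's/.*received from ([0-9.]+):[0-9]+.*/\1/' \
    | sort \
    | uniq -c \
    | awk '{print $2, $1}'
}

# Usage on d2 (not run automatically here):
#   count_bad_rpc_sources /opt/mapr/logs/cldb.log
```

A single address in the output would mean the malformed messages are specific to the new node rather than a cluster-wide problem.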

 

Help!

Thanks!
