CLDB fails to come up after a fresh installation

Question asked by ngguru on Oct 6, 2015
Latest reply on Oct 7, 2015 by Ted Dunning
Type of installation: 2 VMs, each with 32 GB RAM and 1 TB of storage

tar -xvf mapr-v5.0.0GA.rpm.gz
 
 [root@corvmlinvhapd01 base]# rpm -ivh mapr-fileserver-5.0.0.32987.GA-1.x86_64.rpm mapr-zookeeper-5.0.0.32987.GA-1.x86_64.rpm mapr-nodemanager-2.7.0.32987.GA-1.x86_64.rpm mapr-mapreduce2-2.7.0.32987.GA-1.x86_64.rpm mapr-tasktracker-5.0.0.32987.GA-1.x86_64.rpm mapr-hadoop-core-2.7.0.32987.GA-1.x86_64.rpm  mapr-jobtracker-5.0.0.32987.GA-1.x86_64.rpm mapr-gateway-5.0.0.32987.GA-1.x86_64.rpm mapr-core-internal-5.0.0.32987.GA-1.x86_64.rpm mapr-core-5.0.0.32987.GA-1.x86_64.rpm mapr-cldb-5.0.0.32987.GA-1.x86_64.rpm mapr-webserver-5.0.0.32987.GA-1.x86_64.rpm mapr-historyserver-2.7.0.32987.GA-1.x86_64.rpm mapr-mapreduce1-0.20.2.32987.GA-1.x86_64.rpm mapr-nfs-5.0.0.32987.GA-1.x86_64.rpm mapr-zk-internal-5.0.0.32987.GA.v3.4.5-1.x86_64.rpm mapr-resourcemanager-2.7.0.32987.GA-1.x86_64.rpm
 
 [root@corvmlinvhapd01 roles]# ls -lrt
total 0
-rwxr-xr-x 1 root root 0 Jul  9 19:40 nodemanager
-rwxr-xr-x 1 root root 0 Jul  9 19:40 resourcemanager
-rwxr-xr-x 1 root root 0 Jul  9 19:40 historyserver
-rwxr-xr-x 1 root root 0 Jul  9 20:31 cldb
-rwxr-xr-x 1 root root 0 Jul  9 20:31 jobtracker
-rwxr-xr-x 1 root root 0 Jul  9 20:31 tasktracker
-rwxr-xr-x 1 root root 0 Jul  9 20:31 webserver
-rwxr-xr-x 1 root root 0 Jul  9 20:31 zookeeper
-rwxr-xr-x 1 root root 0 Jul  9 20:31 nfs
-rwxr-xr-x 1 root root 0 Jul  9 20:31 fileserver
-rwxr-xr-x 1 root root 0 Jul  9 20:32 gateway


cd /opt/mapr/conf

#!/bin/bash
# Copyright (c) 2009 & onwards. MapR Tech, Inc., All rights reserved
# Please set all environment variable you want to be used during MapR cluster
# runtime here.
# namely MAPR_HOME, JAVA_HOME, MAPR_SUBNETS

#set JAVA_HOME to override default search
#export JAVA_HOME=
export JAVA_HOME=/usr/java/jdk1.7.0_79
#export MAPR_SUBNETS=
export MAPR_SUBNETS=10.31.13.206,10.31.13.207
#export MAPR_HOME=
export MAPR_HOME=/opt/mapr
#export MAPR_ULIMIT_U=
#export MAPR_ULIMIT_N=
#export MAPR_SYSCTL_SOMAXCONN=



[root@corvmlinvhapd01 conf]# /opt/mapr/server/configure.sh -N CHED.cluster.com -C 10.31.13.206 -Z 10.31.13.206 -D /dev/mapper/app_vg-app_lv -no-autostart -v -u mapr -g mapr
create /opt/mapr/conf/conf.old
Configuring Hadoop-2.7.0 at /opt/mapr/hadoop/hadoop-2.7.0
Done configuring Hadoop
Using 7222 port for CLDB 10.31.13.206
Using 5181 port for ZooKeeper 10.31.13.206
Checking if Diskspace is on "/opt" is greater than 1024 MB
Diskspace on "/opt" is 12432 MB. Passed.
Checking if Diskspace is on "/tmp" is greater than 1024 MB
Diskspace on "/tmp" is 4534 MB. Passed.
Checking if system has at least 4096 MB of memory.
System has enough memory: 32102 MB
Generating disklist file at: /tmp/43978-disklist.txt with the following disks /dev/mapper/app_vg-app_lv
Using disklist file /tmp/43978-disklist.txt
Checking if "/dev/mapper/app_vg-app_lv" exists
All disks exist.
CLDB node list: 10.31.13.206:7222
Zookeeper node list: 10.31.13.206:5181

Node install STARTED
-----------------------
CMD: /opt/mapr/server/configure.sh -N CHED.cluster.com -C 10.31.13.206 -Z 10.31.13.206 -D /dev/mapper/app_vg-app_lv -no-autostart -v -u mapr -g mapr
Cluster run as secure=false
Contructing ClusterConfFile: cldb node list: 10.31.13.206:7222
Adding "CHED.cluster.com secure=false 10.31.13.206:7222" to "/opt/mapr/conf/mapr-clusters.conf"
Contructing ClusterConfFile: Done
MAPR_USER: mapr MAPR_GROUP: mapr
CREATE_USER:
maprUserId: maprGroupId:
Give privilleges to mapr
Config MAPR_USER for logs/conf of MapR Services
Update /opt/mapr/conf/daemon.conf
set mapr limits in /etc/security/limits.conf
mapr/mapr user/group configured
No RM addresses were provided. Will configure MapR HA for Resource Manager..
No IP/hostname provided for History Server. Will be configured to 0.0.0.0
Node setup configuration:  cldb fileserver gateway historyserver jobtracker nfs nodemanager resourcemanager tasktracker webserver zookeeper
Log can be found at:  /opt/mapr/logs/configure.log
Updating file client config
Config MAPR_USER for ZooKeeper Role
Adding: "mfs.cache.lru.sizes=inode:10:meta:10:dir:30:small:10:db:15:valc:3" to "/opt/mapr/conf/mfs.conf"
Adding: "#mfs.cache.lru.sizes=inode:3:meta:6:small:27:dir:6:db:20:valc:3" to "/opt/mapr/conf/mfs.conf"
Adding: "mfs.on.virtual.machine=0" to "/opt/mapr/conf/mfs.conf"
Configuring Webserver
Generating ssl keys
Creating 10 year self signed certificate with subjectDN='CN=corvmlinvhapd01'
SSL keys succefully generated
Configuring Hadoop
Updating JT config
Updating file client config
Configuring TaskTracker role
Config MAPR_USER for TT Role
Updating file client config
Skipping Drill Bits Role configuration... Not found
Skipping Hbase Master Role configuration... Not found
Skipping Hbase RS Role configuration... Not found
Skipping Hbase Client Role configuration... Not found
Skipping Job Management Role configuration... Metrics not found
Skipping Oozie Role configuration... Not found
Updating Warden config
Adding "isDB=true" to "/opt/mapr/conf/warden.conf"
Running disksetup: "/opt/mapr/server/disksetup -F /tmp/43978-disklist.txt"
/dev/mapper/app_vg-app_lv added.
Removing temporary disklist file: /tmp/43978-disklist.txt
Node not starting automatically.
Run "service mapr-zookeeper start" in order to start the zookeeper node and then run "service mapr-warden start" in order to start this node

Node install FINISHED
-----------------------



rpm -ivh mapr-hadoop-core-2.7.0.32987.GA-1.x86_64.rpm mapr-core-5.0.0.32987.GA-1.x86_64.rpm mapr-mapreduce2-2.7.0.32987.GA-1.x86_64.rpm mapr-mapreduce1-0.20.2.32987.GA-1.x86_64.rpm mapr-fileserver-5.0.0.32987.GA-1.x86_64.rpm mapr-core-internal-5.0.0.32987.GA-1.x86_64.rpm mapr-nodemanager-2.7.0.32987.GA-1.x86_64.rpm mapr-tasktracker-5.0.0.32987.GA-1.x86_64.rpm
 
 [root@corvmlinvhapd02 roles]# ls -lrt
total 0
-rwxr-xr-x 1 root root 0 Jul  9 19:40 nodemanager
-rwxr-xr-x 1 root root 0 Jul  9 20:31 tasktracker
-rwxr-xr-x 1 root root 0 Jul  9 20:31 fileserver
[root@corvmlinvhapd02 roles]#


[root@corvmlinvhapd02 roles]# /opt/mapr/server/configure.sh -N CHED.cluster.com -C 10.31.13.206 -Z 10.31.13.206 -D /dev/mapper/app_vg-app_lv -no-autostart -v -u mapr -g mapr
create /opt/mapr/conf/conf.old
Configuring Hadoop-2.7.0 at /opt/mapr/hadoop/hadoop-2.7.0
Done configuring Hadoop
Using 7222 port for CLDB 10.31.13.206
Using 5181 port for ZooKeeper 10.31.13.206
Checking if Diskspace is on "/opt" is greater than 1024 MB
Diskspace on "/opt" is 12559 MB. Passed.
Checking if Diskspace is on "/tmp" is greater than 1024 MB
Diskspace on "/tmp" is 4395 MB. Passed.
Checking if system has at least 4096 MB of memory.
System has enough memory: 32102 MB
Generating disklist file at: /tmp/22055-disklist.txt with the following disks /dev/mapper/app_vg-app_lv
Using disklist file /tmp/22055-disklist.txt
Checking if "/dev/mapper/app_vg-app_lv" exists
All disks exist.
CLDB node list: 10.31.13.206:7222
Zookeeper node list: 10.31.13.206:5181

Node install STARTED
-----------------------
CMD: /opt/mapr/server/configure.sh -N CHED.cluster.com -C 10.31.13.206 -Z 10.31.13.206 -D /dev/mapper/app_vg-app_lv -no-autostart -v -u mapr -g mapr
Cluster run as secure=false
Contructing ClusterConfFile: cldb node list: 10.31.13.206:7222
Adding "CHED.cluster.com secure=false 10.31.13.206:7222" to "/opt/mapr/conf/mapr-clusters.conf"
Contructing ClusterConfFile: Done
MAPR_USER: mapr MAPR_GROUP: mapr
CREATE_USER:
maprUserId: maprGroupId:
Give privilleges to mapr
Config MAPR_USER for logs/conf of MapR Services
Update /opt/mapr/conf/daemon.conf
set mapr limits in /etc/security/limits.conf
mapr/mapr user/group configured
No RM addresses were provided. Will configure MapR HA for Resource Manager..
No IP/hostname provided for History Server. Will be configured to 0.0.0.0
Node setup configuration:  fileserver nodemanager tasktracker
Log can be found at:  /opt/mapr/logs/configure.log
Updating file client config
Skipping ZooKeeper Role configuration... Not found
Skipping CLDB Role configuration... Not found
Skipping NFS Role configuration... Not found
Skipping Webserver Role configuration... Not found
Skipping Job Tracker Role configuration... Not found
Configuring TaskTracker role
Config MAPR_USER for TT Role
Updating file client config
Skipping Drill Bits Role configuration... Not found
Skipping Hbase Master Role configuration... Not found
Skipping Hbase RS Role configuration... Not found
Skipping Hbase Client Role configuration... Not found
Skipping Job Management Role configuration... Metrics not found
Skipping Oozie Role configuration... Not found
Updating Warden config
Adding "isDB=true" to "/opt/mapr/conf/warden.conf"
Running disksetup: "/opt/mapr/server/disksetup -F /tmp/22055-disklist.txt"
/dev/mapper/app_vg-app_lv added.
Removing temporary disklist file: /tmp/22055-disklist.txt
Node not starting automatically.
Run "service mapr-warden start" in order to start this node

Node install FINISHED
-----------------------

mfs.log-3
----------------

2015-10-06 17:45:58,1418 ERROR  cldbha.cc:858 Got error Connection reset by peer (104) while trying to register with CLDB 10.31.13.206:7222
2015-10-06 17:46:01,1439 INFO  cldbha.cc:465 No storage pools are ready on this node. Marking CLDB unreachable
2015-10-06 17:46:01,1443 ERROR  fileserver.cc:9609 Heartbeat to cldb failed No such device (19). cldb:10.31.13.206:7222
2015-10-06 17:52:17,1582 INFO  cldbha.cc:893 Re-established communication link with CLDB master at 10.31.13.206:7222.
2015-10-06 17:52:17,1582 INFO  fileserver.cc:10046 Registered with cldb 10.31.13.206:7222
2015-10-06 17:52:17,4830 INFO  fileserver.cc:10678 recieved updated no-compress list from cldb: bz2,gz,tgz,tbz2,zip,z,Z,mp3,jpg,jpeg,mpg,mpeg,avi,gif,png,lzo,j
2015-10-06 17:52:24,3114 INFO  fileserver.cc:10063 Sending full container report to cldb.
2015-10-06 17:52:24,3116 INFO  fileserver.cc:10178 Sending vol list with 0 volumes.
2015-10-06 17:55:26,5119 INFO  fileserver.cc:10063 Sending full container report to cldb.
2015-10-06 17:55:26,5122 INFO  fileserver.cc:10178 Sending vol list with 0 volumes.
2015-10-06 17:58:28,6201 INFO  fileserver.cc:10063 Sending full container report to cldb.
2015-10-06 17:58:28,6204 INFO  fileserver.cc:10178 Sending vol list with 0 volumes.
2015-10-06 17:59:15,5063 ERROR  cldbha.cc:1128 Failed to reach CLDB node due to error Connection reset by peer (104) for operation 2345.33 at 10.31.13.206:7222. Will retry after finding CLDB master.
2015-10-06 17:59:15,5067 ERROR  cldbha.cc:858 Got error Connection reset by peer (104) while trying to register with CLDB 10.31.13.206:7222
2015-10-06 17:59:18,5086 INFO  cldbha.cc:465 No storage pools are ready on this node. Marking CLDB unreachable
2015-10-06 17:59:18,5091 ERROR  fileserver.cc:9609 Heartbeat to cldb failed No such device (19). cldb:10.31.13.206:7222
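
To see which errors dominate the MFS log, the ERROR lines can be grouped by source location. This is only a minimal sketch, assuming the log layout shown above (date, time, level, file:line, message). The function name `count_mfs_errors` is hypothetical and not a MapR tool, and the log path in the example is assumed to be the default MapR log directory.

```shell
#!/bin/sh
# count_mfs_errors: read MFS log lines on stdin and print how many ERROR
# entries came from each source location (the fourth field, e.g. cldbha.cc:858).
# Hypothetical helper; assumes the "date time LEVEL file:line message" layout.
count_mfs_errors() {
  awk '$3 == "ERROR" { count[$4]++ }
       END { for (loc in count) print count[loc], loc }' | sort -k1,1nr -k2,2
}

# Example (path assumed, adjust to where mfs.log-3 actually lives):
# count_mfs_errors < /opt/mapr/logs/mfs.log-3
```

The most frequent source location is printed first, which makes it easier to tell a repeated registration failure from a one-off error.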


CLDB LOG
------------------------
Header: hostName: corvmlinvhapd01, Time Zone: Australian Eastern Standard Time (Victoria), processName: cldb, processId: 50977, MapR Build Version: 5.0.0.32987.GA
2015-10-01 17:54:55,840 INFO CLDB [main]: Loading properties file : /opt/mapr/conf/cldb.conf
2015-10-01 17:54:58,955 INFO CLDBMetrics [main]: Initializing CLDB Metrics with serviceName: cldbServer
2015-10-01 17:54:59,039 INFO CLDB [main]: CLDBInit: Using hostname file /opt/mapr/hostname and hostid file /opt/mapr/hostid
2015-10-01 17:54:59,040 INFO CLDB [main]: CLDB Properties from configuration file: cldb.web.port=7221cldb.zookeeper.servers=10.31.13.206:5181cldb.numthreads=10cldb.web.https.port=7443hadoop.version=2.7.0cldb.port=7222cldb.min.fileservers=1cldb.detect.dup.hostid.enabled=falsenum.volmirror.threads=1cldb.jmxremote.port=7220
2015-10-01 17:54:59,065 INFO CLDB [main]: CLDB Command line args: /opt/mapr/conf/cldb.conf
2015-10-01 17:54:59,065 INFO CLDB [main]: CLDBInit: Initializing CLDB
2015-10-01 17:54:59,090 INFO CLDB [main]: MapR BuildVersion: 5.0.0.32987.GA
2015-10-01 17:54:59,090 INFO CLDB [main]: CLDBInit: Start CLDBServer
2015-10-01 17:54:59,681 INFO CLDBServer [main]: CLDBInit: HostName: corvmlinvhapd01 ServerId: 6222901721359368383
2015-10-01 17:54:59,681 INFO CLDBServer [main]: CLDBInit: Cluster name : CHED.cluster.com
2015-10-01 17:54:59,783 INFO CLDBServer [main]: CLDB creds setting uid as 5000
2015-10-01 17:54:59,783 INFO CLDBServer [main]: CLDB creds setting adding gid 5000
2015-10-01 17:54:59,783 INFO CLDBServer [main]: CLDB creds setting adding gid 9002
2015-10-01 17:54:59,957 INFO CLDB [main]: CLDBState: CLDB State change : INITIAZING
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 7220; nested exception is:
        java.net.BindException: Address already in use
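
The BindException above suggests that some process is already listening on port 7220, which the CLDB properties list as cldb.jmxremote.port. One way to find the owner is to filter the listening sockets for that port. This is a rough sketch, assuming `ss` (iproute2) is installed on the node. The function name `find_listener` is hypothetical, and whether the holder is a stale CLDB process or something else is not known from the logs.

```shell
#!/bin/sh
# find_listener PORT: filter `ss -ltnp` style output read from stdin and
# print only the listening sockets whose local address ends in :PORT.
# Hypothetical helper for illustration; not part of MapR.
find_listener() {
  awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$")'
}

# Example (run as root so the owning process is shown):
# ss -ltnp | find_listener 7220
```

If a line is printed, its last column names the process and PID that currently hold the port.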

 
 
[root@corvmlinvhapd01 logs]# maprcli node list -json
{
        "timestamp":1444114756045,
        "timeofday":"2015-10-06 05:59:16.045 GMT+1100",
        "status":"ERROR",
        "errors":[
                {
                        "id":10009,
                        "desc":"Couldn't connect to the CLDB service. Check if at least one CLDB is running."
                }
        ]
}
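
For repeated checks while restarting services, the status field can be pulled out of the JSON reply instead of reading the whole block each time. This is a minimal sketch, assuming the output format shown above. The function name `maprcli_status` is hypothetical, and it uses a plain sed match rather than a real JSON parser.

```shell
#!/bin/sh
# maprcli_status: read `maprcli ... -json` output on stdin and print the
# value of the top-level "status" field (for example OK or ERROR).
# Hypothetical helper; a plain sed match, not a real JSON parser.
maprcli_status() {
  sed -n 's/.*"status":"\([^"]*\)".*/\1/p' | head -n 1
}

# Example:
# maprcli node list -json | maprcli_status
```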
