
New Cluster -- Issues Starting

Question asked by mmercer on Oct 1, 2013
Latest reply on Oct 1, 2013 by nabeel
We have a poorly set up M3 installation on our normal clusters, so we wanted to test what performance would be like if we built one from the ground up and followed the normal rules.

We are running 11 nodes in the test cluster, each with a single NIC (for now), and the following Java setup:
Java 1.6.0u45 JDK, installed in /opt/jdk1.6.0_45

/etc/profile.d/java.sh
export JAVA_HOME=/opt/jdk1.6.0_45
export PATH=$PATH:$JAVA_HOME/bin

We are running Ubuntu 12.04 with the MapR repositories, and have planned the node roles as follows:

Mapr-test1
ZK
WS
FS
TT

Mapr-test2
ZK
FS
JT
TT

Mapr-test3-10
FS
TT

Mapr-test11
CLDB
ZK
NFS
TT

We ran configure.sh with the correct values (mapr-test11 for CLDB; mapr-test11, mapr-test1, and mapr-test2 for ZK). We then added the 3 disks from each node to /tmp/disks and ran disksetup -F /tmp/disks
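For reference, a minimal sketch of how the disk list for this step might be built. The device names (/dev/sdb, /dev/sdc, /dev/sdd) and the helper `write_disk_list` are hypothetical, and the sketch writes to a temporary file rather than /tmp/disks so it is safe to run anywhere. It assumes the one-device-per-line list format.

```shell
#!/bin/sh
# Write one device path per line to a disk list file.
# $1 is the output file; the remaining arguments are device paths.
write_disk_list() {
    list_file=$1
    shift
    printf '%s\n' "$@" > "$list_file"
}

# Hypothetical device names; substitute the three data disks on each node.
disk_list=$(mktemp)
write_disk_list "$disk_list" /dev/sdb /dev/sdc /dev/sdd
cat "$disk_list"

# On a real node the list is then handed to disksetup, as described above:
#   /opt/mapr/server/disksetup -F /tmp/disks
# The configure.sh call was presumably of this form (an assumption, not a
# copy of the command we ran):
#   /opt/mapr/server/configure.sh -C mapr-test11 \
#       -Z mapr-test11,mapr-test1,mapr-test2 -N Mapr-Test
```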

Once we begin starting services, we run into issues. Initially, zookeeper failed to start, complaining that it could not find java (even though JAVA_HOME was set via /etc/profile.d/java.sh, go figure). We eventually worked around this by using alternatives to register all of the java binaries so they can be found via /usr/bin
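A likely reason the /etc/profile.d setting had no effect is that files in /etc/profile.d are sourced by login shells, while init scripts generally start with a much smaller environment. The sketch below approximates that by running a shell with an empty environment; the function name `check_java_clean_env` is made up for illustration, and an empty environment is only an approximation of what the init scripts actually receive.

```shell
#!/bin/sh
# Report JAVA_HOME and the java binary as seen from an empty environment.
# env -i drops every inherited variable, so nothing exported by
# /etc/profile.d/java.sh in the current login shell leaks through.
check_java_clean_env() {
    env -i /bin/sh -c '
        echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
        command -v java || echo "java: not found on default PATH"
    '
}

check_java_clean_env
```

If the first line shows JAVA_HOME as unset while an interactive login shell shows /opt/jdk1.6.0_45, the variable is not reaching non-login processes.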

Once zookeeper was running, we tried to start mapr-warden on mapr-test11, and were still greeted with:
+======================================================================+
|      Error: JAVA_HOME is not set and Java could not be found         |
+----------------------------------------------------------------------+
| Please download the latest Sun JDK from the Sun Java web site        |
|       > http://java.sun.com/javase/downloads/ <                      |
|                                                                      |
| Hadoop requires Java 1.6 or later.                                   |
| NOTE: This script will find Sun Java whether you install using the   |
|       binary or the RPM based installer.                             |
+======================================================================+
mkdir: missing operand
Try `mkdir --help' for more information.
Starting WARDEN, logging to /opt/mapr/logs/warden.log
For diagnostics look at /opt/mapr/logs/ for createsystemvolumes.log, warden.log and configured services log files

If we check, warden *has* started, regardless of the above output, so we start to watch the cldb log and get this:
Header: hostName: mapr-test11.quantifind.com, Time Zone: Pacific Standard Time, processName: cldb, processId: 22658, MapR Build Version: 2.1.3.19871.GA
2013-10-01 15:39:46,219 INFO CLDB [main]: Loading properties file : /opt/mapr/conf/cldb.conf
2013-10-01 15:39:46,317 INFO CLDBMetrics [main]: Initializing CLDB Metrics with serviceName: cldbServer
2013-10-01 15:39:46,322 INFO CLDB [main]: CLDBInit: Using hostname file /opt/mapr/hostname and hostid file /opt/mapr/hostid
2013-10-01 15:39:46,322 INFO CLDB [main]: CLDB Properties from configuration file: {cldb.web.port=7221, cldb.zookeeper.servers=mapr-test11.quantifind.com:5181,mapr-test1.quantifind.com:5181,mapr-test2.quantifind.com:5181, cldb.numthreads=10, hadoop.version=0.20.2, cldb.port=7222, cldb.min.fileservers=1, cldb.detect.dup.hostid.enabled=false, num.volmirror.threads=1, cldb.jmxremote.port=7220}
2013-10-01 15:39:46,322 INFO CLDB [main]: CLDB Command line args: /opt/mapr/conf/cldb.conf
2013-10-01 15:39:46,322 INFO CLDB [main]: CLDBInit: Initializing CLDB
2013-10-01 15:39:46,322 INFO CLDB [main]: CLDBInit: Starting RPCServer on port 7222 with num thread 10 and heap size of 1911(MB)
2013-10-01 15:39:46,340 INFO CLDB [main]: MapR BuildVersion: 2.1.3.19871.GA
2013-10-01 15:39:46,340 INFO CLDB [main]: CLDBInit: Start CLDBServer
2013-10-01 15:39:46,367 INFO CLDBServer [main]: CLDBInit: HostName: mapr-test11.quantifind.com ServerId: 2865599929812237600
2013-10-01 15:39:46,368 INFO CLDBServer [main]: CLDBInit: Cluster name : Mapr-Test
2013-10-01 15:39:46,373 INFO CLDBServer [main]: CLDB creds setting uid as 1503
2013-10-01 15:39:46,373 INFO CLDBServer [main]: CLDB creds setting adding gid 42
2013-10-01 15:39:46,373 INFO CLDBServer [main]: CLDB creds setting adding gid 1503
2013-10-01 15:39:46,386 INFO CLDB [main]: CLDBState: CLDB State change : INITIAZING
2013-10-01 15:39:46,399 INFO ZooKeeperClient [main]: ZooKeeperClient init: zk timeout = 30000 ms
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:zookeeper.version=3.3.6--1, built on 09/07/2012 18:16 GMT
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:host.name=mapr-test11.quantifind.com
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:java.version=1.6.0_45
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:java.vendor=Sun Microsystems Inc.
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:java.home=/opt/jdk1.6.0_45/jre
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:java.class.path=/opt/mapr:/opt/mapr/conf:/opt/mapr/lib/adminuiapp-0.1.jar:/opt/mapr/lib/ant-1.7.1.jar:/opt/mapr/lib/antlr-2.7.7.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/c3p0-0.9.1.2.jar:/opt/mapr/lib/cldb-0.1.jar:/opt/mapr/lib/cliframework-0.1.jar:/opt/mapr/lib/commons-codec-1.5.jar:/opt/mapr/lib/commons-collections-3.2.1.jar:/opt/mapr/lib/commons-el-1.0.jar:/opt/mapr/lib/commons-email-1.2.jar:/opt/mapr/lib/commons-lang-2.5.jar:/opt/mapr/lib/commons-logging-1.0.4.jar:/opt/mapr/lib/commons-logging-api-1.0.4.jar:/opt/mapr/lib/dom4j-1.6.1.jar:/opt/mapr/lib/eval-0.5.jar:/opt/mapr/lib/flexjson-2.1.jar:/opt/mapr/lib/globalfsck-0.1.jar:/opt/mapr/lib/google-collect-1.0.jar:/opt/mapr/lib/gson-2.1.jar:/opt/mapr/lib/hadoop-metrics-0.20.2-dev.jar:/opt/mapr/lib/hadoop-metrics2-0.20.2-dev.jar:/opt/mapr/lib/hibernate-c3p0-3.3.1.GA.jar:/opt/mapr/lib/hibernate-commons-annotations-3.2.0.Final.jar:/opt/mapr/lib/hibernate-core-3.6.8.Final.jar:/opt/mapr/lib/httpclient-4.2.jar:/opt/mapr/lib/httpclient-cache-4.2.jar:/opt/mapr/lib/httpcore-4.2.jar:/opt/mapr/lib/jasper-compiler-5.5.12.jar:/opt/mapr/lib/jasper-runtime-5.5.12.jar:/opt/mapr/lib/javassist-3.12.1.GA.jar:/opt/mapr/lib/jetty-6.1.26.jar:/opt/mapr/lib/jetty-plus-6.1.26.jar:/opt/mapr/lib/jetty-util-6.1.26.jar:/opt/mapr/lib/jobmngmnt-0.1.jar:/opt/mapr/lib/joda-time-2.0.jar:/opt/mapr/lib/JPam-1.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/jsp-2.1.jar:/opt/mapr/lib/jsp-api-2.1.jar:/opt/mapr/lib/jta-1.1.jar:/opt/mapr/lib/junit-3.8.1.jar:/opt/mapr/lib/junit-4.5.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/log4j-1.2.14.jar:/opt/mapr/lib/log4j-1.2.15.jar:/opt/mapr/lib/logging-0.1.jar:/opt/mapr/lib/mail.jar:/opt/mapr/lib/maprbuildversion.jar:/opt/mapr/lib/maprcli-0.1.jar:/opt/mapr/lib/maprfs-diagnostic-tools-0.20.2-2.1.3.jar:/opt/mapr/lib/maprfs-jni-0.20.2-2.1.3.jar:/opt/mapr/lib/maprfs-jni-0.20.2-2.1.3-tests.jar:/opt/mapr/lib/maprsecurity-0.1.jar:/opt/mapr/lib/maprutil-0.1.jar:/opt/mapr/lib/persistence-api-1.0.jar:/opt/mapr/lib/protobuf-java-2.4.1-lite.jar:/opt/mapr/lib/servlet-api-2.5-6.1.26.jar:/opt/mapr/lib/volumemirror-0.1.jar:/opt/mapr/lib/warden-0.1.jar:/opt/mapr/lib/zookeeper-3.3.6.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/maprfs-0.1.jar
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:java.library.path=/opt/mapr/lib
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:java.io.tmpdir=/tmp
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:java.compiler=<NA>
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:os.name=Linux
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:os.arch=amd64
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:os.version=3.5.0-41-generic
2013-10-01 15:39:46,407 INFO ZooKeeper [main]: Client environment:user.name=mapr
2013-10-01 15:39:46,408 INFO ZooKeeper [main]: Client environment:user.home=/home/mapr
2013-10-01 15:39:46,408 INFO ZooKeeper [main]: Client environment:user.dir=/etc/init.d
2013-10-01 15:39:46,408 INFO ZooKeeper [main]: Initiating client connection, connectString=mapr-test11.quantifind.com:5181,mapr-test1.quantifind.com:5181,mapr-test2.quantifind.com:5181 sessionTimeout=30000 watcher=com.mapr.fs.cldb.CLDBServer@7aae3364
2013-10-01 15:39:46,437 INFO CLDBServer [main]: CLDB configured with ZooKeeper ensemble with connection string mapr-test11.quantifind.com:5181,mapr-test1.quantifind.com:5181,mapr-test2.quantifind.com:5181
2013-10-01 15:39:46,437 INFO ClientCnxn [main-SendThread()]: Opening socket connection to server mapr-test2.quantifind.com/10.10.3.2:5181
2013-10-01 15:39:46,445 INFO ClientCnxn [main-SendThread(mapr-test2.quantifind.com:5181)]: Socket connection established to mapr-test2.quantifind.com/10.10.3.2:5181, initiating session
2013-10-01 15:39:46,489 INFO ClientCnxn [main-SendThread(mapr-test2.quantifind.com:5181)]: Session establishment complete on server mapr-test2.quantifind.com/10.10.3.2:5181, sessionid = 0x241761cf97e000b, negotiated timeout = 30000
2013-10-01 15:39:46,493 INFO CLDBServer [main-EventThread]: The CLDB received notification that a ZooKeeper event of type None occurred on path null
2013-10-01 15:39:46,513 INFO CLDBServer [main-EventThread]: onZKConnect: The CLDB has successfully connected to the ZooKeeper server State:CONNECTED Timeout:30000 sessionid:0x241761cf97e000b local:/10.10.3.11:44216 remoteserver:mapr-test2.quantifind.com/10.10.3.2:5181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0 in the ZooKeeper ensemble with connection string mapr-test11.quantifind.com:5181,mapr-test1.quantifind.com:5181,mapr-test2.quantifind.com:5181
prog: 2345, proc 31, RpcProgram not found
prog: 2345, proc 103, RpcProgram not found
prog: 2345, proc 40, RpcProgram not found
2013-10-01 15:39:46,762 INFO VolumeMirror [main]: Initializing volume mirror thread ...
2013-10-01 15:39:46,764 INFO VolumeMirror [main]: Spawned 1 VolumeMirror Threads
2013-10-01 15:39:46,808 INFO HttpServer [main]: Creating listener for 0.0.0.0
2013-10-01 15:39:46.826:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2013-10-01 15:39:46,878 INFO CLDB [main]: CLDBState: CLDB State change : WAIT_FOR_FILESERVERS
2013-10-01 15:39:46,878 INFO CLDB [main]: CLDBInit: Exporting program 2346
2013-10-01 15:39:46,878 INFO CLDB [main]: CLDBInit: Exporting program 2345
2013-10-01 15:39:46,878 INFO CLDB [main]: CLDBInit: Starting HTTP Server
2013-10-01 15:39:46,878 INFO HttpServer [main]: WebServer: Starting WebServer
2013-10-01 15:39:46,880 INFO HttpServer [main]: Listener started on SelectChannelConnector@0.0.0.0:7221 port 7221
2013-10-01 15:39:46,880 INFO HttpServer [main]: Starting Jetty WebServer
2013-10-01 15:39:46.880:INFO::jetty-6.1.26
2013-10-01 15:39:46,968 INFO ZooKeeperClient [ZK-Connect]: ZooKeeperClient : No FileServers for KvStore container.  New Installation, becoming Master
2013-10-01 15:39:46,994 INFO ZooKeeperClient [ZK-Connect]: ZooKeeperClient: CLDB is current Master
2013-10-01 15:39:46,994 INFO ZooKeeperClient [ZK-Connect]: CLDB became master. Creating new KvStoreContainer with no fileservers for cid: 1
2013-10-01 15:39:46,996 INFO ZooKeeperClient [ZK-Connect]: Storing KvStoreContainerInfo to ZooKeeper  Container ID:1 Servers:  Inactive:  Unused:  Epoch:3 SizeMB:0
2013-10-01 15:39:47,019 INFO CLDBServer [RPC-1]: Rejecting RPC 2345.103 from 10.10.3.11:51877 with status 30 as CLDB is not yet initialized.
2013-10-01 15:39:47,023 INFO ZooKeeperClient [ZK-Connect]: CLDB became master. Initializing KvStoreContainer for cid: 1
2013-10-01 15:39:47,026 INFO ZooKeeperClient [ZK-Connect]: Storing KvStoreContainerInfo to ZooKeeper  Container ID:1 Servers:  Inactive:  Unused:  Epoch:3 SizeMB:0
2013-10-01 15:39:47,056 INFO CLDBServer [ZK-Connect]: Starting thread to monitor waiting for local kvstore to become master
2013-10-01 15:39:47.211:INFO::Started SelectChannelConnector@0.0.0.0:7221
2013-10-01 15:39:47,869 INFO CLDBServer [Lookup-1]: Rejecting RPC 2345.5 from 10.10.3.1:1111 with status 3 as CLDB is waiting for local kvstore to become master.
2013-10-01 15:39:47,885 INFO CLDBServer [RPC-3]: FSRegister: Request  FSID: 2865599929812237600 FSNetworkLocation:  FSHost:Port: 10.10.3.11- FSHostName: mapr-test11.quantifind.com StoragePools  Capacity: 0 Available: 0 Used: 0 Role: 0 isDCA: false Received registration request
2013-10-01 15:39:47,885 INFO CLDBServer [RPC-3]: Cluster uuid is -4734711147176495883--1801228248212421432
2013-10-01 15:39:47,886 WARN Topology [RPC-3]: FileSever on mapr-test11.quantifind.com reported an invalid topology . Ignoring reported topology
2013-10-01 15:39:47,900 INFO CLDBServer [RPC-3]: FSRegister: Registered FileServer: 10.10.3.11- at topology /default-rack/mapr-test11.quantifind.com
2013-10-01 15:40:49,865 INFO CLDBServer [RPC-7]: Rejecting RPC 2345.40 from 10.10.3.2:32922 with status 3 as CLDB is waiting for local kvstore to become master.
2013-10-01 15:41:50,723 INFO CLDBServer [Lookup-7]: Rejecting RPC 2345.5 from 10.10.3.2:1111 with status 3 as CLDB is waiting for local kvstore to become master.
2013-10-01 15:42:51,952 INFO CLDBServer [RPC-4]: Rejecting RPC 2345.103 from 10.10.3.1:35159 with status 3 as CLDB is waiting for local kvstore to become master.
2013-10-01 15:43:55,107 INFO CLDBServer [RPC-5]: Rejecting RPC 2345.103 from 10.10.3.1:57381 with status 3 as CLDB is waiting for local kvstore to become master.
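
To make the repeated rejections in a log like this easier to scan, here is a small sketch that counts the "Rejecting RPC" lines by stated reason. The function name `summarize_cldb_rejections` is made up for illustration, and the default path /opt/mapr/logs/cldb.log is an assumption based on the log directory named in the warden output.

```shell
#!/bin/sh
# Count "Rejecting RPC" lines in a CLDB log, grouped by the reason that
# follows "as CLDB is". $1 is the path to the log file.
summarize_cldb_rejections() {
    grep 'Rejecting RPC' "$1" \
        | sed 's/.* as CLDB is //' \
        | sort | uniq -c | sort -rn
}

# Assumed default log location; pass another path as the first argument.
log_file=${1:-/opt/mapr/logs/cldb.log}
if [ -r "$log_file" ]; then
    summarize_cldb_rejections "$log_file"
fi
```

Run against the excerpt above, this groups the rejections under two reasons: "not yet initialized." and "waiting for local kvstore to become master."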


Figuring that the error was a result of the CLDB host not having any FS role, we started the remainder of the nodes. However, we are still unable to create a mapr user via maprcli, and the webserver still fails to start.

Any suggestions on what is going wrong, and where, would be appreciated.

Thanks,
