
Not able to start Spark 1.5.2 slaves

Question asked by evckumar1 on Jan 8, 2016
Latest reply on Mar 18, 2016 by Hao Zhu
Hello guys,

I am using Spark 1.5.2 with one slave, one master, and one history server. I have integrated Spark 1.5.2 with Hive 1.2.

My configuration is as follows:

# cat /opt/mapr/spark/spark-1.5.2/conf/slaves
# A Spark Worker will be started on each of the machines listed below.
localhost

# cat /opt/mapr/spark/spark-1.5.2/conf/spark-defaults.conf
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
spark.executor.memory              2g
spark.logConf                      true
spark.eventLog.dir                 maprfs:///apps/spark
spark.eventLog.enabled             true
spark.sql.hive.metastore.sharedPrefixes  com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,com.mapr.fs.shim.LibraryLoader,com.mapr.security.JNISecurity,com.mapr.fs.jni
spark.executor.extraClassPath
spark.master                   spark://10.106.128.46:7077
spark.yarn.historyServer.address http://10.106.128.46:18080
spark.akka.heartbeat.interval 100
spark.sql.hive.metastore.version 1.2.1
spark.sql.hive.metastore.jars :/opt/mapr/hadoop/hadoop-0.20.2/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:..:/opt/mapr/hive/hive-1.2/lib/*

# cat /opt/mapr/spark/spark-1.5.2/conf/spark-env.sh
export SPARK_MASTER_HOST=localhost
export SPARK_MASTER_IP=10.106.128.46


I get the error shown in the log below when I start the slaves.

The master is listening on port 7077, but telnet to that port does not work. No errors are reported when the master starts. lsof and netstat show that port 7077 is listening only on localhost. Any ideas?

# lsof -i :7077
COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
java    23020 mapr  145u  IPv6 30834194      0t0  TCP localhost:7077 (LISTEN)

# netstat -a | grep 7077
tcp        0      0 localhost:7077              *:*                         LISTEN

16/01/08 02:56:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/08 02:56:57 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@10.106.128.46:7077] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://sparkMaster@10.106.128.46:7077]] Caused by: [Connection refused: /10.106.128.46:7077]
16/01/08 02:56:57 WARN Worker: Failed to connect to master 10.106.128.46:7077
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkMaster@10.106.128.46:7077/), Path(/user/Master)]
        at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
        at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
        at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
        at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
        at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
        at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
        at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
        at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266)
        at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:533)
        at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:569)
        at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:559)
        at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
        at akka.remote.EndpointWriter.postStop(Endpoint.scala:557)
        at akka.actor.Actor$class.aroundPostStop(Actor.scala:477)
        at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:411)
        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
        at akka.actor.ActorCell.terminate(ActorCell.scala:369)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
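
The "Connection refused" in the log, together with the lsof/netstat output showing the listener on localhost only, can be checked outside Spark with a plain TCP probe. The following is a minimal sketch in Python, not part of Spark or MapR; the helper name `can_connect` is hypothetical, and the addresses and port are simply the ones from this post. If the loopback probe succeeds while the probe against 10.106.128.46 fails, that would be consistent with the master having bound only to localhost.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only opens and closes a socket.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Addresses and port taken from the post; run this on the master node.
    for host in ("127.0.0.1", "10.106.128.46"):
        status = "open" if can_connect(host, 7077) else "refused or unreachable"
        print(f"{host}:7077 -> {status}")
```

Running the same probe from the slave node against 10.106.128.46 shows what the worker sees when it tries to register with the master.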
