Spark 1.1.0 configuration for running in YARN

Question asked by yatrus_analytics on Mar 18, 2015
Latest reply on Jun 25, 2015 by Hao Zhu
Hi all,

I am running a three-node cluster on Ubuntu 14.04 with the MapR M3 edition installed and YARN as the cluster manager. I installed Spark 1.1.0 according to the documentation for running under YARN. Unfortunately, when I try to run spark-shell with master yarn-client, I receive the following message:

    org.apache.spark.SparkException: Yarn application already ended,might be killed or not able to launch application master.
     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:114)
     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:90)
     at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:323)
     at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:972)
     at $iwC$$iwC.<init>(<console>:8)
     at $iwC.<init>(<console>:14)
     at <init>(<console>:16)
     at .<init>(<console>:20)
     at .<clinit>(<console>)
     at .<init>(<console>:7)
     at .<clinit>(<console>)
     at $print(<console>)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:606)
     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:814)
     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:859)
     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:771)
     at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:121)
     at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:120)
     at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:264)
     at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:120)
     at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:56)
     at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:931)
     at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:142)
     at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:56)
     at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:104)
     at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:56)
     at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:948)
     at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
     at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:902)
     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:997)
     at org.apache.spark.repl.Main$.main(Main.scala:31)
     at org.apache.spark.repl.Main.main(Main.scala)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:606)
     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    

The same message appears when running the spark-submit script with master yarn-client. When I run these scripts with the default master (standalone mode), they work without errors, and I was able to run the SparkPi example.
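For reference, the invocations are the standard ones, roughly like this (the examples jar path is illustrative, not the exact file name on my machine):

    # fails with the exception above
    ./bin/spark-shell --master yarn-client

    # same failure in yarn-client mode; works with the default (standalone) master
    ./bin/spark-submit --master yarn-client \
        --class org.apache.spark.examples.SparkPi \
        lib/spark-examples-*.jar 10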

The relevant part of the configuration file yarn-site.xml is the following:

    <property>
      <name>yarn.application.classpath</name>
      <value>/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop:
        /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop/*:
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib:
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/*:
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/*:
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs:
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/lib/*:
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/hdfs/*:
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/*:
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/*:
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/lib/*:
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/mapreduce/*:
        /opt/mapr/hadoop/hadoop-0.20.2/contrib/capacity-scheduler/*.jar:
      </value>
    </property>

    <property>
      <description>The address of the RM web application.</description>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    </configuration>
    
I have omitted the yarn.resourcemanager.hostname property here, but it is set automatically. This configuration is present on the slave nodes too; yarn.resourcemanager.webapp.address is set on the master node only.
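In case it matters, setting the hostname explicitly would look like this (maprnode01 is a hypothetical hostname, not my actual one):

    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>maprnode01</value> <!-- hypothetical hostname -->
    </property>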

The file spark-defaults.conf looks like this:

    spark.yarn.jar                     hdfs:///apps/spark/spark-assembly-1.1.0-hadoop2.4.1-mapr-1408.jar  # I am using HBase instead of MapR-FS
    spark.executor.extraClassPath      /opt/mapr/lib/json-20080701.jar:\
        /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop/:\
        /opt/mapr/lib/hadoop-0.20.2-dev-core.jar:\
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:\
        /opt/mapr/lib/libprotodefs-4.0.1-mapr.jar:\
        /opt/mapr/lib/maprutil-4.0.1-mapr.jar:\
        /opt/mapr/lib/baseutils-4.0.1-mapr.jar:\
        /opt/mapr/lib/maprfs-4.0.1-mapr.jar:\
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/lib/zookeeper-3.4.5-mapr-1406.jar:\
        /opt/mapr/lib/mapr-hbase-4.0.1-mapr.jar:\
        /opt/mapr/lib/protobuf-java-2.5.0.jar:\
        /opt/mapr/lib/hadoop-auth-2.4.1.jar:\
        /opt/mapr/lib/hadoop-common-2.4.1.jar:\
        /opt/mapr/lib/commons-collections-3.2.1.jar:\
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/common/lib/commons-configuration-1.6.jar:\
        /opt/mapr/lib/commons-lang-2.5.jar:\
        /opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-auth-2.4.1-mapr-1408.jar:\
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-common-2.4.1-mapr-1408.jar:\
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-api-2.4.1-mapr-1408.jar:\
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.4.1-mapr-1408.jar:\
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.4.1-mapr-1408.jar:\
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.4.1-mapr-1408.jar:\
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-client-2.4.1-mapr-1408.jar:\
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.4.1-mapr-1408.jar:\
        /opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-server-common-2.4.1-mapr-1408.jar:\
        /opt/mapr/hadoop/hadoop-0.20.2/lib/guava-13.0.1.jar:\
        /opt/mapr/hbase/hbase-0.94.24/hbase-0.94.24-mapr-1501.jar:\
        /opt/mapr/hbase/hbase-0.94.24/conf/:\
        /opt/mapr/hive/hive-0.12/lib/hive-exec-0.12-mapr-1501.jar:\
        /opt/mapr/hive/hive-0.12/lib/hive-metastore-0.12-mapr-1501.jar:\
        /opt/mapr/hive/hive-0.12/lib/antlr-runtime-3.4.jar:\
        /opt/mapr/hive/hive-0.12/lib/libfb303-0.9.0.jar:\
        /opt/mapr/hive/hive-0.12/lib/hive-common-0.12-mapr-1501.jar:\
        /opt/mapr/hive/hive-0.12/lib/hive-hbase-handler-0.12-mapr-1501.jar
    
    spark.executor.memory              2g
    spark.eventLog.enabled             true
    spark.eventLog.dir                 hdfs:///apps/spark
    spark.logConf                      true
    spark.driver.memory                2g
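
To rule out a missing assembly jar, the path referenced by spark.yarn.jar can be double-checked with a plain listing, e.g.:

    # verify the assembly jar is where spark.yarn.jar points
    hadoop fs -ls /apps/spark/spark-assembly-1.1.0-hadoop2.4.1-mapr-1408.jar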

I have added the HBase and Hive jars in order to be able to work with them, and I have imported them in the spark-env.sh script as well (see the sketch below).
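For illustration, such additions in spark-env.sh would look roughly like this (a sketch, not my exact file; the full list mirrors the jars in extraClassPath above):

    # spark-env.sh (sketch; full jar list abbreviated)
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/mapr/hbase/hbase-0.94.24/hbase-0.94.24-mapr-1501.jar
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/mapr/hive/hive-0.12/lib/hive-exec-0.12-mapr-1501.jar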

Thank you very much in advance!
