Spark-YARN: how to get rid of yarn-client NoClassDefFoundError

Document created by rsingh on Feb 13, 2016

Author: Rajkumar Singh

 

Original Publication Date: April 29, 2015

 

Environment to reproduce

[root@ip-10-0-0-156 ~]# rpm -qa | grep mapr
mapr-mapreduce2-2.5.1.29870.GA-1.x86_64
mapr-cldb-4.0.2.29870.GA-1.x86_64
mapr-zk-internal-4.0.2.29870.GA.v3.4.5-1.x86_64
mapr-core-internal-4.0.2.29870.GA-1.x86_64
mapr-mapreduce1-0.20.2.29870.GA-1.x86_64
mapr-core-4.0.2.29870.GA-1.x86_64
mapr-tasktracker-4.0.2.29870.GA-1.x86_64
mapr-nfs-4.0.2.29870.GA-1.x86_64
mapr-nodemanager-2.5.1.29870.GA-1.x86_64
mapr-spark-historyserver-1.2.1.201503051824-1.noarch
mapr-hadoop-core-2.5.1.29870.GA-1.x86_64
mapr-fileserver-4.0.2.29870.GA-1.x86_64
mapr-zookeeper-4.0.2.29870.GA-1.x86_64
mapr-hiveserver2-0.13.201503021511-1.noarch
mapr-spark-1.2.1.201503051824-1.noarch

To test your Spark 1.2.1 installation, run the following command:

MASTER=yarn-client /opt/mapr/spark/spark-1.2.1/bin/run-example org.apache.spark.examples.SparkPi

 

This will fail with the following NoClassDefFoundError:

 

[root@ip-10-0-0-156 spark-1.2.1]# MASTER=yarn-client /opt/mapr/spark/spark-1.2.1/bin/run-example org.apache.spark.examples.SparkPi
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/spark/spark-1.2.1/lib/spark-assembly-1.2.1-hadoop2.5.1-mapr-1501.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/conf/YarnConfiguration
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:191)
    at org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:207)
    at org.apache.spark.deploy.SparkHadoopUtil$.<init>(SparkHadoopUtil.scala:206)
    at org.apache.spark.deploy.SparkHadoopUtil$.<clinit>(SparkHadoopUtil.scala)
    at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1873)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:105)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:180)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:308)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:240)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:28)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

However, if you look at the YARN classpath and query the jar /opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/yarn/hadoop-yarn-api-2.5.1-mapr-1501.jar, you will find that YarnConfiguration.class is available there.

[root@ip-10-0-0-156 ~]# yarn classpath
/opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/yarn/lib/*
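Because the output is one long colon-separated string, it helps to split it into one entry per line (a quick readability trick using the standard tr utility):

yarn classpath | tr ':' '\n'

This makes it easy to spot that /opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/yarn/* is on the YARN classpath.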

Query the jar to find the class:

[root@ip-10-0-0-156 yarn]# jar tf hadoop-yarn-api-2.5.1-mapr-1501.jar | grep YarnConfiguration 
org/apache/hadoop/yarn/conf/YarnConfiguration.class
org/apache/hadoop/yarn/conf/DefaultYarnConfiguration.class

This confirms that Spark is not picking up the right classpath.
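To see the classpath the Spark launcher actually builds, you can ask bin/spark-class to echo the full java command (including its -cp argument) before it runs; Spark 1.x honors the SPARK_PRINT_LAUNCH_COMMAND environment variable for this. A minimal sketch:

SPARK_PRINT_LAUNCH_COMMAND=1 MASTER=yarn-client /opt/mapr/spark/spark-1.2.1/bin/run-example org.apache.spark.examples.SparkPi

The printed "Spark Command:" line should show that the hadoop-yarn-api jar is absent from the -cp value, which matches the NoClassDefFoundError above.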

 

Resolution

 

Add the following lines to spark-env.sh and you are good to go:

 

# append the full YARN classpath so that Spark can locate the YARN API classes
MAPR_YARN_CLASSPATH=`yarn classpath`
MAPR_SPARK_CLASSPATH="$MAPR_HADOOP_CLASSPATH:$MAPR_HADOOP_HBASE_VERSION:$MAPR_YARN_CLASSPATH"
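With these lines in place, MAPR_SPARK_CLASSPATH includes the full `yarn classpath` output, so the YARN API jars land on the driver's classpath. Re-run the test from above to verify the fix; the job should now complete and print its estimate (a line like "Pi is roughly 3.14...") instead of throwing the NoClassDefFoundError:

MASTER=yarn-client /opt/mapr/spark/spark-1.2.1/bin/run-example org.apache.spark.examples.SparkPi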

 

 
