Build Spark Binaries On MapR 4.0.1 (Yarn)

Document created by wade on Feb 27, 2016

Author: Jitendra Yadav, last modified by Hao Zhu on 2/18/2015

Original Publication Date: February 4, 2015


Below are the steps to build Spark binaries from source on the MapR platform.

1. Download the Spark 1.2.0 source code from the Apache site.
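Step 1 gives no command. As a sketch, the tarball is assumed to still be hosted on the Apache archive at the usual path; verify the URL before fetching:

```shell
# Sketch: build the download URL for the Spark 1.2.0 source tarball.
# The archive.apache.org path is an assumption -- verify it before fetching.
SPARK_VERSION=1.2.0
URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}.tgz"
echo "$URL"
# wget "$URL"    # uncomment to actually fetch the tarball
```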


2. Extract the .tgz file in the current directory.

tar  zxvf spark-1.2.0.tgz

3. Move the extracted source directory to /opt/mapr/spark.

mv spark-1.2.0 /opt/mapr/spark/ 
cd /opt/mapr/spark/spark-1.2.0/

4. Set the environment variables SPARK_HOME and MAVEN_OPTS.

export SPARK_HOME=/opt/mapr/spark/spark-1.2.0 
export MAVEN_OPTS="-Xmx2048M"

5. Check your MapR YARN version so you can pick the matching artifacts.

cat /opt/mapr/conf/hadoop_version | grep -i yarn


Once you have the output above, match it against the artifacts currently listed in the MapR Maven repository.


For example:

mvn -Pyarn -Dhadoop.version=2.4.1-mapr-1408 -Dyarn.version=2.4.1-mapr-1408 -DskipTests package

Since we are using the Hadoop 2.4.1 release, we use the 2.4.1-mapr-1408 artifacts.
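The version-matching in step 5 can be scripted. A minimal sketch, assuming hadoop_version uses a key=value layout (an assumption about MapR 4.x; substitute the real /opt/mapr/conf/hadoop_version on a cluster):

```shell
# Sketch: pull the YARN version out of MapR's hadoop_version file.
# The key=value layout below is an assumption about the file's format on
# MapR 4.x -- on a cluster, read /opt/mapr/conf/hadoop_version instead.
cat > /tmp/hadoop_version <<'EOF'
classic_version=0.20.2
default_mode=yarn
yarn_version=2.4.1
EOF

yarn_ver=$(grep -i '^yarn_version' /tmp/hadoop_version | cut -d= -f2)
echo "YARN version: ${yarn_ver}"
# Match ${yarn_ver} (here 2.4.1) against the MapR Maven repository to pick
# the full artifact version, e.g. 2.4.1-mapr-1408.
```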

6. Set the environment variables SPARK_HADOOP_VERSION and SPARK_YARN.

export SPARK_HADOOP_VERSION=2.4.1-mapr-1408 
export SPARK_YARN=true

7. Create Spark assembly.

sbt/sbt assembly

8. Verify that the newly created Spark assembly and example jars exist in the build output directory.
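Step 8 does not name the directory. For a Spark 1.2.0 sbt build, the jars typically land under assembly/target/scala-2.10 and examples/target/scala-2.10 (an assumption; adjust to your tree). The mock layout below only makes the sketch self-contained; on a real build, point SPARK_SRC at your source directory and skip the mktemp/touch lines:

```shell
# Sketch: locate the freshly built assembly and examples jars.
# The scala-2.10 target directories are an assumed layout for a Spark 1.2.0
# sbt build; the mktemp/touch lines fake that layout so the sketch runs
# anywhere. On a real build, set SPARK_SRC=/opt/mapr/spark/spark-1.2.0.
SPARK_SRC=$(mktemp -d)
mkdir -p "$SPARK_SRC/assembly/target/scala-2.10" "$SPARK_SRC/examples/target/scala-2.10"
touch "$SPARK_SRC/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.1-mapr-1408.jar"
touch "$SPARK_SRC/examples/target/scala-2.10/spark-examples-1.2.0-hadoop2.4.1-mapr-1408.jar"

jars=$(find "$SPARK_SRC" -name 'spark-assembly-*.jar' -o -name 'spark-examples-*.jar')
echo "$jars"
```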


9. Run a Pi example job to test Spark on YARN.

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 512m \
  --executor-memory 512m \
  --executor-cores 1 \
  lib/spark-examples*.jar 10

After submitting the job, verify that the Spark application is listed on the ResourceManager UI page.