How to Use Spark via Livy Interpreter with Zeppelin

Document created by Rachel Silver on Jun 20, 2017. Last modified by Rachel Silver on Aug 24, 2017. Version 4.



Apache Zeppelin is a web-based notebook project that enables interactive data analytics, particularly useful for Apache Spark workloads. While Apache Zeppelin has a native Spark Interpreter, MapR recommends using Livy for Apache Spark instead, so you can leverage some enhancements, such as:

  • The ability to submit jobs in YARN-cluster mode
  • Impersonation
  • Dynamic Memory Allocation controls


The versions used for this demo, as reflected in the paths used below, are Spark 2.1.0, Hadoop 2.7.0, and Livy 0.3.0.


Note: Zeppelin for MapR is not formally supported. Any problems should be addressed in Answers or in the Zeppelin Community.


Installing Zeppelin 

For these purposes, we're going to use the newest binary package, available from the Apache Zeppelin download page, and install Zeppelin to /opt/zeppelin.


Get and unpack the Zeppelin binary as a user with sudo access (use the one with all interpreters):


mkdir -p /opt/zeppelin

wget <link to suggested mirror>.tgz  -P /tmp/

gunzip /tmp/zeppelin-<version>-bin-all.tgz

tar -xf /tmp/zeppelin-<version>-bin-all.tar -C /opt/zeppelin/
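The gunzip and tar steps above can also be done in a single tar call. This is a minimal sketch, assuming a tar that supports the -z (gzip) flag, as GNU and BSD tar do; the helper name extract_tgz is hypothetical.

```shell
# Hypothetical helper: unpack a gzipped tarball into a target directory in one step.
# Unlike gunzip, this leaves the downloaded .tgz in place.
extract_tgz() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example, with the same placeholders as the commands above:
# extract_tgz /tmp/zeppelin-<version>-bin-all.tgz /opt/zeppelin
```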


Change the owner of these files to your MapR cluster user; we'll use 'mapr' for these purposes:


chown -R mapr:mapr /opt/zeppelin


Note: do the rest as your MapR cluster user.

su mapr


Check whether port 8080, the default Zeppelin port, is free. If it is not, here's how you can change it.
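One way to see whether something is already listening on the port is to try connecting to it from the same machine. This is a minimal sketch rather than the only way to do it: the helper name port_in_use is hypothetical, and it relies on bash's /dev/tcp redirection, so it assumes bash is installed.

```shell
# Hypothetical helper: returns 0 if something accepts TCP connections on
# host:port, non-zero otherwise. Uses bash's /dev/tcp, so it calls bash explicitly.
# Intended for local checks; an unreachable remote host may make it wait.
port_in_use() {
    host="$1"
    port="$2"
    bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example:
# if port_in_use localhost 8080; then
#     echo "8080 is taken, pick another port for Zeppelin"
# fi
```

If the port is taken, continue with the steps below to pick another one.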


First, create a Zeppelin environment configuration file:

cp /opt/zeppelin/zeppelin-<version>-bin-all/conf/ /opt/zeppelin/zeppelin-<version>-bin-all/conf/


Open this file in a text editor and add the following to change the default port:

export ZEPPELIN_PORT=<Your Port #>                       
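The two steps above can be scripted. This is a sketch under assumptions: the file names zeppelin-env.sh.template and zeppelin-env.sh are what stock Apache Zeppelin binary packages ship in conf/ and are assumed here, and the helper name set_zeppelin_port is hypothetical.

```shell
# Hypothetical helper: create zeppelin-env.sh from its template if it does not
# exist yet, then append a ZEPPELIN_PORT export.
# Assumption: conf_dir contains zeppelin-env.sh.template, as stock Zeppelin does.
set_zeppelin_port() {
    conf_dir="$1"
    port="$2"
    env_file="$conf_dir/zeppelin-env.sh"
    [ -f "$env_file" ] || cp "$conf_dir/zeppelin-env.sh.template" "$env_file" || return 1
    echo "export ZEPPELIN_PORT=$port" >> "$env_file"
}

# Example, with the same placeholders as above:
# set_zeppelin_port /opt/zeppelin/zeppelin-<version>-bin-all/conf <Your Port #>
```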


Start Zeppelin:

/opt/zeppelin/zeppelin-<version>-bin-all/bin/ start

Log dir doesn't exist, create /opt/zeppelin/zeppelin-<version>-bin-all/logs

Pid dir doesn't exist, create /opt/zeppelin/zeppelin-<version>-bin-all/run

Zeppelin start                                             [  OK  ]


Check to see that Zeppelin is up and running by visiting the Zeppelin Web UI at the port you specified above:

http://<Hostname or IP>:<Your Port #>         



Installing & Running Livy


Do the following as 'root' or a user with sudo permissions to install Livy:


mkdir -p /opt/livy

wget  -P /tmp

unzip /tmp/ -d /opt/livy/

mkdir /var/log/livy

chown mapr:mapr /var/log/livy

chown -R mapr:mapr /opt/livy

su mapr


Go into the Livy configuration file with a text editor of your choice and set the following values: 


file: /opt/livy/livy-server-0.3.0/conf/

export SPARK_HOME=/opt/mapr/spark/spark-2.1.0/
export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
export LIVY_LOG_DIR=/var/log/livy
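Before starting Livy, it can be worth checking that these three values point at real locations. The sketch below is a hypothetical pre-flight check, not part of Livy: it assumes that SPARK_HOME contains bin/spark-submit, that HADOOP_CONF_DIR is meant to be the directory holding core-site.xml, and that the Livy user must be able to write to LIVY_LOG_DIR.

```shell
# Hypothetical pre-flight check for the variables set above.
# Prints one line per problem and returns non-zero if any check fails.
check_livy_env() {
    rc=0
    [ -x "$SPARK_HOME/bin/spark-submit" ] || {
        echo "spark-submit not found under SPARK_HOME=$SPARK_HOME"; rc=1; }
    [ -f "$HADOOP_CONF_DIR/core-site.xml" ] || {
        echo "no core-site.xml in HADOOP_CONF_DIR=$HADOOP_CONF_DIR"; rc=1; }
    [ -w "$LIVY_LOG_DIR" ] || {
        echo "LIVY_LOG_DIR=$LIVY_LOG_DIR is not writable"; rc=1; }
    return "$rc"
}

# Example: load the Livy environment file into the current shell, then run
# check_livy_env
```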



And, to configure impersonation, please add the following to /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/core-site.xml:
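For reference, Hadoop normally grants impersonation rights through a pair of proxy-user properties. The sketch below prints such a pair for a given user so it can be reviewed before being pasted inside the configuration element of core-site.xml. It assumes Livy runs as 'mapr'; the wildcard values are a permissive assumption that you may want to narrow, and the helper name is hypothetical.

```shell
# Hypothetical helper: print Hadoop proxy-user properties for one user.
# Assumption: "*" (any host, any group) is acceptable on this cluster.
print_proxyuser_properties() {
    user="$1"
    cat <<EOF
<property>
  <name>hadoop.proxyuser.$user.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.$user.groups</name>
  <value>*</value>
</property>
EOF
}

# Example:
# print_proxyuser_properties mapr
```

Livy also has its own livy.impersonation.enabled switch in livy.conf, which may need to be set to true for impersonation to take effect.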




Then, start Livy with this command:





Once launched, it will provide you with a URL to the REST service. You can visit this page and make sure everything is running and accessible. There won't be much there, but you should see "Operational Menu" as a heading.
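The same check can be made from the command line by asking the REST service for its session list. A minimal sketch, assuming curl is installed and that Livy listens on its default port 8998 unless you changed it; the helper name livy_sessions is hypothetical.

```shell
# Hypothetical helper: print the JSON session list from a Livy server.
# Returns non-zero if the server cannot be reached or answers with an error.
livy_sessions() {
    livy_url="$1"
    curl --silent --fail --max-time 5 "$livy_url/sessions"
}

# Example, assuming the default port:
# livy_sessions http://localhost:8998
```

On a fresh server the reply should be a small JSON document with an empty session list.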



Configure Zeppelin for Livy


There are many values that can be set here to control dynamic memory allocation and other enhancements that Livy is able to leverage. But the only one that must be set is:





Full configuration details can be found in the Apache Zeppelin Documentation.


To test that it is working, create a new note and try creating sessions using both Scala and PySpark, like so:
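If the notebook paragraphs fail, it can help to rule out Zeppelin by asking Livy for the same two session kinds directly over its REST API. This is a sketch under assumptions: curl is installed, Livy is on its default port 8998, and the helper names are hypothetical. The "kind" values spark (Scala) and pyspark are the ones Livy's session API accepts.

```shell
# Build the JSON body for a Livy "create session" request.
# kind is "spark" (Scala) or "pyspark".
livy_session_payload() {
    printf '{"kind": "%s"}' "$1"
}

# POST the body to Livy's /sessions endpoint and print the JSON reply.
# Returns non-zero if the server is unreachable or rejects the request.
create_livy_session() {
    livy_url="$1"
    kind="$2"
    curl --silent --fail --max-time 10 \
        -H 'Content-Type: application/json' \
        -X POST -d "$(livy_session_payload "$kind")" \
        "$livy_url/sessions"
}

# Example, assuming the default port:
# create_livy_session http://localhost:8998 spark
# create_livy_session http://localhost:8998 pyspark
```

Each session created this way takes up cluster resources until it is deleted or times out, so clean up after testing.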




Further Reading