Apache Zeppelin is a web-based notebook project that enables interactive data analytics. Recently, Apache Zeppelin 0.7.2 was released, so we'd like to assist our customers in getting Zeppelin up and running on the MapR Platform. Here, we're going to explain how to get Zeppelin working on the MapR Converged Data Platform with Apache Spark and walk through a quick example.
The versions used for this demo are:
Note: Zeppelin for MapR is not formally supported; any problems should be addressed in the Zeppelin community.
For these purposes, we're going to use the newest binary package available from the Apache Zeppelin download page and install Zeppelin to /opt/zeppelin.
Get and unpack the Zeppelin binary as a user with sudo access (use the package that includes all interpreters):
mkdir -p /opt/zeppelin
wget <link to suggested mirror>.tgz -P /tmp/
tar -xzf /tmp/zeppelin-<version>-bin-all.tgz -C /opt/zeppelin/
Change the owner of these files to your MapR cluster user; we'll use 'mapr' for these purposes:
chown -R mapr:mapr /opt/zeppelin
Note: do the rest as your MapR cluster user.
Check whether port 8080 (Zeppelin's default) is already in use. If it is, here's how you can change the port Zeppelin listens on.
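One quick way to check, assuming a bash shell on the node, is bash's built-in /dev/tcp pseudo-device; a successful connect means something is already listening on the port:

```shell
# Probe port 8080 on this host; a successful connect means it is taken.
# (Assumes bash; /dev/tcp is a bash built-in, not a real file.)
if (exec 3<>/dev/tcp/127.0.0.1/8080) 2>/dev/null; then
  echo "port 8080 is in use"
else
  echo "port 8080 is free"
fi
```

You could equally use `netstat` or `ss` if they're installed; the /dev/tcp trick just avoids any extra dependencies.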
First, create a Zeppelin environment configuration file:
cp /opt/zeppelin/zeppelin-<version>-bin-all/conf/zeppelin-env.sh.template /opt/zeppelin/zeppelin-<version>-bin-all/conf/zeppelin-env.sh
Open this file in a text editor and add the following to change the default port:
export ZEPPELIN_PORT=<Your Port #>
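Then start Zeppelin with the daemon script shipped in the distribution (substitute the version you downloaded). On a first start you should see it create its log and pid directories, as in the output below:

```shell
# Start the Zeppelin daemon (run as your MapR cluster user).
/opt/zeppelin/zeppelin-<version>-bin-all/bin/zeppelin-daemon.sh start
```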
Log dir doesn't exist, create /opt/zeppelin/zeppelin-<version>-bin-all/logs
Pid dir doesn't exist, create /opt/zeppelin/zeppelin-<version>-bin-all/run
Zeppelin start [ OK ]
Check to see that Zeppelin is up and running by visiting the Zeppelin Web UI at the port you specified above:
http://<Hostname or IP>:<Your Port #>
Configure Zeppelin to Query Hive
Zeppelin has deprecated the Hive Interpreter in favor of JDBC. In order to configure Zeppelin to query Hive over JDBC, follow these steps:
Go to the Interpreter screen and choose +Create to create a new interpreter:
Create the interpreter and fill in the connection fields, supplying your <cluster user password> where required:
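For reference, a Hive-over-JDBC interpreter typically needs property values along these lines. The hostname is a placeholder for your HiveServer2 node, and 10000 is HiveServer2's default port; adjust both for your cluster:

```properties
# Illustrative JDBC interpreter properties for Hive (values are placeholders)
default.driver   = org.apache.hive.jdbc.HiveDriver
default.url      = jdbc:hive2://<HiveServer2 host>:10000/default
default.user     = mapr
default.password = <cluster user password>
```

The Hive JDBC driver also needs to be on the interpreter's classpath, e.g. by adding the `org.apache.hive:hive-jdbc` artifact (matched to your cluster's Hive version) under the interpreter's Dependencies section.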
Save the interpreter config, then open a new (or existing) notebook and select the new interpreter you've created as your default:
Test Hive Query
Hive doesn't ship with a default dataset to play with, but you can still verify that the connection is working:
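For example, running a metadata statement in a notebook paragraph confirms connectivity without needing any tables. The `%jdbc(hive)` prefix assumes you named the interpreter `hive`; adjust it to whatever name you chose above:

```sql
%jdbc(hive)
show databases
```

If the interpreter is configured correctly, this should return at least the built-in `default` database.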