Installing Apache Zeppelin on MapR 5 with Apache Drill

Document created by cmatta on Mar 14, 2016. Last modified by Rachel Silver on Mar 2, 2017.

Note: the most up-to-date version of this article can be found here: How to Query Drill using Zeppelin via JDBC

******************************************************************************

Apache Zeppelin is a big data notebook that allows end users to explore their data, rapidly prototype applications, and build visualizations and dashboards in a web browser.

Download, build and install

Make sure the following prerequisites are fulfilled:

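# install build prerequisites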
sudo apt-get update
sudo apt-get install git
sudo apt-get install openjdk-7-jdk
sudo apt-get install npm
sudo apt-get install libfontconfig

# install maven
wget http://www.eu.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
sudo tar -zxf apache-maven-3.3.3-bin.tar.gz -C /usr/local/
sudo ln -s /usr/local/apache-maven-3.3.3/bin/mvn /usr/local/bin/mvn

Clone the repository locally:

$ git clone https://github.com/apache/incubator-zeppelin.git

Build:

mvn -Pmapr50 -Pyarn -Pbuild-distr -Pspark-1.5 -Phadoop-2.6 -Ppyspark package -DskipTests -B

The -PmaprXX profile argument supports the following entries for MapR versions:

  • -Pmapr3
  • -Pmapr40
  • -Pmapr41
  • -Pmapr50
  • -Pmapr51

You can run Zeppelin from inside your build directory, or you can install the package into a custom location. The built package will be available at incubator-zeppelin/zeppelin-distribution/target/zeppelin-0.6.0-incubating-SNAPSHOT.tar.gz.
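
For example, to unpack the distribution into /opt (the destination directory is just an example, and the exact directory name inside the tarball may differ depending on the version you built):

sudo tar -zxf zeppelin-distribution/target/zeppelin-0.6.0-incubating-SNAPSHOT.tar.gz -C /opt/
export ZEPPELIN_HOME=/opt/zeppelin-0.6.0-incubating-SNAPSHOT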

Once Zeppelin has been extracted to the location it will run from, it needs to be configured. Add the following to $ZEPPELIN_HOME/conf/zeppelin-env.sh:

  • export SPARK_HOME=/opt/mapr/spark/spark-1.5.2
  • export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.7.0
  • export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

You can now start Zeppelin with the following command:

$ bin/zeppelin-daemon.sh start
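
To confirm it came up, you can check the daemon status; by default the Zeppelin web UI listens on port 8080 (this can be changed in conf/zeppelin-site.xml or zeppelin-env.sh):

$ bin/zeppelin-daemon.sh status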

Connecting to Apache Drill

Now that Zeppelin is up and running, let's configure it to connect to an Apache Drill cluster.

In the Interpreter tab, edit the existing jdbc interpreter, or create a new one:

The JDBC interpreter allows you to specify multiple "profiles" so that you can maintain several connections through the same JDBC interface. The prefix on each property denotes the profile name. For instance, if you want both Drill and Postgres connections, you create drill.driver, drill.url, drill.user, drill.password and postgres.driver, postgres.url, postgres.user, postgres.password properties (plus any other properties your JDBC driver requires) and call them from the notebook as %jdbc(drill) and %jdbc(postgres).
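
As a sketch, the property set for two such profiles might look like the following (the URL, user, and password values are placeholders; org.apache.drill.jdbc.Driver and org.postgresql.Driver are the standard JDBC driver classes for Drill and PostgreSQL):

drill.driver        org.apache.drill.jdbc.Driver
drill.url           jdbc:drill:zk=<zkhost:port>/drill/<cluster-id>
drill.user          mapr
drill.password      mapr

postgres.driver     org.postgresql.Driver
postgres.url        jdbc:postgresql://<dbhost>:5432/<database>
postgres.user       dbuser
postgres.password   dbpass

A paragraph beginning with %jdbc(drill) is then routed to the Drill connection, and one beginning with %jdbc(postgres) to the Postgres connection.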

Edit the JDBC interpreter and add the following to the fields:

  • drill.url: jdbc:drill:zk=<zkhost:port>,<zkhost:port>,<zkhost:port>/drill/<cluster-id>
    • the <cluster-id> can be found in the $DRILL_HOME/conf/drill-override.conf file (see the example below)
  • drill.user: username
  • drill.password: password
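
As an illustration, a drill-override.conf on a MapR cluster typically looks something like this (the hostnames and cluster name here are made up; MapR's ZooKeeper listens on port 5181 by default):

drill.exec: {
  cluster-id: "mycluster-drillbits",
  zk.connect: "zk1.example.com:5181,zk2.example.com:5181,zk3.example.com:5181"
}

With those values, the corresponding connection URL would be:

drill.url: jdbc:drill:zk=zk1.example.com:5181,zk2.example.com:5181,zk3.example.com:5181/drill/mycluster-drillbits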

In the Dependencies section, tell Zeppelin where to find the JDBC driver:

  • Either use the Maven artifact coordinates or a path to the jar:
    • org.apache.drill.exec:drill-jdbc:1.4.0
    • /opt/mapr/drill/drill-1.4.0/jars/jdbc-driver/drill-jdbc-all-1.4.0.jar

Save the interpreter config, then open a new (or existing) notebook and make sure the new interpreter appears in the notebook's interpreter bindings under settings (click the gear in the upper right):

Note: since we added Drill-specific prefixes to the interpreter properties, the interpreter directive in each paragraph needs to be %jdbc(drill):
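
For example, a paragraph that queries the sample employee.json dataset bundled with Drill's classpath (cp) storage plugin would look like this:

%jdbc(drill)
SELECT * FROM cp.`employee.json` LIMIT 5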

Enjoy!
