How to Query Drill using Zeppelin via JDBC

Document created by Rachel Silver on Feb 9, 2017; last modified by Rachel Silver on Jun 19, 2017 (version 6).

Introduction

 

 *Note: There appears to be a bug in Zeppelin 0.7.2. Avoid this version for now if you plan to use Drill.

 

Apache Zeppelin is a web-based notebook project that enables interactive data analytics. Apache Zeppelin 0.7.2 was recently released, so we'd like to help our customers get Zeppelin up and running on the MapR Platform. Here, we explain how to get Zeppelin working with Apache Drill (via JDBC) on the MapR Converged Data Platform and walk through a quick example.

 

The versions used for this demo are:

 

Note: Zeppelin for MapR is not formally supported. Any problems should be addressed in Answers or in the Zeppelin Community.

 

Installing Zeppelin 

For these purposes, we're going to use the newest binary package available from the Apache Zeppelin Download Page and install Zeppelin to /opt/zeppelin.

 

Download and unpack the Zeppelin binary as a user with sudo access (use the package that includes all interpreters, zeppelin-<version>-bin-all.tgz):

 

sudo mkdir -p /opt/zeppelin

wget <link to suggested mirror>.tgz -P /tmp/

gunzip /tmp/zeppelin-<version>-bin-all.tgz

sudo tar -xf /tmp/zeppelin-<version>-bin-all.tar -C /opt/zeppelin/
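The gunzip and tar steps above can also be collapsed into a single tar -xzf call. The sketch below wraps that in a small POSIX shell function; the name unpack_zeppelin and its arguments are hypothetical (not part of Zeppelin), and it assumes the tarball has already been downloaded and that it runs from a shell allowed to write to the destination (e.g. a root shell).

```shell
# Hypothetical helper: unpack a downloaded Zeppelin tarball into a target
# directory in one step (equivalent to gunzip followed by tar -xf).
# Usage: unpack_zeppelin <path-to-tgz> <destination-dir>
unpack_zeppelin() {
    tarball=$1
    dest=$2
    # Fail early with a clear message if the download is missing.
    if [ ! -f "$tarball" ]; then
        echo "unpack_zeppelin: $tarball not found" >&2
        return 1
    fi
    # Create the destination (like mkdir -p /opt/zeppelin), then extract.
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest"
}
```

For example, unpack_zeppelin /tmp/zeppelin-<version>-bin-all.tgz /opt/zeppelin would stand in for the gunzip and tar commands.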

 

Change the owner of these files to your MapR cluster user; we'll use 'mapr' for these purposes:

 

sudo chown -R mapr:mapr /opt/zeppelin

 

Note: do the rest as your MapR cluster user.

su mapr

 

Check whether port 8080 (the default Zeppelin port) is available. If it's already in use, here's how to change the port Zeppelin listens on.

 

First, create a Zeppelin environment configuration file:

cp /opt/zeppelin/zeppelin-<version>-bin-all/conf/zeppelin-env.sh.template /opt/zeppelin/zeppelin-<version>-bin-all/conf/zeppelin-env.sh

 

Open this file in a text editor and add the following to change the default port:

export ZEPPELIN_PORT=<Your Port #>                       
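Editing zeppelin-env.sh by hand works fine. If you script the setup instead, a sketch like the following sets the port without adding duplicate lines when re-run. The function name set_zeppelin_port is hypothetical; it assumes the file path and port are passed as arguments, and it rewrites the file through a temporary copy rather than sed -i, whose syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: set ZEPPELIN_PORT in zeppelin-env.sh, replacing any
# existing "export ZEPPELIN_PORT=..." line instead of appending a duplicate.
# Commented-out template lines ("# export ZEPPELIN_PORT") are left alone.
# Usage: set_zeppelin_port <path-to-zeppelin-env.sh> <port>
set_zeppelin_port() {
    env_file=$1
    port=$2
    # Reject anything that is not a plain number.
    case $port in
        ''|*[!0-9]*) echo "set_zeppelin_port: invalid port '$port'" >&2; return 1 ;;
    esac
    if grep -q '^export ZEPPELIN_PORT=' "$env_file"; then
        # An active setting exists: rewrite it via a temporary file.
        sed "s/^export ZEPPELIN_PORT=.*/export ZEPPELIN_PORT=$port/" "$env_file" > "$env_file.tmp" &&
            mv "$env_file.tmp" "$env_file"
    else
        # No active setting yet: append one.
        printf 'export ZEPPELIN_PORT=%s\n' "$port" >> "$env_file"
    fi
}
```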

 

Start Zeppelin:

/opt/zeppelin/zeppelin-<version>-bin-all/bin/zeppelin-daemon.sh start

On the first start, you should see output similar to:

Log dir doesn't exist, create /opt/zeppelin/zeppelin-<version>-bin-all/logs

Pid dir doesn't exist, create /opt/zeppelin/zeppelin-<version>-bin-all/run

Zeppelin start                                             [  OK  ]

 

Confirm that Zeppelin is up and running by visiting the Zeppelin Web UI at the port you specified above (or 8080 if you kept the default):

http://<Hostname or IP>:<Your Port #>         

 

Configure Zeppelin to Query Drill

First, we need to retrieve the ZooKeeper Quorum and Cluster ID from the $DRILL_HOME/conf/drill-override.conf configuration file:

 

grep cluster-id /opt/mapr/drill/drill-<version>/conf/drill-override.conf

  cluster-id: "<cluster-id>"

grep zk.connect /opt/mapr/drill/drill-<version>/conf/drill-override.conf

  zk.connect: "<host1>:5181,<host2>:5181,<host3>:5181"
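These two values combine into the JDBC URL used for the interpreter's default.url (jdbc:drill:zk=<quorum>/drill/<cluster-id>). The sketch below extracts them and prints that URL. It assumes each setting sits on its own line in the simple key: "value" form shown in the grep output (HOCON allows other layouts, which this does not handle), and drill_jdbc_url is a hypothetical name.

```shell
# Hypothetical helper: read cluster-id and zk.connect from drill-override.conf
# and print the matching Drill JDBC URL. Assumes each setting appears on its
# own line as  key: "value"  (a trailing comma is tolerated).
# Usage: drill_jdbc_url /opt/mapr/drill/drill-<version>/conf/drill-override.conf
drill_jdbc_url() {
    conf=$1
    cluster_id=$(sed -n 's/^[[:space:]]*cluster-id:[[:space:]]*"\([^"]*\)".*/\1/p' "$conf" | head -n 1)
    zk=$(sed -n 's/^[[:space:]]*zk\.connect:[[:space:]]*"\([^"]*\)".*/\1/p' "$conf" | head -n 1)
    if [ -z "$cluster_id" ] || [ -z "$zk" ]; then
        echo "drill_jdbc_url: cluster-id or zk.connect not found in $conf" >&2
        return 1
    fi
    # By default, drillbits register in ZooKeeper under /drill/<cluster-id>.
    printf 'jdbc:drill:zk=%s/drill/%s\n' "$zk" "$cluster_id"
}
```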

 

Go to the Interpreter page and click +Create to create a new interpreter.

 

 

Fill in the fields as follows:

 

 

Field               Value
Name                drill
Interpreter Group   jdbc
default.url         jdbc:drill:zk=<host1>:5181,<host2>:5181,<host3>:5181/drill/<cluster-id>
default.driver      org.apache.drill.jdbc.Driver
default.user        <cluster user>
default.password    <cluster user password>
artifact            /opt/mapr/drill/drill-<version>/jars/jdbc-driver/drill-jdbc-all-<version>.jar

The artifact entry goes in the interpreter's Dependencies section; the other entries are interpreter properties.
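Because the artifact path contains the Drill version twice, it can be easier to let the shell find the jar than to type it. A minimal sketch, assuming the MapR layout in the artifact path above (/opt/mapr/drill/drill-<version>/jars/jdbc-driver/); find_drill_jdbc_jar is a hypothetical helper, and if several Drill versions are installed it simply reports the first match in glob order.

```shell
# Hypothetical helper: print the path of the drill-jdbc-all jar under a
# MapR Drill install root (default /opt/mapr/drill), for use as the
# interpreter's artifact. Returns non-zero if no jar is found.
find_drill_jdbc_jar() {
    root=${1:-/opt/mapr/drill}
    for jar in "$root"/drill-*/jars/jdbc-driver/drill-jdbc-all-*.jar; do
        # An unmatched glob stays literal, so check the file really exists.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    echo "find_drill_jdbc_jar: no drill-jdbc-all jar under $root" >&2
    return 1
}
```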


Save the interpreter configuration, then open a new or existing notebook and select the interpreter you created as its default:

 

 

Test Drill Query

Let's test it out using the steps from Drill in 10 Minutes, adapted for our notebook.

 

For example, run the following query in a notebook paragraph (if drill is not the notebook's default interpreter, start the paragraph with %drill):

SELECT * FROM cp.`employee.json` LIMIT 3;

 

 

There is a known issue with slow spin-up times for Drill jobs. It can be solved by disabling unused Drill storage plugins or by following the steps described by Arjun in the comments on the original post.

 

 
