How to Use Kylin on MapR 5.2

Document created by Rachel Silver on Dec 18, 2016. Last modified by aalvarez on Jan 25, 2017. Version 14.

Introduction

Note: This article updates the previously published steps to work around a bug found in Kylin versions 1.5.4 through 1.6.0.

 

Apache Kylin™ is an open source distributed analytics engine, originally contributed by eBay Inc., designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets.

 

Apache Kylin™ lets you query big Hive tables at sub-second latency in three simple steps:

 

  1. Identify a set of Hive tables in star schema.
  2. Build a cube from the Hive tables in an offline batch process.
  3. Query the Hive tables using SQL and get results in under a second, via REST API, ODBC, or JDBC. (From Kylin docs)

 

After hearing significant interest from our customers, we worked with the Kylin support team to find a successful integration path.

 

Note: This article describes how to run Kylin on HBase. It does not cover using the HBase APIs to connect to MapR-DB.

 

Sample Environment and Version Information

 

The relevant software versions that we will be working with are as follows:

 

 

Kylin Install

 

To begin, download the Kylin 1.6.0 for HBase 1.x binary archive and extract it. The installation directory must be owned by, and accessible to, a user with MapReduce job permissions (e.g., 'mapr'). The following commands create a directory called /opt/kylin, extract the Kylin binary into it, and give ownership to the default cluster user:

 

mkdir -p /opt/kylin

wget -P /tmp/ http://mirrors.koehn.com/apache/kylin/apache-kylin-1.6.0/apache-kylin-1.6.0-hbase1.x-bin.tar.gz

tar -xzf /tmp/apache-kylin-1.6.0-hbase1.x-bin.tar.gz -C /opt/kylin

chown -R mapr:mapr /opt/kylin

 

Change to your MapR Cluster user:

 

      su mapr

Next, set the KYLIN_HOME variable to point to this location:

      export KYLIN_HOME=/opt/kylin/apache-kylin-1.6.0-hbase1.x-bin
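Because every later command relies on KYLIN_HOME, a quick sanity check can save some debugging. The following is a minimal sketch and not part of the Kylin distribution: the function name check_kylin_home is hypothetical, and the only assumption is that the extracted directory contains bin/kylin.sh, the script used later in this article.

```shell
# Hypothetical helper: succeeds only if the given directory looks like a Kylin install.
check_kylin_home() {
  dir="${1:-}"
  if [ -z "$dir" ]; then
    echo "KYLIN_HOME is not set" >&2
    return 1
  fi
  if [ ! -f "$dir/bin/kylin.sh" ]; then
    echo "No bin/kylin.sh found under $dir" >&2
    return 1
  fi
  echo "KYLIN_HOME looks valid: $dir"
}
```

It could be called as check_kylin_home "$KYLIN_HOME" right after the export.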

 

To work around an issue related to the Apache Calcite version (KYLIN-2094), delete all of the Kylin JDBC JAR files in the $KYLIN_HOME/lib directory before starting Kylin for the first time:

 

      rm $KYLIN_HOME/lib/kylin-jdbc-*.jar
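To see which files the wildcard matches, or to confirm afterwards that none remain, a small listing helper may be useful. This is only a sketch: list_kylin_jdbc_jars is a hypothetical name, and it assumes nothing beyond the kylin-jdbc-*.jar naming pattern used in the command above.

```shell
# Hypothetical helper: print the Kylin JDBC JARs found in a lib directory.
list_kylin_jdbc_jars() {
  lib_dir="$1"
  for jar in "$lib_dir"/kylin-jdbc-*.jar; do
    # When nothing matches, the pattern stays literal; skip that case.
    [ -e "$jar" ] && echo "$jar"
  done
  return 0
}
```

Running list_kylin_jdbc_jars "$KYLIN_HOME/lib" after the deletion should print nothing.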

 

 

 

Starting Kylin for the First Time

 

To start Kylin, run the following as the cluster user:

$KYLIN_HOME/bin/kylin.sh start

On the first start, Kylin may take a few minutes to create its initial Hive and HBase tables. When it has finished, open the Kylin Web UI at the following address, replacing <host> with the hostname of the server on which you installed Kylin:

 

http://<host>:7070/kylin
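Since the first start can take a few minutes, one option is to poll the address above until it answers rather than refreshing the browser. The sketch below assumes curl is installed; wait_for_kylin, the attempt count, and the delay are hypothetical choices for illustration, not Kylin settings.

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Usage: wait_for_kylin URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_kylin() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: quiet, -f: treat HTTP errors as failure, -L: follow redirects
    if curl -sfL --max-time 5 -o /dev/null "$url"; then
      echo "Kylin Web UI is responding at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "No response from $url after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_kylin "http://<host>:7070/kylin" would retry every 10 seconds for up to 30 attempts, with <host> replaced as described above.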

 

Log in with the username ADMIN and the password KYLIN.

 

Building a Sample Cube

 

Once you've confirmed that you can access the Kylin Web UI, load the provided sample data by running the following (taken from the Kylin docs):

 

$KYLIN_HOME/bin/sample.sh

[mapr@ip-172-31-15-151 root]$ $KYLIN_HOME/bin/sample.sh
KYLIN_HOME is set to /opt/kylin/apache-kylin-1.6.0-hbase1.x-bin
Going to create sample tables in hive

[...]

Sample cube is created successfully in project 'learn_kylin'.
Restart Kylin server or reload the metadata from web UI to see the change.

To restart Kylin, run the following, then log in to the Web UI again to continue:

 

$KYLIN_HOME/bin/kylin.sh stop

$KYLIN_HOME/bin/kylin.sh start

 

In the Web UI, select "learn_kylin" from the project drop-down list.

 

Select "Build" from the Actions menu for the kylin_sales_cube, then set the end date to today to load the entire data set (10,000 records).
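A build can also be requested through Kylin's REST API instead of the Web UI. The sketch below is an assumption-laden illustration rather than a tested procedure for this release: the /kylin/api/cubes/<cube>/rebuild path and the startTime, endTime, and buildType fields follow the Kylin 1.x REST documentation as best recalled here and should be confirmed against the docs for your version. The function names are hypothetical, and the credentials are the defaults given earlier.

```shell
# Hypothetical helper: print the JSON body for a cube build request.
# Arguments are the segment start and end times in epoch milliseconds.
kylin_build_payload() {
  printf '{"startTime": %s, "endTime": %s, "buildType": "BUILD"}\n' "$1" "$2"
}

# Hypothetical helper: submit the build request with curl.
# Usage: kylin_build_cube HOST CUBE_NAME START_MS END_MS
# Assumption: the endpoint path below matches your Kylin version.
kylin_build_cube() {
  curl -s -X PUT --user ADMIN:KYLIN \
    -H 'Content-Type: application/json' \
    -d "$(kylin_build_payload "$3" "$4")" \
    "http://$1:7070/kylin/api/cubes/$2/rebuild"
}
```

A call would take the form kylin_build_cube <host> kylin_sales_cube <start_ms> <end_ms>, with the time range chosen to cover the sample data.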

 

You can follow the progress of the build in the Monitor tab. When it reaches 100%, you can move on to running a sample query.

 

 

Queries are run from the Insight tab. Below is a test query that you can run against the sample cube:

 

select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt
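As the introduction notes, queries can also go through the REST API. The following is a minimal sketch assuming the POST /kylin/api/query endpoint described in the Kylin REST documentation and the default ADMIN/KYLIN credentials; the function names and the limit value are hypothetical, and the payload builder assumes the SQL text contains no double quotes or backslashes.

```shell
# Hypothetical helper: print the JSON body for a query request.
# Assumes the SQL text contains no double quotes or backslashes.
kylin_query_payload() {
  printf '{"sql": "%s", "project": "%s", "offset": 0, "limit": 50000, "acceptPartial": false}\n' "$1" "$2"
}

# Hypothetical helper: send the query with curl and print the JSON response.
# Usage: kylin_run_query HOST SQL PROJECT
kylin_run_query() {
  curl -s -X POST --user ADMIN:KYLIN \
    -H 'Content-Type: application/json' \
    -d "$(kylin_query_payload "$2" "$3")" \
    "http://$1:7070/kylin/api/query"
}
```

For instance, kylin_run_query <host> "select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt" learn_kylin would submit the test query above to the sample project.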

 


 

 

Links to Further Information
