How-To: Using Kylin on MapR 5.1

Document created by Rachel Silver on Jun 22, 2016. Last modified by Rachel Silver on Jan 3, 2017. Version 3.

Note: An updated version of this document is available, with steps for Kylin 1.5.4+.

Overview: Why Kylin on MapR?

 

Apache Kylin™ is an open source distributed analytics engine, originally contributed by eBay Inc., that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets.

 

Apache Kylin™ lets you query big Hive tables at sub-second latency in 3 simple steps:

 

  1. Identify a set of Hive tables in star schema.
  2. Build a cube from the Hive tables in an offline batch process.
  3. Query the Hive tables using SQL and get results in under a second via REST API, ODBC, or JDBC. (From Kylin docs)

 

After hearing significant interest from our customers, we worked with the Kylin support team to find a successful integration path. Kylin releases after 1.5.1 will work out of the box with MapR. In the meantime, you can follow the steps here for the specified versions; they should also be adaptable to older releases.

 

Note: This article describes how to run Kylin on Apache HBase. It does not cover using the HBase APIs to connect to MapR-DB.

 

Sample Environment and Version Information

 

For the purposes of this tutorial, the install was completed using three m3.xlarge AWS instances. For steps on installing MapR 5.1 on AWS, please refer to the blog by William Ochandarena: Spinning Up a Hadoop Cluster in the Cloud.

 

The relevant software versions that we will be working with are as follows:

 

 

Preparation Steps

 

Please set your $HCAT_HOME environment variable as shown, if it's not already set:

 

[mapr@ ~]$ echo $HCAT_HOME

 

[mapr@ ~]$ export HCAT_HOME=/opt/mapr/hive/hive-1.2/hcatalog/

[mapr@ ~]$ echo $HCAT_HOME

/opt/mapr/hive/hive-1.2/hcatalog/
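If you would rather not overwrite a value that is already set, the same step can be written so that it only applies a default. This is a minimal sketch, assuming the Hive 1.2 hcatalog path shown above; adjust the path if your Hive version differs.

```shell
# Set HCAT_HOME only when it is unset or empty; the default is the
# path used in this article and may differ on your cluster.
: "${HCAT_HOME:=/opt/mapr/hive/hive-1.2/hcatalog/}"
export HCAT_HOME

# Warn early if the directory is missing instead of failing later in Kylin.
if [ -d "$HCAT_HOME" ]; then
    echo "HCAT_HOME=$HCAT_HOME"
else
    echo "warning: $HCAT_HOME does not exist on this host" >&2
fi
```

Adding these lines to the shell profile of the user that runs Kylin keeps the setting across logins.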

 

 

Kylin Install Process

 

Update 5/26/16: We are testing the 1.5.2 build, which should not require the patch noted below.

 

To begin, download the Kylin 1.5.1 binary package built for HBase 1.1.3 and extract it. The extracted directory must be owned by, and accessible to, a user with MapReduce job permissions (e.g. 'mapr'):

 

[mapr@ ~]$ wget https://dist.apache.org/repos/dist/release/kylin/apache-kylin-1.5.1/apache-kylin-1.5.1-HBase1.1.3-bin.tar.gz

[mapr@ ~]$ tar -xzf apache-kylin-1.5.1-HBase1.1.3-bin.tar.gz

 

Next, you need to set $KYLIN_HOME to point to the new directory, for example:

 

[mapr@ ~]$ export KYLIN_HOME=/home/mapr/apache-kylin-1.5.1-bin
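Before going further, it can help to confirm that the directory is usable by the current user. The helper below is an illustrative sketch (check_kylin_home is a made-up name, not part of Kylin); it only checks that the directory exists, is writable, and contains an executable bin/kylin.sh.

```shell
# Sanity-check a Kylin install directory. Argument: path to check.
check_kylin_home() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/kylin.sh" ]; then
        echo "missing or not executable: $dir/bin/kylin.sh" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable by $(id -un): $dir" >&2
        return 1
    fi
    echo "KYLIN_HOME looks usable: $dir"
}

# Example: check_kylin_home "$KYLIN_HOME"
```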

 

Before running Kylin, we have to patch it. Kylin support provided this patch for 1.5.1 and older versions to account for path differences between Hive versions; it will not be necessary in later releases. The following diff shows the changes to make to $KYLIN_HOME/bin/find-hive-dependency.sh:

 

--- a/build/bin/find-hive-dependency.sh
+++ b/build/bin/find-hive-dependency.sh
@@ -69,10 +69,13 @@ if [ -z "$HCAT_HOME" ]
 then
     echo "HCAT_HOME not found, try to find hcatalog path from hadoop home"
     hadoop_home=`echo $hive_exec_path | awk -F '/hive.*/lib/' '{print $1}'`
+    hive_home=`echo $hive_exec_path | awk -F '/lib/' '{print $1}'`
     if [ -d "${hadoop_home}/hive-hcatalog" ]; then
       hcatalog_home=${hadoop_home}/hive-hcatalog
     elif [ -d "${hadoop_home}/hive/hcatalog" ]; then
       hcatalog_home=${hadoop_home}/hive/hcatalog
+    elif [ -d "${hive_home}/hcatalog" ]; then
+      hcatalog_home=${hive_home}/hcatalog
     else
       echo "Couldn't locate hcatalog installation, please make sure it is installed and set HCAT_HOME to the path."
       exit 1
--

 

We've also attached to this document a working version of the complete find-hive-dependency.sh file with these changes applied. It should be relatively easy to use as a reference when adapting older versions, but we do not recommend dropping it in as a replacement, because the script may differ between Kylin versions.
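To see what the patch changes, the two awk expressions from the patched script can be run by hand. This is a minimal sketch: the hive-exec jar file name is a made-up example, and only the directory layout is taken from the HCAT_HOME path used earlier in this article.

```shell
# Hypothetical hive-exec jar location; only the directory layout matters here.
hive_exec_path=/opt/mapr/hive/hive-1.2/lib/hive-exec-example.jar

# Existing logic: keep what precedes the first "/hive" ... "/lib/" match.
hadoop_home=$(echo "$hive_exec_path" | awk -F '/hive.*/lib/' '{print $1}')

# Added by the patch: keep everything that precedes "/lib/".
hive_home=$(echo "$hive_exec_path" | awk -F '/lib/' '{print $1}')

echo "hadoop_home=$hadoop_home"   # prints hadoop_home=/opt/mapr
echo "hive_home=$hive_home"       # prints hive_home=/opt/mapr/hive/hive-1.2
```

With a layout like this, the two original checks look under /opt/mapr/hive-hcatalog and /opt/mapr/hive/hcatalog, while the added branch looks under /opt/mapr/hive/hive-1.2/hcatalog, which is the HCAT_HOME location used in this article.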

 

Starting Kylin for the First Time

 

To start Kylin, run the following:

[mapr@ ~]$ $KYLIN_HOME/bin/kylin.sh start

On the first start, it may take a few minutes to create the initial Hive and HBase tables. When it's done, open the Kylin Web UI at the following address, replacing <host> with the hostname of the server where you installed Kylin:

 

http://<host>:7070/kylin
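Because the first start can take a few minutes, a small polling loop can tell you when the UI is ready. This is a sketch under assumptions: wait_for_kylin is an illustrative helper name, and it assumes the web UI answers with HTTP 200 (after redirects) once Kylin is up.

```shell
# Poll the Kylin web UI until it answers with HTTP 200, or give up.
# Arguments: host, max attempts (default 30), seconds between attempts (default 10).
wait_for_kylin() {
    host=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: quiet, -L: follow redirects, -o /dev/null: discard the body,
        # -w: print only the final HTTP status code
        code=$(curl -s -L -o /dev/null -w '%{http_code}' "http://${host}:7070/kylin")
        if [ "$code" = "200" ]; then
            echo "Kylin web UI is up at http://${host}:7070/kylin"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "Kylin web UI did not respond after ${attempts} attempts" >&2
    return 1
}

# Example: wait_for_kylin localhost
```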

 

Log in with username ADMIN and password KYLIN.

 

 

Building a Sample Cube

 

Once you've confirmed that you have access to the Kylin WebUI, you can load the provided sample data by running the following (taken from Kylin docs):

 

[mapr@ ~]$ $KYLIN_HOME/bin/sample.sh

KYLIN_HOME is set to /home/mapr/apache-kylin-1.5.1-bin

Going to create sample tables in hive...

...

Sample cube is created successfully in project 'learn_kylin'; Restart Kylin server or reload the metadata from web UI to see the change.

To restart Kylin, please run the following and then log into the WebUI again to continue:

 

[mapr@ ~]$ $KYLIN_HOME/bin/kylin.sh stop

[mapr@ ~]$ $KYLIN_HOME/bin/kylin.sh start

 

In the WebUI, select "learn_kylin" from the project drop-down list.

 

Select "Build" from the Actions menu for kylin_sales_cube, then set the end date to today to load the entire data set (10,000 records).

 

You can follow the progress of this build process in the Monitor tab. When it reaches 100%, we can move on to running a sample query.

 


 

 

Queries are run from the Insight tab. Below is a test query that you can run:

 

select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt
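The same query can also be sent through the REST API mentioned in the overview. This is a hedged sketch: it assumes Kylin's /kylin/api/query endpoint with HTTP basic authentication and the default ADMIN/KYLIN credentials, and the helper names kylin_query_payload and kylin_query are illustrative only.

```shell
# Build the JSON request body. Caveat: the SQL text is inserted as-is,
# so it must not contain double quotes or backslashes.
kylin_query_payload() {
    printf '{"sql": "%s", "project": "%s"}' "$1" "$2"
}

# POST a query to Kylin. Arguments: host, project, SQL text.
kylin_query() {
    curl -s -X POST -u ADMIN:KYLIN \
        -H 'Content-Type: application/json' \
        -d "$(kylin_query_payload "$3" "$2")" \
        "http://$1:7070/kylin/api/query"
}

# Example (requires a running Kylin server and a built cube):
# kylin_query localhost learn_kylin "select part_dt, sum(price) as total_selled from kylin_sales group by part_dt order by part_dt"
```

The response is JSON, so it can be piped to whatever JSON tool you normally use for inspection.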

 


 

 

 

Possible Issues

 

Problems Querying the Resource Manager: error check status

If your build fails at some step in the process but you can see in the Resource Manager that this step/job completed, it's possible that Kylin isn't able to reach the Resource Manager to query the job status. The error in $KYLIN_HOME/logs/kylin.log will look something like this:

 

2016-05-11 19:20:49,957 INFO  [pool-2-thread-3] execution.AbstractExecutable:218 : kylin.job.yarn.app.rest.check.status.url is not set, read from job configuration

2016-05-11 19:20:49,957 INFO  [pool-2-thread-3] execution.AbstractExecutable:234 : yarn.resourcemanager.webapp.address:http://0.0.0.0:8088

[...]

2016-05-11 19:20:49,966 ERROR [pool-2-thread-3] common.HadoopStatusChecker:93 : error check status

java.net.ConnectException: Connection refused

 

This problem is tracked in KYLIN-1319, which includes a build patch. As a configuration workaround, you can manually set how Kylin finds your Resource Manager by adding the following to $KYLIN_HOME/conf/kylin.properties:

 

kylin.job.yarn.app.rest.check.status.url=http://<YOUR RM ADDRESS>:8088/ws/v1/cluster/apps/${job_id}
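If you script this change, it is worth making it idempotent so the property is not appended twice. The following is a minimal sketch; set_rm_status_url is an illustrative helper name, and the Resource Manager host is whatever address applies to your cluster.

```shell
# Append the status-check URL to a kylin.properties file unless the key is already set.
# Arguments: path to kylin.properties, Resource Manager host name.
set_rm_status_url() {
    props=$1
    rm_host=$2
    key='kylin.job.yarn.app.rest.check.status.url'
    # Anchored at line start, so commented-out entries are ignored.
    if grep -q "^${key}=" "$props"; then
        echo "${key} is already set in ${props}; leaving it unchanged"
        return 0
    fi
    # ${job_id} is a placeholder that Kylin fills in, so it is kept literal (single quotes).
    printf '%s=http://%s:8088/ws/v1/cluster/apps/${job_id}\n' "$key" "$rm_host" >> "$props"
}

# Example: set_rm_status_url "$KYLIN_HOME/conf/kylin.properties" my-rm-host
```

Restart Kylin afterwards so the new property is read.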

 

Note: this is not recommended for High Availability setups, since it points Kylin at a single fixed Resource Manager address. Please watch the noted JIRA for resolution.

 

Coprocessor Support: java.lang.UnsupportedOperationException: coprocessorService is not supported for MapR

 

If a query fails to run and you see something similar to the error below, you are running Kylin on MapR-DB tables through the HBase API:

java.lang.RuntimeException: Error when visiting cubes by endpoint:

  at org.apache.kylin.storage.hbase.cube.v2.CubeHBaseEndpointRPC$1.run(CubeHBaseEndpointRPC.java:324)

  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

  at java.util.concurrent.FutureTask.run(FutureTask.java:266)

  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

  at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.UnsupportedOperationException: coprocessorService is not supported for MapR.

Unfortunately, MapR-DB does not support coprocessors, so this will not work. Fortunately, MapR also supports running Apache HBase itself. So, first you'll have to remove the mappings that map your Kylin tables to MapR-DB paths (including any wildcard mappings) from /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/core-site.xml (you will need to be root or a sudoer to do this).

 

They will look something like this:

 

<property>

    <name>hbase.table.namespace.mappings</name>

    <value>kylin_metadata:/kylin/tables,kylin_metadata_acl:/kylin/tables,kylin_metadata_user:/kylin/tables,*:/hbase</value>

</property>
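To find out quickly whether a given core-site.xml contains such a mapping, a simple grep is enough. This is an illustrative sketch (has_namespace_mappings is a made-up helper name); it only detects the property and leaves the edit to you.

```shell
# Return success if the given file defines hbase.table.namespace.mappings.
# Argument: path to core-site.xml.
has_namespace_mappings() {
    grep -q '<name>hbase.table.namespace.mappings</name>' "$1"
}

# Example:
# if has_namespace_mappings /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/core-site.xml; then
#     echo "namespace mappings found; review them before running Kylin"
# fi
```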

 

Links to Further Information
