Note: This is an update to previous steps to address a bug found in Kylin versions 1.5.4-1.6.0.
Apache Kylin™ is an open source distributed analytics engine, originally contributed by eBay Inc., that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and supports extremely large datasets.
Apache Kylin™ lets you query big Hive tables at sub-second latency in three simple steps:
- Identify a set of Hive tables in star schema.
- Build a cube from the Hive tables in an offline batch process.
- Query the Hive tables using SQL and get results in sub-second time via the REST API, ODBC, or JDBC. (From the Kylin docs)
After hearing significant interest from our customers, we worked with the Kylin support team to find a successful integration path.
Note: This article describes how to run Kylin on HBase. It does not cover using the HBase APIs to connect to MapR-DB.
Sample Environment and Version Information
The relevant software versions that we will be working with are as follows:
To begin, download the Kylin 1.6.0 for HBase 1.x binary package and unpack it. The directory must be owned by, and accessible to, a user with MapReduce job permissions (e.g., 'mapr'). The following commands create a directory called /opt/kylin, unpack the Kylin binary into it, and hand ownership to the default cluster user:
mkdir -p /opt/kylin
tar -xzf /tmp/apache-kylin-1.6.0-hbase1.x-bin.tar.gz -C /opt/kylin
chown -R mapr:mapr /opt/kylin
Switch to your MapR cluster user, for example with su - mapr.
Next, set the KYLIN_HOME variable to point to this location:
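A minimal sketch of this step, assuming the archive was unpacked under /opt/kylin as in the commands above, so that the Kylin directory is /opt/kylin/apache-kylin-1.6.0-hbase1.x-bin. Adjust the path if you unpacked it elsewhere:

```shell
# Assumed location: the directory produced by unpacking the
# Kylin 1.6.0 (HBase 1.x) tarball into /opt/kylin.
export KYLIN_HOME=/opt/kylin/apache-kylin-1.6.0-hbase1.x-bin

# Print the value so it can be checked before continuing.
echo "KYLIN_HOME is set to $KYLIN_HOME"
```

To keep the setting across sessions, the export line could be added to the cluster user's shell profile.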
To address an issue related to the Apache Calcite version (KYLIN-2094), delete all of the Kylin JDBC JAR files in the $KYLIN_HOME/lib directory before starting Kylin for the first time:
rm -r /opt/kylin/apache-kylin-1.6.0-hbase1.x-bin/lib/kylin-jdbc-*.jar
Starting Kylin for the First Time
To start Kylin, run the following as the cluster user:
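One way to wrap this step, sketched under the assumption that KYLIN_HOME is already set. kylin.sh is the control script in the bin directory of the Kylin binary package; the start_kylin function name is hypothetical and used only for illustration:

```shell
# Hypothetical helper: start Kylin through the kylin.sh script
# located under $KYLIN_HOME/bin.
start_kylin() {
    # Refuse to run if KYLIN_HOME has not been set.
    if [ -z "${KYLIN_HOME:-}" ]; then
        echo "KYLIN_HOME is not set" >&2
        return 1
    fi
    "$KYLIN_HOME/bin/kylin.sh" start
}
```

Run start_kylin as the cluster user once the function is defined in your shell.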
On the first start, it may take a few minutes to create the initial Hive and HBase tables. When it's done, open the Kylin Web UI, replacing <host> in the address with the hostname of the server where you installed Kylin:
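As a sketch, the address can be assembled as follows, assuming Kylin is listening on its default port 7070 and serving the UI under /kylin. The hostname kylin.example.com is a placeholder, not a real server:

```shell
# Placeholder hostname; replace with the server where Kylin is installed.
KYLIN_HOST="kylin.example.com"

# 7070 is Kylin's default port (assumed unchanged in this setup).
KYLIN_URL="http://${KYLIN_HOST}:7070/kylin"

echo "$KYLIN_URL"
```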
Log in with the username ADMIN and the password KYLIN.
Building a Sample Cube
Once you've confirmed that you can access the Kylin Web UI, load the provided sample data by running the sample script (taken from the Kylin docs):
[mapr@ip-172-31-15-151 root]$ $KYLIN_HOME/bin/sample.sh
KYLIN_HOME is set to /opt/kylin/apache-kylin-1.6.0-hbase1.x-bin
Going to create sample tables in hive
Sample cube is created successfully in project 'learn_kylin'.
Restart Kylin server or reload the metadata from web UI to see the change.
To restart Kylin, please run the following and then log into the WebUI again to continue:
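A restart can be sketched as a stop followed by a start of the same kylin.sh control script, assuming KYLIN_HOME is set. The restart_kylin function name is hypothetical:

```shell
# Hypothetical helper: restart Kylin by stopping and then
# starting it with the kylin.sh script under $KYLIN_HOME/bin.
restart_kylin() {
    # The start still runs if the stop reports that Kylin was not running.
    "$KYLIN_HOME/bin/kylin.sh" stop
    "$KYLIN_HOME/bin/kylin.sh" start
}
```

Run restart_kylin as the cluster user, then log in to the Web UI again.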
In the Web UI, select "learn_kylin" from the project drop-down list.
Select "Build" from the Actions menu for kylin_sales_cube, then set the end date to today to load the entire data set (10,000 records).
You can follow the progress of this build in the Monitor tab. When it reaches 100%, you can move on to running a sample query.
Queries are run from the Insight tab. Below is a test query with expected results that you can run:
select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt
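Since Kylin also accepts queries over its REST API, the same query could be submitted from the command line. The following is a sketch, not a definitive client: it assumes the default port 7070, the /kylin/api/query endpoint, basic authentication with the default ADMIN/KYLIN credentials, and a placeholder hostname. The helper names kylin_query_payload and run_kylin_query are hypothetical.

```shell
# Hypothetical helper: build the JSON body for Kylin's query endpoint.
# No JSON escaping is done, so the SQL must not contain double quotes
# or backslashes.
kylin_query_payload() {
    printf '{"sql":"%s","project":"%s"}' "$1" "$2"
}

# Hypothetical helper: POST a query to Kylin's REST API.
# $1 = Kylin hostname, $2 = SQL text. Uses the default credentials
# and the learn_kylin sample project.
run_kylin_query() {
    curl -s -X POST \
        -u ADMIN:KYLIN \
        -H 'Content-Type: application/json' \
        -d "$(kylin_query_payload "$2" learn_kylin)" \
        "http://$1:7070/kylin/api/query"
}

SQL='select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt'

# Show the request body that would be sent.
kylin_query_payload "$SQL" learn_kylin
echo
```

With the functions defined, a call such as run_kylin_query kylin.example.com "$SQL" would send the query, where kylin.example.com stands in for your Kylin server.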