How to install and configure Apache Tachyon on a MapR cluster

Document created by wade on Feb 27, 2016

Author: Jitendra Yadav, last modified by Hao Zhu on May 7, 2015

 

Original Publication Date: May 1, 2015

 

Environment

MapR 4.x

Goal

How to install and configure Apache Tachyon on a MapR cluster.

Solution

1. Download and build the Tachyon binaries.

$ git clone git://github.com/amplab/tachyon.git 
$ cd tachyon
$ mvn install

2. Start Tachyon

$ cp conf/tachyon-env.sh.template conf/tachyon-env.sh 
$ ./bin/tachyon format
$ ./bin/tachyon-start.sh local
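Before moving on, it can help to confirm that the master is actually accepting connections. The following is a minimal sketch, assuming the master listens on localhost:19998 (the address used in step 4) and that the script runs under bash; the function names port_open and wait_for_master are hypothetical helpers, not part of Tachyon.

```shell
#!/usr/bin/env bash
# Sketch only: poll a TCP port until something accepts connections.
# Assumption: the master listens on localhost:19998, the address used in step 4.

# Returns 0 if a TCP connect to host:port succeeds (uses bash's /dev/tcp feature).
port_open() {
  local host="$1" port="$2"
  # The subshell keeps fd 3 from leaking; it is closed when the subshell exits.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Retries once per second, up to $3 attempts (default 10).
wait_for_master() {
  local host="${1:-localhost}" port="${2:-19998}" tries="${3:-10}" i=0
  while [ "$i" -lt "$tries" ]; do
    if port_open "$host" "$port"; then
      echo "Tachyon master reachable at ${host}:${port}"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Nothing listening on ${host}:${port} after ${tries} attempts" >&2
  return 1
}
```

Calling wait_for_master localhost 19998 after the start script returns gives a simple pass/fail signal before Spark is configured.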

3. Configure Spark to integrate with Tachyon.

Once the Tachyon master and worker processes are running, you can access Tachyon's in-memory distributed storage through the Spark shell.

Note: If you are using Spark, complete the following steps first as a prerequisite. Adjust /pathToTachyon and the version in the jar name to match your own build.

In spark/conf/spark-env.sh

export SPARK_CLASSPATH=/pathToTachyon/client/target/tachyon-client-0.6.4-jar-with-dependencies.jar:$SPARK_CLASSPATH
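Because step 1 builds whatever version is currently checked out, the jar name may not be 0.6.4 on your machine. As a hedged sketch, the helper below looks the jar up by pattern instead of hard-coding the version. It assumes the jar keeps the tachyon-client-<version>-jar-with-dependencies.jar naming shown above and sits under client/target; find_tachyon_client_jar is a hypothetical name.

```shell
# Sketch: locate the client jar produced by the build rather than hard-coding 0.6.4.
# Assumption: the jar lives in <tachyon dir>/client/target and follows the
# tachyon-client-<version>-jar-with-dependencies.jar naming used in this article.

find_tachyon_client_jar() {
  local tachyon_dir="$1" jar
  for jar in "$tachyon_dir"/client/target/tachyon-client-*-jar-with-dependencies.jar; do
    # If nothing matches, the unexpanded pattern is left in $jar, so check for a real file.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "No Tachyon client jar under $tachyon_dir/client/target; run 'mvn install' first" >&2
  return 1
}

# Possible use in spark/conf/spark-env.sh (path is a placeholder):
#   TACHYON_JAR="$(find_tachyon_client_jar /pathToTachyon)" &&
#     export SPARK_CLASSPATH="$TACHYON_JAR:$SPARK_CLASSPATH"
```

If several versions have been built, the first match in glob order is returned, so cleaning client/target before rebuilding keeps the result unambiguous.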

In spark/conf/core-site.xml

<property> 
<name>fs.tachyon-ft.impl</name>
<value>tachyon.hadoop.TFS</value>
</property>
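Hadoop-style configuration files expect each property element to sit inside a top-level configuration element. If spark/conf/core-site.xml does not exist yet, a complete file could be written roughly as follows. This is a sketch: write_core_site is a hypothetical helper, and it deliberately refuses to overwrite an existing file.

```shell
# Sketch: create a minimal core-site.xml holding the property shown above.
# Assumption: the target file does not exist yet; an existing file is left untouched.

write_core_site() {
  local conf_file="$1"
  if [ -e "$conf_file" ]; then
    echo "$conf_file already exists; add the property by hand" >&2
    return 1
  fi
  # Quoted heredoc delimiter: the XML is written literally, with no shell expansion.
  cat > "$conf_file" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.tachyon-ft.impl</name>
    <value>tachyon.hadoop.TFS</value>
  </property>
</configuration>
EOF
}

# Example: write_core_site "$SPARK_HOME/conf/core-site.xml"
```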

4. Read and write on MapR cluster.

$ cd $SPARK_HOME
$ ./bin/spark-shell
scala> val s = sc.textFile("tachyon://localhost:19998/X")
scala> s.count()
scala> s.saveAsTextFile("tachyon://localhost:19998/Y")
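The read above assumes that a file named X already exists in Tachyon. One way to put it there is Tachyon's command-line shell (tachyon tfs). The sketch below assumes TACHYON_HOME points at the checkout built in step 1 and that your Tachyon version provides the copyFromLocal and ls subcommands; tachyon_put and the local path are hypothetical.

```shell
# Sketch: copy a local file into Tachyon and list the root directory.
# Assumption: TACHYON_HOME is the Tachyon checkout built in step 1.
TACHYON_HOME="${TACHYON_HOME:-$HOME/tachyon}"

tachyon_put() {
  local src="$1" dst="$2"
  # Only list the root directory if the copy succeeded.
  "$TACHYON_HOME/bin/tachyon" tfs copyFromLocal "$src" "$dst" &&
    "$TACHYON_HOME/bin/tachyon" tfs ls /
}

# Example (hypothetical local path): tachyon_put /tmp/sample.txt /X
```

Running the ls subcommand again after saveAsTextFile should show Y next to X, which confirms that the write from Spark reached Tachyon.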

NOTE: The MapR Spark RPMs do not support Tachyon integration, so Spark 1.3 must be built from source.
