
shark / spark over mfs

Question asked by dimamah on Aug 21, 2013
Latest reply on Aug 22, 2013 by davidtucker
I suspect that Shark / Spark won't work as-is on MapR's distribution because of MFS (the MapR file system). 
How would you recommend using Shark / Spark with MapR's distribution?


**EDIT:** 
The approach described in the link in the answer below didn't work for me because some minor details were missing. 
What did work is:

First, if installing Shark 0.7, we must use Hive 0.9 (currently only Shark 0.8 supports Hive 0.10, and no version supports Hive 0.11; see [the relevant Jira][1]).

After installing on the cluster as described [here][2], the following has to be added to `spark-env.sh`:

    SPARK_JAVA_OPTS="-Dspark.local.dir=/tmp " 
    SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 " 
    SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps " 
    SPARK_JAVA_OPTS+="-Djava.library.path=/opt/mapr/lib "
    export SPARK_JAVA_OPTS

    export SPARK_LIBRARY_PATH=/opt/mapr/lib 

    export SPARK_CLASSPATH=/opt/mapr/hadoop/hadoop-0.20.2/conf:/opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/commons-logging-1.0.4.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/maprfs-0.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/zookeeper-3.3.2.jar:/root/installs/shark-0.7.0/target/scala-2.9.3/shark_2.9.3-0.7.0.jar:/opt/mapr/hive/hive-0.10.0/lib/*

Especially important are the maprfs jar and all the jars in `lib` under the Hive home.
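The long `SPARK_CLASSPATH` line is easy to get wrong when editing by hand, so here is a minimal sketch of the same value assembled piece by piece. The variable names `hadoop_dir`, `hive_dir` and `shark_jar` are my own and purely illustrative; the paths and jar names are exactly the ones from the line above, and you would need to adjust them to your own install.

```shell
# Sketch only: hadoop_dir, hive_dir and shark_jar are illustrative names,
# not variables that Spark, Shark or MapR read. Paths are copied from the post.
hadoop_dir=/opt/mapr/hadoop/hadoop-0.20.2
hive_dir=/opt/mapr/hive/hive-0.10.0
shark_jar=/root/installs/shark-0.7.0/target/scala-2.9.3/shark_2.9.3-0.7.0.jar

# Start with the Hadoop conf directory, then append each Hadoop/MapR jar.
SPARK_CLASSPATH="$hadoop_dir/conf"
for jar in hadoop-0.20.2-dev-core.jar commons-logging-1.0.4.jar maprfs-0.1.jar zookeeper-3.3.2.jar; do
    SPARK_CLASSPATH="$SPARK_CLASSPATH:$hadoop_dir/lib/$jar"
done

# Append the Shark jar and every jar under Hive's lib directory.
# The "*" is inside double quotes, so the shell leaves it unexpanded
# and the JVM treats it as a classpath wildcard.
SPARK_CLASSPATH="$SPARK_CLASSPATH:$shark_jar:$hive_dir/lib/*"
export SPARK_CLASSPATH
```

Running `echo "$SPARK_CLASSPATH" | tr ':' '\n'` afterwards prints one entry per line, which makes a missing maprfs jar or Hive `lib` entry easier to spot.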





  [1]: https://spark-project.atlassian.net/browse/SHARK-166
  [2]: https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster
