
Including HBase dependencies in Spark

Question asked by anton on Dec 14, 2015
I'm developing a POC application with HBase and Spark. Without Yarn, Spark seems to compete for resources with HBase and produces a lot of timeouts (please correct me if I'm wrong, I'm still getting started).

To make Spark run remotely on Yarn, I've included this jar:

    /opt/mapr/asynchbase/asynchbase-1.6.0/asynchbase-1.6.0-mapr-1504.jar

The problem I encountered was that the "RegionClient" class could not be loaded. I've tried every possible way of passing that jar into the application, but the only way it worked was when I specified the jar via "--driver-class-path", and even that only worked locally.
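For context, the kind of submissions I was trying looked roughly like this (the application class and jar names below are just placeholders):

    # Shipping the asynchbase jar to the cluster with --jars
    # (com.example.HBasePoc and my-poc-app.jar are placeholder names):
    spark-submit \
        --master yarn-cluster \
        --class com.example.HBasePoc \
        --jars /opt/mapr/asynchbase/asynchbase-1.6.0/asynchbase-1.6.0-mapr-1504.jar \
        my-poc-app.jar

    # Putting the jar on the driver classpath, which is the only variant
    # that worked, and only in local mode:
    spark-submit \
        --master "local[*]" \
        --class com.example.HBasePoc \
        --driver-class-path /opt/mapr/asynchbase/asynchbase-1.6.0/asynchbase-1.6.0-mapr-1504.jar \
        my-poc-app.jar

As far as I understand, "--driver-class-path" only affects the driver JVM, so the executors on Yarn still can't see RegionClient, which would explain why it only helped locally.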

While digging deeper, I found this script:

    /opt/mapr/spark/spark-1.4.1/conf/spark-env.sh

which, after some internal calls, was trying to add the following jars to the classpath:

    /opt/mapr/hbase/hbase-1.4.1/lib/hbase-*.jar


while the files that actually needed to be added were

    /opt/mapr/hbase/hbase-0.98.12/lib/hbase-common-0.98.12-mapr-1506.jar

and

    /opt/mapr/asynchbase/asynchbase-1.6.0/asynchbase-1.6.0-mapr-1504.jar

I've hardcoded the needed files into the script, and now Spark on Yarn runs fine and talks to HBase asynchronously, as desired!
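For what it's worth, the hardcoding I did is conceptually along these lines (the exact variable the MapR script builds the classpath into may be named differently; this is just to illustrate the idea):

    # In /opt/mapr/spark/spark-1.4.1/conf/spark-env.sh, append the jars that
    # actually exist on disk instead of relying on the hbase-*.jar glob:
    SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/opt/mapr/hbase/hbase-0.98.12/lib/hbase-common-0.98.12-mapr-1506.jar"
    SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/opt/mapr/asynchbase/asynchbase-1.6.0/asynchbase-1.6.0-mapr-1504.jar"
    export SPARK_DIST_CLASSPATH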

Could you please confirm whether this is indeed a problem in the system, or whether I'm doing something wrong?

Kind Regards,
Anton