
MapR Spark 1.1.0.201411080956 compatibility with Hive external metastore

Question asked by p3r3 on Feb 24, 2015
Latest reply on Feb 26, 2015 by p3r3
I cannot get the build of Spark included in the 'mapr-spark' package to play nicely with an external Hive metastore. It does not even appear to reach the point of connecting to the JDBC metastore.

I followed the instructions here http://doc.mapr.com/display/MapR/Spark+1.1.0

I added the Hive classes to spark-env.sh and spark-defaults.conf as instructed:

    SPARK_DAEMON_CLASSPATH=$SPARK_DAEMON_CLASSPATH::/opt/mapr/hive/hive-0.12/lib/hive-exec-0.12-mapr-1501.jar:/opt/mapr/hive/hive-0.12/lib/hive-metastore-0.12-mapr-1501.jar:/opt/mapr/hive/hive-0.12/lib/antlr-runtime-3.4.jar:/opt/mapr/hive/hive-0.12/lib/libfb303-0.9.0.jar:/opt/mapr/hive/hive-0.12/lib/hive-common-0.12-mapr-1501.jar:/opt/mapr/hive/hive-0.12/lib/hive-hbase-handler-0.12-mapr-1501.jar:/opt/mapr/hadoop/hadoop-0.20.2/conf:/opt/mapr/hadoop/hadoop-0.20.2/lib/commons-logging-1.0.4.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/guava-13.0.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/mapr-hbase-4.0.2-mapr.jar:/opt/mapr/lib/baseutils-4.0.2-mapr.jar:/opt/mapr/lib/commons-collections-3.2.1.jar:/opt/mapr/lib/commons-lang-2.5.jar:/opt/mapr/lib/hadoop-common-2.5.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/libprotodefs-4.0.2-mapr.jar:/opt/mapr/lib/maprfs-4.0.2-mapr.jar:/opt/mapr/lib/maprutil-4.0.2-mapr.jar:/opt/mapr/lib/protobuf-java-2.5.0.jar:
I also added mysql-connector-java-5.1.25-bin.jar to $SPARK_HOME/lib:

    ubuntu@hadoop-mapr-01:/opt/mapr/spark/spark-1.1.0$ ls lib/
    mysql-connector-java-5.1.25-bin.jar  spark-assembly-1.1.0-hadoop2.4.1-mapr-1408.jar  spark-examples-1.1.0-hadoop2.4.1-mapr-1408.jar
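In case jars dropped into $SPARK_HOME/lib are not picked up automatically, I am considering also putting the connector on the classpath explicitly in conf/spark-defaults.conf (a sketch only; the path below matches my layout and may need adjusting):

```shell
# conf/spark-defaults.conf -- note the "spark." prefix; lines without it
# are dropped by spark-submit with a "non-spark config property" warning
spark.driver.extraClassPath   /opt/mapr/spark/spark-1.1.0/lib/mysql-connector-java-5.1.25-bin.jar
spark.executor.extraClassPath /opt/mapr/spark/spark-1.1.0/lib/mysql-connector-java-5.1.25-bin.jar
```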


I also added my hive-site.xml to the conf/ directory:


    <configuration>
      <property><name>hive.exec.parallel</name><value>true</value></property>
      <property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://MYSERVER</value></property>
      <property><name>javax.jdo.option.ConnectionDriverName</name><value>com.mysql.jdbc.Driver</value></property>
      <property><name>javax.jdo.option.ConnectionUserName</name><value>hive</value></property>
      <property><name>javax.jdo.option.ConnectionPassword</name><value>PASSWORD</value></property>
      <property><name>datanucleus.autoCreateSchema</name><value>false</value></property>
      <property><name>datanucleus.fixedDatastore</name><value>true</value></property>
      <property><name>datanucleus.autoStartMechanism</name><value>SchemaTable</value></property>
      <property><name>hive.warehouse.subdir.inherit.perms</name><value>true</value></property>
      <property><name>hive.stats.ndv.error</name><value>5.0</value></property>
      <property><name>hive.stats.dbclass</name><value>jdbc:mysql</value></property>
      <property><name>hive.stats.jdbcdriver</name><value>com.mysql.jdbc.Driver</value></property>
      <property><name>hive.metastore.client.socket.timeout</name><value>3600</value></property>
      <property><name>hive.metastore.execute.setugi</name><value>true</value></property>
      <property><name>hive.stats.dbconnectionstring</name><value>jdbc:mysql://CONNECTIONSTRING</value></property>
      <property><name>hive.stats.autogather</name><value>true</value></property>
      <property><name>hive.default.rcfile.serde</name><value>org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe</value></property>
    </configuration>
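As a basic sanity check (MYSERVER and the credentials are the same placeholders as in the config above), connecting to the metastore database directly rules out simple network or authentication problems:

```shell
# Verify the MySQL metastore host is reachable with the hive-site.xml credentials
mysql -h MYSERVER -u hive -p -e 'SELECT 1;'
```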



Here is the error I am getting in spark-sql (Hive itself works perfectly):
  
      ubuntu@hadoop-mapr-01:/opt/mapr/spark/spark-1.1.0$ ./bin/spark-sql
        Warning: Ignoring non-spark config property: export=SPARK_WORKER_MEMORY=16g
        Warning: Ignoring non-spark config property: SPARK_DAEMON_CLASSPATH=$SPARK_DAEMON_CLASSPATH::/opt/mapr/hive/hive-0.12/lib/hive-exec-0.12-mapr-1501.jar:/opt/mapr/hive/hive-0.12/lib/hive-metastore-0.12-mapr-1501.jar:/opt/mapr/hive/hive-0.12/lib/antlr-runtime-3.4.jar:/opt/mapr/hive/hive-0.12/lib/libfb303-0.9.0.jar:/opt/mapr/hive/hive-0.12/lib/hive-common-0.12-mapr-1501.jar:/opt/mapr/hive/hive-0.12/lib/hive-hbase-handler-0.12-mapr-1501.jar:/opt/mapr/hadoop/hadoop-0.20.2/conf:/opt/mapr/hadoop/hadoop-0.20.2/lib/commons-logging-1.0.4.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/guava-13.0.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/mapr-hbase-4.0.2-mapr.jar:/opt/mapr/lib/baseutils-4.0.2-mapr.jar:/opt/mapr/lib/commons-collections-3.2.1.jar:/opt/mapr/lib/commons-lang-2.5.jar:/opt/mapr/lib/hadoop-common-2.5.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/libprotodefs-4.0.2-mapr.jar:/opt/mapr/lib/maprfs-4.0.2-mapr.jar:/opt/mapr/lib/maprutil-4.0.2-mapr.jar:/opt/mapr/lib/protobuf-java-2.5.0.jar:
        Warning: Ignoring non-spark config property: export=SPARK_WORKER_MEMORY=16g
        Warning: Ignoring non-spark config property: SPARK_DAEMON_CLASSPATH=$SPARK_DAEMON_CLASSPATH::/opt/mapr/hive/hive-0.12/lib/hive-exec-0.12-mapr-1501.jar:/opt/mapr/hive/hive-0.12/lib/hive-metastore-0.12-mapr-1501.jar:/opt/mapr/hive/hive-0.12/lib/antlr-runtime-3.4.jar:/opt/mapr/hive/hive-0.12/lib/libfb303-0.9.0.jar:/opt/mapr/hive/hive-0.12/lib/hive-common-0.12-mapr-1501.jar:/opt/mapr/hive/hive-0.12/lib/hive-hbase-handler-0.12-mapr-1501.jar:/opt/mapr/hadoop/hadoop-0.20.2/conf:/opt/mapr/hadoop/hadoop-0.20.2/lib/commons-logging-1.0.4.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/guava-13.0.1.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/mapr-hbase-4.0.2-mapr.jar:/opt/mapr/lib/baseutils-4.0.2-mapr.jar:/opt/mapr/lib/commons-collections-3.2.1.jar:/opt/mapr/lib/commons-lang-2.5.jar:/opt/mapr/lib/hadoop-common-2.5.1.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/libprotodefs-4.0.2-mapr.jar:/opt/mapr/lib/maprfs-4.0.2-mapr.jar:/opt/mapr/lib/maprutil-4.0.2-mapr.jar:/opt/mapr/lib/protobuf-java-2.5.0.jar:
        WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
        
        Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.12/lib/hive-common-0.12-mapr-1501.jar!/hive-log4j.properties
        spark-sql> show tables;
        FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
        org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
         at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:302)
         at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:272)
         at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
         at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
         at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:38)
         at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
         at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
         at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
         at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:103)
         at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:98)
         at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:58)
         at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291)
         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
         at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
         at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:606)
         at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
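The "Ignoring non-spark config property" warnings above suggest the export lines ended up in spark-defaults.conf, where spark-submit drops them, so SPARK_DAEMON_CLASSPATH may never actually be applied. One workaround I am considering (a sketch only, with the jar list truncated to the two metastore-relevant jars from my setup) is passing the jars explicitly at launch:

```shell
# Put the JDBC driver and Hive metastore jar on the driver classpath directly;
# spark-sql passes --driver-class-path through spark-submit in Spark 1.1
./bin/spark-sql --driver-class-path \
  /opt/mapr/spark/spark-1.1.0/lib/mysql-connector-java-5.1.25-bin.jar:/opt/mapr/hive/hive-0.12/lib/hive-metastore-0.12-mapr-1501.jar
```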
