AnsweredAssumed Answered

Derby and MapR/Hive - Bug or misconfiguration?

Question asked by gabi_k on May 19, 2013
Latest reply on May 21, 2013 by gabi_k
i am using MySql as metastore.
when i am joining 2 big partitioned tables in hive i am getting 1 output row instead of 3 billion.
after little investigating i saw this on the logs:

    2013-05-19 16:16:02,940 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS maprfs:/user/hadoop/tmp/hive/hive_2013-05-19_12-31-42_449_4781220154914130577/_tmp.-ext-10000/000088_1

    2013-05-19 16:16:02,941 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS maprfs:/user/hadoop/tmp/hive/hive_2013-05-19_12-31-42_449_4781220154914130577/_task_tmp.-ext-10000/_tmp.000088_1
    2013-05-19 16:16:02,950 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS maprfs:/user/hadoop/tmp/hive/hive_2013-05-19_12-31-42_449_4781220154914130577/_tmp.-ext-10000/000088_1
    2013-05-19 16:16:02,985 INFO ExecReducer: ExecReducer: processed 3046471231 rows: used memory = 410427976
    2013-05-19 16:16:02,985 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
    2013-05-19 16:16:02,985 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
    2013-05-19 16:16:24,311 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
    2013-05-19 16:16:24,311 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
    2013-05-19 16:16:24,311 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
    2013-05-19 16:16:24,311 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
    2013-05-19 16:16:24,311 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
    
    2013-05-19 16:16:24,326 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher: Error during instantiating JDBC driver org.apache.derby.jdbc.EmbeddedDriver.
    java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver
                    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
                    at java.security.AccessController.doPrivileged(Native Method)
                    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
                    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
                    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
                    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
                    at java.lang.Class.forName0(Native Method)
                    at java.lang.Class.forName(Class.java:169)
                    at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:68)
                    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:843)
                    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:752)
                    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
                    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
                    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
                    at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:305)
                    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:509)
                    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
                    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
                    at java.security.AccessController.doPrivileged(Native Method)
                    at javax.security.auth.Subject.doAs(Subject.java:396)
                    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
                    at org.apache.hadoop.mapred.Child.main(Child.java:264)
    
    2013-05-19 16:16:24,328 ERROR org.apache.hadoop.hive.ql.exec.FileSinkOperator: StatsPublishing error: cannot connect to database
    2013-05-19 16:16:24,328 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
    2013-05-19 16:16:24,328 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
    2013-05-19 16:16:24,328 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
    2013-05-19 16:16:24,332 INFO org.apache.hadoop.mapred.Task: Task:attempt_201305071944_0930_r_000088_1 is done. And is in the process of commiting
    2013-05-19 16:16:24,411 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201305071944_0930_r_000088_1' done.
    
    While in the regular one it looks like this :
    
    2013-05-20 15:30:13,870 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 1 rows
    2013-05-20 15:30:13,870 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 1 rows
    2013-05-20 15:30:13,870 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS maprfs:/user/hadoop/tmp/hive/hive_2013-05-20_15-21-43_925_8527022279162962096/_tmp.-ext-10000/000001_1
    2013-05-20 15:30:13,870 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS maprfs:/user/hadoop/tmp/hive/hive_2013-05-20_15-21-43_925_8527022279162962096/_task_tmp.-ext-10000/_tmp.000001_1
    2013-05-20 15:30:13,875 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS maprfs:/user/hadoop/tmp/hive/hive_2013-05-20_15-21-43_925_8527022279162962096/_tmp.-ext-10000/000001_1
    2013-05-20 15:30:13,899 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 10 rows
    2013-05-20 15:30:13,899 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 10 rows
    2013-05-20 15:30:13,903 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarding 100 rows
    2013-05-20 15:30:13,903 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 100 rows

it is on reducers when they are doing the Join..

Anyone know what is it? why is it need Derby Driver?!

Thanks.

Outcomes