AnsweredAssumed Answered

Wrong results from HiveServer considering Path to be NULL

Question asked by dimamah on Jan 8, 2014
Latest reply on May 6, 2014 by dimamah
Hi, 
We are experiencing what seems to be bug in HiveServer. 
Querying a table with rows returns nothing. 
This is done using JDBC from a specific instance of HiveServer on a specific port. 
Restarting the port (killing and it restarting) solves the problem. 
It is very hard to spot as no error is thrown and the results are just wrong. 

For example :
`create table sometable as select * from somedb.sometable` returns nothing when obviously there are rows in `somedb.table` as verified in nfs / hive cli / other port of HiveServer. 
The logs show : 

    create table sometable as select * from somedb.sometable
    Total MapReduce jobs = 3
    14/01/04 16:23:17 INFO ql.Driver: Total MapReduce jobs = 3
    14/01/04 16:23:17 INFO ql.Driver: </PERFLOG method=TimeToSubmit start=1388845397106 end=1388845397165 duration=59>
    Launching Job 1 out of 3
    14/01/04 16:23:17 INFO ql.Driver: Launching Job 1 out of 3
    14/01/04 16:23:17 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
    Number of reduce tasks is set to 0 since there's no reduce operator
    14/01/04 16:23:17 INFO exec.Task: Number of reduce tasks is set to 0 since there's no reduce operator
    14/01/04 16:23:17 INFO exec.ExecDriver: Using org.apache.hadoop.hive.ql.io.HiveInputFormat
    14/01/04 16:23:17 INFO exec.ExecDriver: adding libjars: file:///home/hadoop/general/pursway_hive_udf.jar,file:///home/hadoop/ohad/ohadUDF.jar,file:///home/hadoop/general/commons-lang-2.5.jar,file:///opt/mapr/hive/hive-0.10/lib/hive-builtins-0.10-mapr.jar
    14/01/04 16:23:17 INFO exec.ExecDriver: Processing alias sometable
    14/01/04 16:23:17 INFO exec.ExecDriver: Changed input file to maprfs:/tmp/hive-hadoop/hive_2014-01-04_16-23-17_106_1845549482049797816/-mr-10003/1
    14/01/04 16:23:17 INFO exec.ExecDriver: Making Temp Directory: maprfs:/tmp/hive-hadoop/hive_2014-01-04_16-23-17_106_1845549482049797816/-ext-10002
    14/01/04 16:23:17 INFO fs.JobTrackerWatcher: Current running JobTracker is: joya-02/106.103.1.82:9001
    14/01/04 16:23:17 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    14/01/04 16:23:17 INFO mapred.FileInputFormat: Total input paths to process : 1
    14/01/04 16:23:17 INFO io.HiveInputFormat: number of splits 1
    Starting Job = job_201312251757_4532, Tracking URL = http://joya-02:50030/jobdetails.jsp?jobid=job_201312251757_4532
    14/01/04 16:23:18 INFO exec.Task: Starting Job = job_201312251757_4532, Tracking URL = http://joya-02:50030/jobdetails.jsp?jobid=job_201312251757_4532
    Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job  -kill job_201312251757_4532
    14/01/04 16:23:18 INFO exec.Task: Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job  -kill job_201312251757_4532
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    14/01/04 16:23:19 INFO exec.Task: Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2014-01-04 16:23:19,014 Stage-1 map = 0%,  reduce = 0%
    14/01/04 16:23:19 INFO exec.Task: 2014-01-04 16:23:19,014 Stage-1 map = 0%,  reduce = 0%
    2014-01-04 16:23:23,022 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.24 sec
    14/01/04 16:23:23 INFO exec.Task: 2014-01-04 16:23:23,022 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.24 sec
    MapReduce Total cumulative CPU time: 1 seconds 240 msec
    14/01/04 16:23:23 INFO exec.Task: MapReduce Total cumulative CPU time: 1 seconds 240 msec
    Ended Job = job_201312251757_4532

As you can see there is a line saying : `Changed input file to maprfs:/tmp/hive-hadoop/hive_2014-01-02_11-57-03_178_1304509914051509759/-mr-10002/1`

This happens in `ExecDriver.java` in the function `addInputPath`. `addInputPath` is being called from `addInputPaths` and we don't see logging of `Adding input file` which leads us to believe that this line : `for (String onefile : work.getPathToAliases().keySet()) {` isn't executed as nothing is returned from work.getPathToAliases().keySet()

We changed the code for testing and verified that it returns nothing.
This, as stated below in the code should only happen "If the query references non-existent partitions"
but the partitions are that, exist in metastore, reachable by any other way etc..

So, we are getting to this line : 
`if (path == null) {
        numEmptyPaths = addInputPath(null, job, work, hiveScratchDir, numEmptyPaths, false,oneAlias);`

and the rest is known...

This behaviour is observed when running a query while the partiiton pruner find no partitions. emptyFile is generated and so on. 
 

We have currently 7 queries acting this way on two different occasions.
In the 2nd occasion, In the same port, some queries where running fine while other (2 queries) showed the strange behavior. 
We manually restarted the ports, reconnected, issued the same connections hooks and the queries ran just fine.   
As you can probably understand we are unable to reproduce the problem but if we wait long enough we thing it will happen again. 
We never had problems like this until upgrading from V1.2.9 HiveV0.7.1 to V3.0.1 HiveV0.10 - our current environment.

The other SQLs are inserts / selects / CTAS.
We think this is a bug in the HiveServer and not in MapReduce (Which holds the `work` object from above) As restarting the HiveServer solves the problem. 

We see this as a very very serious problem as we have no indication of it.

Can you think of any way to solve this?

Thank you, 
Dima.


Outcomes