Intermittent problems with Hive using BZip2

Question asked by fmdataservices on Aug 7, 2012
Latest reply on Aug 22, 2012 by fmdataservices
I have a process which does the following:
 1. Loads bzip2 files into MapR FS using hadoop fs -copyFromLocal.
 2. Does a LOAD from MapR FS into a Hive table using textfile format.
 3. Runs a SELECT query that counts the total rows in the table (SELECT count(1)...).
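
For reference, the three steps could be sketched roughly as below. This is only an illustration: the function name `build_commands`, the paths and the table name are hypothetical placeholders, and the exact LOAD statement used in the real process may differ. The sketch only builds the command lines and does not run anything by itself.

```python
import posixpath

def build_commands(local_file, fs_dir, table):
    """Build the argv lists for the three steps (hypothetical names throughout)."""
    # Destination path in the cluster file system, same file name as the source.
    fs_path = posixpath.join(fs_dir, posixpath.basename(local_file))
    return [
        # Step 1: copy the local .bz2 file into the cluster file system.
        ["hadoop", "fs", "-copyFromLocal", local_file, fs_path],
        # Step 2: move the file into the Hive table's storage (table stored as textfile).
        ["hive", "-e", "LOAD DATA INPATH '%s' INTO TABLE %s" % (fs_path, table)],
        # Step 3: count the rows, which is the query that fails intermittently.
        ["hive", "-e", "SELECT count(1) FROM %s" % table],
    ]

if __name__ == "__main__":
    for argv in build_commands("/tmp/data/part1.bz2", "/user/etl/in", "events"):
        print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` on a node where the `hadoop` and `hive` clients are installed.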

This SELECT query sometimes fails with the exception shown below. The exception seems to indicate a corrupted bzip2 file, but when I re-run the same steps on the same files, the process succeeds.
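
One way to rule out corruption in the source files themselves is to decompress each one locally before loading it. The following is a minimal sketch using Python's standard `bz2` module; the function name `check_bz2` is a hypothetical helper, not part of the process described above. It reads the whole stream and counts lines, so the result can also be compared with the count Hive returns.

```python
import bz2

def check_bz2(path, chunk_size=1 << 20):
    """Fully decompress a .bz2 file; return (ok, line_count, error_message)."""
    lines = 0
    last = b""
    try:
        with bz2.open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                lines += chunk.count(b"\n")
                last = chunk[-1:]
    except (OSError, EOFError) as exc:
        # EOFError: stream ended early (truncated file); OSError: invalid data.
        return False, lines, str(exc)
    # Count a final line that has no trailing newline.
    if last and last != b"\n":
        lines += 1
    return True, lines, None
```

If every local file passes this check and the failure still appears only occasionally, the source files are probably intact, and the copy or read path on the cluster is the more likely place to look.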

The documentation states that files with certain extensions (bz2 and others) are not compressed by the file system, and that is the case with the settings on this cluster.

----------

    java.io.IOException: unexpected end of stream
        at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.getAndMoveToFrontDecode(CBZip2InputStream.java:965)
        at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.initBlock(CBZip2InputStream.java:540)
        at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.setupNoRandPartA(CBZip2InputStream.java:1094)
        at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.setupNoRandPartB(CBZip2InputStream.java:1146)
        at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read0(CBZip2InputStream.java:453)
        at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read(CBZip2InputStream.java:406)
        at org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionInputStream.read(BZip2Codec.java:422)
        at java.io.InputStream.read(InputStream.java:85)
        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:160)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:38)
        at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:66)
        at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:32)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:67)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:210)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:195)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:394)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)