
Volume issues: java.io.IOException: End Of File Exception, File is corrupt! and maprfs:Filename.txt not a SequenceFile

Question asked by bmis2014 on Mar 3, 2015
Hi MapR Team,

We have a job that writes to a user directory in HDFS. When the mapper finishes, it commits the file from its user directory to a volume-based directory (the final destination).

We use the following call to do this, where source is the user HDFS directory and target is the volume-based directory:
FileSystem.get(job.getConfiguration()).rename(source, target);
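One detail worth checking in the call above: Hadoop's FileSystem.rename returns a boolean and, in a number of failure cases, reports the problem by returning false rather than throwing, so an unchecked call can leave a file uncommitted without any visible error. A minimal sketch of a guard is below; CheckedRename and Renamer are hypothetical helper names (not part of Hadoop or MapR), and the code only assumes a rename operation that returns true on success.

```java
import java.io.IOException;

// Hypothetical helper: turns a "return false on failure" rename into an exception.
final class CheckedRename {

    /** Any rename operation that reports failure by returning false. */
    @FunctionalInterface
    interface Renamer<P> {
        boolean rename(P source, P target) throws IOException;
    }

    private CheckedRename() {}

    /** Runs the rename and throws if it reports failure, so the job can retry or fail loudly. */
    static <P> void renameOrThrow(Renamer<P> renamer, P source, P target) throws IOException {
        if (!renamer.rename(source, target)) {
            throw new IOException("rename failed: " + source + " -> " + target);
        }
    }
}
```

With Hadoop on the classpath, the original call would become something like CheckedRename.renameOrThrow(fs::rename, source, target), where fs is the FileSystem returned by FileSystem.get(job.getConfiguration()).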

http://doc.mapr.com/display/MapR/Managing+Data+with+Volumes


While this is going on, we have noticed that a dependent Hive job, which reads this volume data periodically, encounters the following errors:

1) Caused by: java.io.IOException: maprfs:somefile.txt not a SequenceFile

2) Caused by: java.io.EOFException (unterminated or unclosed files)

    hadoop fs -text maprfs:File.txt | wc -l
    2015-02-22 19:05:04 INFO: org.apache.hadoop.io.compress.zlib.ZlibFactory - Successfully loaded & initialized native-zlib library
    2015-02-22 19:05:04 INFO: org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
    text: null
    0

3) Caused by: java.io.IOException: File is corrupt!

        at org.apache.hadoop.io.SequenceFile$Reader.readBlock(SequenceFile.java:1699)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2102)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:76)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
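To triage errors #1 and #2 on a specific file, it may help to look at its first bytes. As far as we understand the SequenceFile format, a file begins with the bytes 'S', 'E', 'Q' followed by a version byte; SequenceFile.Reader reports "not a SequenceFile" when those bytes do not match, and a file too short to hold the header surfaces as an EOFException. A reader that opens a file while it is still being written or copied could plausibly hit either. The sketch below classifies a local copy of a file (for example one fetched with hadoop fs -get); SeqHeaderCheck and its Status values are hypothetical names. It only inspects the header, so it cannot detect a problem like #3, which the stack trace shows occurring while reading a later block.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: classify a local file by its SequenceFile header only.
// Assumption: the header starts with 'S','E','Q' and one version byte.
final class SeqHeaderCheck {

    enum Status { TOO_SHORT, NOT_SEQUENCE_FILE, HEADER_OK }

    private SeqHeaderCheck() {}

    static Status check(Path file) throws IOException {
        byte[] header = new byte[4];
        try (InputStream in = Files.newInputStream(file)) {
            // readNBytes keeps reading until 4 bytes arrive or the stream ends.
            if (in.readNBytes(header, 0, header.length) < header.length) {
                return Status.TOO_SHORT; // a reader would run out of data in the header
            }
        }
        if (header[0] != 'S' || header[1] != 'E' || header[2] != 'Q') {
            return Status.NOT_SEQUENCE_FILE; // magic bytes do not match
        }
        return Status.HEADER_OK; // says nothing about the rest of the file
    }
}
```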


Can you please suggest the best way to move data from HDFS to a volume while making sure we do not run into #1, #2 or #3?


Thanks in advance for your help!

Our in-house MapR expert suggested the following:

HDFS directory/file ---copy--> Volume/tmp/file ---rename--> Volume/final/target/dir
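The idea behind this suggestion is that the slow copy happens in a tmp directory the Hive job does not read, and the file only appears in the final directory through a single rename, so readers should never see a half-copied file. The sketch below shows the same two steps with java.nio.file on a local filesystem, as an analogy only; it does not demonstrate how MapR-FS or volumes behave. StagedPublish and the directory names are hypothetical. ATOMIC_MOVE requires the staging and final directories to be on the same file store, which mirrors keeping Volume/tmp inside the destination volume.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Local-filesystem analogy of "copy into tmp, then rename into the final dir".
final class StagedPublish {

    private StagedPublish() {}

    /** Copies source into stagingDir, then atomically moves it into finalDir. */
    static Path publish(Path source, Path stagingDir, Path finalDir) throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(finalDir);
        Path name = source.getFileName();
        Path staged = stagingDir.resolve(name);
        // Step 1: the copy happens where readers are not looking.
        Files.copy(source, staged, StandardCopyOption.REPLACE_EXISTING);
        // Step 2: one rename makes the complete file visible; fails across file stores.
        return Files.move(staged, finalDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Against MapR-FS the corresponding Hadoop calls would presumably be FileUtil.copy(srcFs, src, dstFs, tmpPath, false, conf) followed by FileSystem.rename(tmpPath, finalPath), checking the boolean result of both.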

Another use case is a volume-to-volume transfer:
/user/volume1/tmp   ---copy--> /user/volume2/archival/yyyy-mm-dd/tmp --->
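For the dated archival directory, one pitfall when generating the yyyy-mm-dd part in Java is that, in java.time patterns, lowercase "mm" means minute-of-hour and uppercase "MM" means month. The sketch below builds only the staging path shown above (ArchivalPaths is a hypothetical name); it deliberately says nothing about where the file goes after that step.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch: build the dated staging directory for the volume-to-volume case.
final class ArchivalPaths {

    // "MM" is month; "mm" would be minute-of-hour.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    private ArchivalPaths() {}

    /** e.g. base "/user/volume2/archival" and 2015-03-03 give ".../2015-03-03/tmp". */
    static String stagingDir(String archivalBase, LocalDate day) {
        return archivalBase + "/" + DAY.format(day) + "/tmp";
    }
}
```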

Let me know if there is a MapR FS API, or anything else, that will work besides the approach above. If there is no such API, please consider adding one that does this work behind a single API call so that it is seamless for the user.

Thanks,

Bhavesh

