
MAPR HDFS file open and read seems to behave differently than other Hadoop distributions

Question asked by jacobson on Jan 29, 2016
Latest reply on Feb 10, 2016 by jacobson
I am working on a C++ project that runs on many different Hadoop distributions: Apache, HDP, CDH, and IOP. We've tested the project on the current versions of all these distributions and are now expanding to MAPR. We see the following behavior only on MAPR.

We have two processes. Process 1 opens an HDFS file, writes some data to it, closes it, and then pings the other process to let it know the file is ready. Process 2 opens the file and reads it. On all other platforms, the read returns the number of bytes in the file. On MAPR, the read returns 0 bytes, meaning EOF. On MAPR, the file is read correctly only if I first seek to offset 0 (SEEK_SET to 0) before the read.
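A minimal sketch of the seek-before-read workaround, assuming a reader that wants the whole file. To keep it self-contained and runnable without a cluster, `FakeHdfsFile` is a hypothetical in-memory stand-in for an open libhdfs handle. Its `seek` and `read` follow the return conventions of libhdfs's `hdfsSeek` (0 on success, -1 on error) and `hdfsRead` (bytes read, 0 at EOF, -1 on error). The `startAtEnd` flag only simulates the MAPR behavior reported above; it is not a claim about how MAPR works internally. `readWholeFile` is likewise a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an open HDFS file. In real code, seek() and read()
// would wrap hdfsSeek(fs, file, pos) and hdfsRead(fs, file, buf, len).
class FakeHdfsFile {
public:
    // startAtEnd = true simulates the reported MAPR behavior: a freshly opened
    // file reads as EOF until an explicit seek to offset 0.
    FakeHdfsFile(std::string data, bool startAtEnd)
        : data_(std::move(data)), pos_(startAtEnd ? data_.size() : 0) {}

    // Like hdfsSeek: returns 0 on success, -1 on an out-of-range offset.
    int seek(std::int64_t pos) {
        if (pos < 0 || static_cast<std::size_t>(pos) > data_.size()) return -1;
        pos_ = static_cast<std::size_t>(pos);
        return 0;
    }

    // Like hdfsRead: returns bytes read, 0 at EOF, -1 on error.
    std::int32_t read(void* buf, std::int32_t len) {
        if (len < 0) return -1;
        std::size_t n = std::min(static_cast<std::size_t>(len), data_.size() - pos_);
        std::memcpy(buf, data_.data() + pos_, n);
        pos_ += n;
        return static_cast<std::int32_t>(n);
    }

private:
    std::string data_;  // declared before pos_, so it is initialized first
    std::size_t pos_;
};

// Reads an entire file defensively: seek to offset 0 first (a no-op where the
// handle already starts at 0), then read until a 0-byte read signals EOF.
template <typename File>
bool readWholeFile(File& f, std::vector<char>& out) {
    out.clear();
    if (f.seek(0) != 0) return false;
    char buf[4096];
    for (;;) {
        std::int32_t n = f.read(buf, static_cast<std::int32_t>(sizeof buf));
        if (n < 0) return false;  // read error
        if (n == 0) return true;  // EOF reached
        out.insert(out.end(), buf, buf + n);
    }
}
```

Because the explicit seek is harmless when the handle already starts at offset 0, the same code path can be used on every distribution rather than special-casing MAPR.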

Our application uses libhdfs.so to interact with HDFS files. I took libhdfs.so from an Apache 2.7.1 cluster, where the above scenario works, and used it in place of the MAPR version of libhdfs.so. This made no difference, so the behavior does not seem to come from libhdfs.

Is this expected behavior on MAPR that I need to deal with?
My expectation is that a newly opened HDFS file is positioned at the beginning of the file.
