Just to preface this... this is what happens when I make an 8 shot espresso iced latte and I have some free time on my hands... ;-)
I was curious about MapR File System locks. Since the FS is POSIX-compliant, I was wondering what would happen if I held a file lock on a file or set of files in one app while someone else was running a Hive query against the larger data set.
Or, if I created a file in the subdirectory where a Hive query was running and placed a lock on that file, would the Hive query throw an exception, or would it just ignore the file?
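To make the lock question concrete, here is a minimal sketch of how POSIX-style advisory locks behave on a local Linux file system. It holds an exclusive flock() on a file, then opens the same file through a second descriptor and reads it without asking for a lock. The function names are made up for illustration, and it is only an assumption that MapR FS (for example through its NFS mount) behaves the same way; I can't say from this what Hive itself does, only that a reader which never requests a lock is not stopped by an advisory one.

```python
import fcntl
import os
import tempfile


def read_while_locked(path):
    """Hold an exclusive advisory lock on path, then read it via a second descriptor."""
    with open(path, "r+b") as holder:
        # Exclusive advisory lock, as the "locking app" would take it.
        fcntl.flock(holder, fcntl.LOCK_EX)
        with open(path, "rb") as reader:
            # A plain read never consults advisory locks, so it succeeds.
            data = reader.read()
            try:
                # Only a reader that explicitly asks for a lock sees the conflict.
                fcntl.flock(reader, fcntl.LOCK_SH | fcntl.LOCK_NB)
                second_lock_granted = True
                fcntl.flock(reader, fcntl.LOCK_UN)
            except BlockingIOError:
                second_lock_granted = False
        fcntl.flock(holder, fcntl.LOCK_UN)
    return data, second_lock_granted


def demo():
    """Create a scratch file, run the experiment, and clean up."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"row1\nrow2\n")
        return read_while_locked(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

If the same holds on the cluster, a lock held by my app would only matter to readers that also take locks, so the interesting test is whether the query engine ever asks for one.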
This is just one question in a series... overall, I want to write a process that compacts the files in a subdirectory into a single file. The issue is that in a multi-tenant environment, someone else may be running a query against the data set. Usually that's not a problem: the files are immutable, so the isolation is effectively a dirty read. However, if I'm writing a process to compact the files and I write the output to the same directory, Hive may barf. If I write to a temp directory, then when I want to replace the files... boom. There's a small window of uncertainty while I'm removing the old files and copying in the new one.
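Here is a rough sketch of the temp-directory approach, written against plain POSIX paths. Everything in it is an assumption for illustration: the function name, the "compacted.dat" file name, simple byte concatenation as the "compaction", and the staging directory living on the same file system as the data (which is what lets the rename be atomic). It does not solve the problem; it just shows exactly where the window is.

```python
import os
import shutil
import tempfile


def compact_directory(data_dir, staging_dir, compacted_name="compacted.dat"):
    """Concatenate every file in data_dir into one file, then swap it in."""
    old_files = sorted(
        name for name in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, name))
    )
    if compacted_name in old_files:
        raise ValueError("output name collides with an existing input file")

    # Step 1: build the compacted file out of sight of any running query.
    staged = os.path.join(staging_dir, compacted_name)
    with open(staged, "wb") as out:
        for name in old_files:
            with open(os.path.join(data_dir, name), "rb") as src:
                shutil.copyfileobj(src, out)

    # Step 2: a rename within one file system is atomic, so the new file
    # appears in data_dir all at once rather than half-written.
    final = os.path.join(data_dir, compacted_name)
    os.replace(staged, final)

    # Step 3: THE WINDOW. Until this loop finishes, a reader listing
    # data_dir sees both the old files and the compacted copy (duplicates).
    # Removing first and renaming second would instead show missing data.
    for name in old_files:
        os.remove(os.path.join(data_dir, name))
    return final


def demo():
    """Run the compaction on two small scratch files and report the result."""
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        staging_dir = os.path.join(root, "staging")
        os.mkdir(data_dir)
        os.mkdir(staging_dir)
        for name, body in (("part-0", b"a\n"), ("part-1", b"b\n")):
            with open(os.path.join(data_dir, name), "wb") as f:
                f.write(body)
        final = compact_directory(data_dir, staging_dir)
        with open(final, "rb") as f:
            return os.listdir(data_dir), f.read()


if __name__ == "__main__":
    print(demo())
```

The rename makes the new file show up in one piece, but the remove loop is still a separate step, so a query that lists the directory in between sees either duplicate or missing rows depending on the order chosen.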
I get the impression that Hive won't handle this well, and it's a hard thing to test...