
Race Condition: Streaming Job Log Files Not Appearing In Time

Question asked by fistan684 on Jan 31, 2012
Latest reply on Jan 31, 2012 by fistan684
We recently ported a streaming job to MapR. A Groovy script sets up the streaming job's arguments and invokes the job by calling ToolRunner.run(). As soon as the job completes, we read the history file located under the output directory (<output>/_logs/history/<history_file>.jar), pull various metrics and counters from it, and log those counters for reporting purposes.

The problem is that roughly once in every 40 or 50 runs the job fails because the history file isn't there when we try to read it. The file is always created eventually: every time we look for it after a failure, it is there. We think the file occasionally shows up late, or at least isn't visible yet when we look for it, which causes the error. Has anyone encountered this type of race condition before? Our hunch is that it might be related to the way the new file system works (since there is no longer a dedicated name node), but we wanted to run it by the community first for input.
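
For context, the kind of stopgap we could add around the read is a bounded poll: check for the file, sleep briefly, and give up after a timeout. Below is a minimal sketch in plain Java. The class name HistoryFileWait, the waitFor helper, the default path, and the timeout values are all hypothetical, and the demo uses java.nio.file on a local path rather than the cluster file system. It would only mask the symptom, not explain the delay.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of Hadoop or MapR): poll a condition until it
// holds or a timeout elapses.
final class HistoryFileWait {

    // Returns true as soon as check passes, false if the timeout elapses or
    // the thread is interrupted while waiting.
    static boolean waitFor(BooleanSupplier check, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.max(1L, Math.min(intervalMillis, remainingMillis)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Assumed local path for the demo; pass the real location as an argument.
        Path history = Paths.get(args.length > 0 ? args[0] : "output/_logs/history");
        // Assumed values: wait up to 2 seconds, checking every 200 ms.
        boolean found = waitFor(() -> Files.exists(history), 2_000, 200);
        System.out.println(found ? "history file present" : "timed out waiting for history file");
    }
}
```

In our Groovy script the check would presumably go through Hadoop's FileSystem.exists(Path) instead of java.nio.file, with the call placed right after ToolRunner.run() returns and before the counters are parsed.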
