s3distcp fails due to out-of-disk error

Question asked by terrys on Jun 21, 2013
Latest reply on Jul 9, 2013 by terrys
It appears that Amazon's s3distcp is storing temporary output under /tmp/mapr-hadoop/local/taskTracker when uploading files from maprfs to S3. This is causing tasks to fail with an out-of-space error because /tmp fills up. Am I right in assuming this is a defect in the s3distcp utility? Is there any way to force the tool to use maprfs:///tmp for this temporary output? Should I be overriding mapred.local.dir in the job?

I am already passing --tmpDir maprfs:///tmp/ on the job command line.
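For reference, the invocation looks roughly like this (the jar path, source path, and bucket name below are placeholders, not my actual values):

# jar path, source path, and bucket are placeholders
hadoop jar /path/to/s3distcp.jar \
  --src maprfs:///data/logs/ \
  --dest s3n://my-bucket/logs/ \
  --tmpDir maprfs:///tmp/

If overriding mapred.local.dir is the answer, I assume it would go on the same command line as a generic option (something like -Dmapred.local.dir=/mapr/tmp/mapred/local), but I'm not sure whether that property can be set per job or only in the TaskTracker's mapred-site.xml.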

Here is a sample path that I'm seeing fill up and cause the error: /tmp/mapr-hadoop/mapred/local/taskTracker/root/jobcache/job_201306191524_0093/attempt_201306191524_0093_r_000004_0/work/${hadoop.tmp.dir}/s3
