
Copying files from S3 to maprfs on Amazon EMR

Question asked by dave_kincaid on Dec 11, 2012
Latest reply on Dec 11, 2012 by dave_kincaid
Does anyone know of a problem using Amazon's S3DistCp tool with MapR running on EMR? I'm trying to use it, but the step keeps failing with the following exception in /mnt/var/log/hadoop/steps:

> Exception in thread "main" java.lang.RuntimeException: Unable to delete directory hdfs:/tmp/e9333a37-f400-4982-9687-326e33d9b37d/files
>     at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.deleteRecursive(S3DistCp.java:606)
>     at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:464)
>     at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:216)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at com.amazon.external.elasticmapreduce.s3distcp.Main.main(Main.java:12)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:/tmp/e9333a37-f400-4982-9687-326e33d9b37d/files
>     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:85)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1416)
>     at org.apache.hadoop.fs.FileSystem.access$100(FileSystem.java:69)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1450)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1432)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:232)
>     at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.deleteRecursive(S3DistCp.java:603)

The command line I'm using to submit the job step is:

    elastic-mapreduce --jobflow $JOB_ID --jar s3://us-east-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar \
    --args '--src,s3n://PVData/raw, \
    --dest,/PVData/raw'

For the --dest argument I have also tried maprfs:///PVData/raw and hdfs:///PVData/raw, and neither works.
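
To make the "no host" part of the message concrete, here is a small sketch that splits the URIs involved using Python's generic URI parser. This is not Hadoop's own parsing code, so treat it only as an illustration; the `describe` helper and the `namenode-host:9000` authority are hypothetical and do not come from my cluster. It suggests that the temporary path in the exception and the `hdfs:///` and `maprfs:///` forms I tried all have a scheme and a path but no host component.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split a URI into scheme, host, and path (hypothetical helper)."""
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "host": parts.hostname, "path": parts.path}


uris = [
    # Temporary directory taken from the exception above.
    "hdfs:/tmp/e9333a37-f400-4982-9687-326e33d9b37d/files",
    # The two --dest forms I also tried.
    "hdfs:///PVData/raw",
    "maprfs:///PVData/raw",
    # Hypothetical fully qualified form, shown only for contrast.
    "hdfs://namenode-host:9000/PVData/raw",
]

for uri in uris:
    info = describe(uri)
    status = "no host" if info["host"] is None else "host=" + info["host"]
    print(f"{uri} -> scheme={info['scheme']}, {status}, path={info['path']}")
```

If that reading is right, the failure may come from the `hdfs:` scheme being chosen for the temporary directory rather than from the `--dest` value itself, but that is a guess on my part.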
