distcp copy to S3

Question asked by dave_kincaid on Dec 12, 2012
Latest reply on Dec 14, 2012 by dave_kincaid
I'm trying to use distcp to copy some files from our internal MapR cluster onto S3 so I can use them with Amazon EMR. I'm using the following command line:

    hadoop distcp -i -update /PVData/raw s3n://access_key:secret_key@PVData/incoming
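For comparison, here is a minimal sketch of the same copy with the keys passed as Hadoop properties instead of being embedded in the `s3n://` URI, so they would not show up in paths echoed to the task logs. `fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey` are the s3n credential properties; `build_distcp_cmd` is a hypothetical helper that only prints the command, and it is an assumption that this MapR build accepts `-D` generic options for distcp in the same way as stock Hadoop.

```shell
#!/usr/bin/env bash
# Sketch only: prints a distcp command line, does not run it.
# build_distcp_cmd is a hypothetical helper; the key values are placeholders.
build_distcp_cmd() {
  local access_key="$1" secret_key="$2" src="$3" dest="$4"
  # Generic -D options go before the distcp-specific flags (-i, -update).
  echo "hadoop distcp" \
    "-Dfs.s3n.awsAccessKeyId=${access_key}" \
    "-Dfs.s3n.awsSecretAccessKey=${secret_key}" \
    "-i -update ${src} ${dest}"
}

# Print the command for review before running it by hand.
build_distcp_cmd "access_key" "secret_key" /PVData/raw s3n://PVData/incoming
```

The source and destination are the ones from the command above, with only the credentials moved out of the URI.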

I'm getting a lot of the following errors. Are these anything to worry about? Are these errors slowing down the copy?

> 12/12/12 20:34:08 INFO httpclient.HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
>
> 12/12/12 20:34:08 INFO httpclient.HttpMethodDirector: Retrying request
>
> 12/12/12 20:34:08 INFO httpclient.HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
>
> 12/12/12 20:34:08 INFO httpclient.HttpMethodDirector: Retrying request
>
> 12/12/12 20:34:08 INFO httpclient.HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
>
> 12/12/12 20:34:08 INFO httpclient.HttpMethodDirector: Retrying request
>
> 12/12/12 20:34:08 INFO metrics.MetricsUtil: Could NOT ping instance controller
>
> 12/12/12 20:34:08 INFO metrics.MetricsSaver: Wait for instance controller started to flush 9 records

I'm also seeing a few failures like this one in the task logs. They seem like strange errors for a file copy:

> 2012-12-12 20:34:06,896 INFO org.apache.hadoop.tools.DistCp: FAIL _distcp_logs_90ueow/part-00020 : java.io.FileNotFoundException: No such file or directory 's3n://keys:keys@PVData/incoming/_distcp_logs_90ueow/part 00020'
> at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:685)
> at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:472)
> at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:556)
> at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:317)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
