What does native support for S3 mean? I read that from Hadoop 2.7 onwards s3a is supported, which is better than s3. Does that mean that in MapR, if I just use s3 in the output path, it automatically uses the s3a features?
Hi Deepak Subhramanian,
According to my research, using s3 in the output does not give you the s3a features. See https://forums.aws.amazon.com/thread.jspa?threadID=225987.
I tried using distcp with s3 and got an error saying the maximum size limit was exceeded. So I guess it is using the older implementation, since it hits the 5GB limit. s3a allows puts of more than 5GB. Also, it looks like that thread is related to EMR.
Error Message from distcp.
Your proposed upload exceeds the maximum allowed size
Deepak Subhramanian, if the cluster is deployed with Amazon EMR images, you won't be able to use s3a, as it is not supported by EMRFS.
I am assuming you have deployed a MapR cluster on Amazon and want to connect to your S3 bucket from the same nodes. If that is the case, follow the doc below to configure s3a.
Apache Hadoop Amazon Web Services support – Hadoop-AWS module: Integration with Amazon Web Services
Note: make sure you have the correct jars in place, and add the configuration to "/opt/mapr/hadoop/hadoop-2.7.<version>/etc/hadoop/core-site.xml".
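For reference, here is a minimal sketch of what the s3a entries in core-site.xml might look like. It assumes the stock Hadoop 2.7 hadoop-aws property names (fs.s3a.access.key, fs.s3a.secret.key, fs.s3a.impl). The key values are placeholders, and whether a given MapR release already sets some of these by default is not something this thread confirms, so treat it as a starting point rather than the exact MapR configuration.

```xml
<!-- Goes inside the existing <configuration> element of core-site.xml. -->
<!-- Property names assume the Apache Hadoop 2.7 hadoop-aws module. -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_AWS_ACCESS_KEY_ID</value> <!-- placeholder -->
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_AWS_SECRET_ACCESS_KEY</value> <!-- placeholder -->
</property>
<property>
  <!-- Maps the s3a:// scheme to the S3A filesystem class. -->
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
```

With entries like these in place, paths in distcp would use the s3a:// prefix rather than s3://.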
Thanks Shishir Prakash. I got it working with s3a already. In one of the earlier posts, uffe replied that MapR supports s3 natively and suggested using s3 instead of s3a. That is why I asked this question on that post, but for some reason it was branched out as a new question. I am not able to find the old post. It should have been referenced when the question was branched out into a new thread.
Please click on discussion to get to earlier discussion.
Want to check in to see if you have made any progress.
Please help with Deepak's additional question.
I just reread the other discussion where uffe said that s3 works natively in MapR. I guess uffe was referring to the configuration parameters in core-site.xml and not to the prefix on the directory in the distcp command. I had assumed I just needed to use the s3 prefix in the distcp command. It works with the s3a prefix.
Uffe's reply from the earlier discussion (Missing org.apache.hadoop.fs.s3a.S3AFileSystem - MapR 5.0):
If not in MapRv5 you have native support for S3 so you only need to add detail for fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey (note s3. and not s3a. .....) in core-site.xml
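Going only by the property names in that reply, the corresponding core-site.xml entries would look roughly like the sketch below. The values are placeholders, and nothing beyond the two property names comes from the original post.

```xml
<!-- Goes inside the existing <configuration> element of core-site.xml. -->
<!-- Note the s3. prefix here, not s3a. -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>YOUR_AWS_ACCESS_KEY_ID</value> <!-- placeholder -->
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>YOUR_AWS_SECRET_ACCESS_KEY</value> <!-- placeholder -->
</property>
```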
Thanks Cathy. Just to confirm: the link is applicable to EMR. Is it the same for MapR?
Shishir Prakash, I see that you contributed to one of the internal queries in this area in the past. Please see if you can clarify this for Deepak Subhramanian.
Deepak, if I got this right, you'll need to have the appropriate configuration for s3a explicitly applied in /opt/mapr/hadoop/hadoop-<version>/etc/hadoop/core-site.xml. Otherwise, I believe it would be treated as native S3.