We have a single-node, non-secure MapR cluster that we are trying to access through the HDFS API using the following code snippet.
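import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
// xx.xx.xx.xx stands for the public IP of the CLDB node; 7222 is the CLDB port
String uri = "hdfs://xx.xx.xx.xx:7222/tmp/gbDir/gbFile.txt";
Path path = new Path(uri);
Configuration configuration = new Configuration();
FileSystem fileSystem = FileSystem.get(URI.create(uri), configuration);
FSDataInputStream fsDataInputStream = null;
FSDataOutputStream fsDataOutputStream = null;
fsDataInputStream = fileSystem.open(path);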
When we try to connect from outside the VPC network (i.e. executing the jar from an EC2 node outside the VPC, or from Eclipse in a local environment), we are unable to establish a file client, because the CLDB node redirects the client to the node's private IP, and outside the VPC that private IP is not reachable through DNS or network routing. Even though we connect to the CLDB node using its public IP or a hostname that is resolvable from the client node, the internal workings use the private IP for the connection. That is, although we connect to the public IP of the MapR-FS CLDB node, the client is internally redirected to the private IP 172.31.17.208 for ports 7222 and 5660. Those ports are in theory accessible from outside, but the client node (outside the VPC) cannot map 172.31.17.208 to 184.108.40.206, the public IP of the MapR AWS node.
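To make the distinction between the two addresses concrete, here is a minimal Java sketch that checks whether an address handed back to the client falls in a private (RFC 1918) range. The class and method names are hypothetical and only for illustration; the two addresses are the ones quoted above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class AddressScope {
    /** Returns true if the literal IP is in a private (site-local) range such as 172.16.0.0/12. */
    static boolean isPrivate(String ipLiteral) {
        try {
            // For a literal IP, getByName only parses the text; no DNS lookup is performed.
            return InetAddress.getByName(ipLiteral).isSiteLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        // Addresses taken from the description above.
        for (String ip : new String[] {"172.31.17.208", "184.108.40.206"}) {
            System.out.println(ip + " private=" + isPrivate(ip));
        }
    }
}
```

Any address for which this returns true is only routable from inside the VPC, which is why a client outside the VPC cannot follow the redirect.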
When the same code is executed from a similar node inside the same VPC as the MapR cluster node, we can access the MapR-FS filesystem successfully. This works because private IPs are resolvable inside the AWS VPC, so we can connect to MFS even if the client node is in a different subnet and security group, as long as it can ping the private address.
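The difference between the two environments can be checked with a plain TCP probe against the ports mentioned above (7222 and 5660). This is only a sketch using the standard `java.net.Socket` API; the class name, the default host, and the 3-second timeout are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the address to test as the first argument; defaults to the private IP from the post.
        String host = args.length > 0 ? args[0] : "172.31.17.208";
        for (int port : new int[] {7222, 5660}) {
            System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 3000));
        }
    }
}
```

Run from inside the VPC, the probe against the private IP would be expected to succeed; run from outside, it would be expected to fail for the private IP even if the same ports are open on the public IP.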
In Apache Hadoop, we can set the following properties to direct client nodes to resolve hostnames themselves instead of being handed an IP address (in this case the private IP address in AWS), as described here.
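<property>
<name>dfs.client.use.datanode.hostname</name>
<description>Whether clients should use datanode hostnames when
connecting to datanodes.</description>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<description>Whether datanodes should use datanode hostnames when
connecting to other datanodes for data transfer.</description>
</property>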
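As a rough illustration of what the client-side setting changes, the sketch below models the choice between the IP address reported by the cluster and the reported hostname, followed by resolution on the client. This is a simplified model, not Hadoop's actual code: the class `DatanodeTarget`, the method `connectTarget`, and the default hostname are hypothetical. In a real Hadoop client, the flag would typically be enabled with `configuration.setBoolean("dfs.client.use.datanode.hostname", true)`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class DatanodeTarget {
    /** Picks the address a client would dial, depending on the use-hostname flag. */
    static String connectTarget(String reportedIp, String reportedHostname, boolean useHostname) {
        return useHostname ? reportedHostname : reportedIp;
    }

    public static void main(String[] args) throws UnknownHostException {
        String reportedIp = "172.31.17.208";                           // private IP from the post
        String reportedHost = args.length > 0 ? args[0] : "localhost"; // hypothetical hostname
        String target = connectTarget(reportedIp, reportedHost, true);
        // With the flag on, the client resolves the name itself, so a hosts-file entry or
        // public DNS record can map it to an address that is reachable from outside the VPC.
        System.out.println(target + " -> " + InetAddress.getByName(target).getHostAddress());
    }
}
```

With the flag off, the client dials the private IP directly and client-side name resolution never gets a chance to substitute a reachable address.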
Is there a similar property in the MapR CLDB configuration that lets clients perform their own DNS resolution of hostnames, instead of being given an IP address for the connection? When installing a MapR cluster in AWS, we cannot provide a public static address during installation because public addresses change, so MapR installs the nodes using their private addresses or hostnames.