AWS CLDB connection issue from Java API

Question asked by yabhinav on May 31, 2018
Latest reply on Jun 1, 2018 by yabhinav

We have a single-node, non-secure MapR cluster that we are trying to access through the HDFS API using the following code snippet.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Public IP of the CLDB node (masked) and the CLDB port
String uri = "hdfs://xx.xx.xx.xx:7222/tmp/gbDir/gbFile.txt";
Path path = new Path(uri);
Configuration configuration = new Configuration();
// Ask the client to connect to datanodes by hostname instead of by IP
configuration.set("dfs.client.use.datanode.hostname", "true");
FileSystem fileSystem = FileSystem.get(URI.create(uri), configuration);
FSDataInputStream fsDataInputStream = null;
FSDataOutputStream fsDataOutputStream = null;
// Open the file for reading
fsDataInputStream = fileSystem.open(new Path(uri));


The problem appears when we connect from outside the VPC network (i.e. when running the jar from an EC2 node outside the VPC, or from Eclipse in a local environment). The file client cannot be established because the CLDB node redirects the client to the node's private IP, and outside the VPC that private IP is not reachable through DNS or network routing. Even though we connect to the CLDB node using its public IP, or a hostname that is resolvable from the client node, the internal workings use the private IP for the connection. In our case the connection to the public IP of the maprFS CLDB node is redirected internally to the private IP <> on ports 7222 and 5660. Those ports are in theory accessible from outside, but the client node (outside the VPC) cannot map the private IP back to the public IP of the MapR AWS node.
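To narrow down where the connection fails, a small reachability check run from the client node can help. The following is only a minimal sketch using plain java.net sockets, not the MapR client itself. The class name PortProbe, the 3 second timeout, and the default host are assumptions made for illustration. The ports are the two mentioned above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and hosts that do not resolve.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: java PortProbe <public-or-private-address>
        String host = args.length > 0 ? args[0] : "localhost";
        int[] ports = {7222, 5660}; // the two ports from the question
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 3000));
        }
    }
}
```

Running it once with the public address and once with the private address of the node should show whether only the private address is unreachable from outside the VPC.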





When the same code is executed from a similar node inside the same VPC as the MapR cluster node, we can access the maprFS filesystem successfully. This works because private IPs are resolvable inside the AWS VPC, so we are able to connect to MFS even if the client node is in a different subnet and security group, as long as it can ping the private address.
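As a quick way to tell whether an address handed back to the client is a private one, the JDK's InetAddress can classify it. This is a rough sketch only. The class name AddressCheck and the sample addresses are made up for illustration and are not the addresses of our cluster.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class AddressCheck {
    // True if the name or IP literal maps to a site-local (private) address,
    // i.e. 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16 for IPv4.
    static boolean isPrivate(String hostOrIp) throws UnknownHostException {
        return InetAddress.getByName(hostOrIp).isSiteLocalAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        // Illustrative sample addresses only
        String[] samples = {"10.0.0.5", "172.31.4.9", "8.8.8.8"};
        for (String sample : samples) {
            System.out.println(sample + " private=" + isPrivate(sample));
        }
    }
}
```

An address reported as private here is only routable from inside the VPC (or over a VPN or peering link), which matches the behaviour described above.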



In Apache Hadoop we can set the following properties to make client nodes resolve the hostnames themselves instead of being handed an IP address (in this case the private IP address in AWS), as described here.

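<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <description>Whether clients should use datanode hostnames when
    connecting to datanodes.
  </description>
</property>

<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
  <description>Whether datanodes should use datanode hostnames when
    connecting to other datanodes for data transfer.
  </description>
</property>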


Is there a similar property in the MapR CLDB configuration that lets clients perform their own DNS resolution of the hostname instead of being given an IP address for the connection? When installing a MapR cluster in AWS we cannot provide a public static address during installation, because public addresses change, so MapR installs the nodes using their private addresses or hostnames.