Configuring RPC timeouts in MapR FS

Document created by raghunadha Employee on Jun 29, 2017

This article describes how to tune the RPC timeout, a setting that can affect all MapR core components. It is strongly recommended to contact MapR Support before changing the default RPC timeout values.

'RPC Timeout' Tuning:
MapR software components such as the MFS process, the NFS Gateway server, the CLDB process, and the FileClient use an RPC layer (not Sun RPC) to talk to each other.
The default RPC timeouts between the components in a MapR cluster are as follows:

  • FileClient to MFS & CLDB: 99 seconds (configurable)
  • NFS Gateway to MFS (for file operations): 300 seconds (hard coded)
  • MFS to MFS (for replication): 300 seconds (hard coded)


The RPC timeouts for NFS Gateway<-->MFS and MFS<-->MFS communication are hard coded and therefore not configurable. The RPC timeout between the FileClient and MFS/CLDB is configurable.
The rest of this article covers configuring the RPC timeout between the FileClient and MFS/CLDB only.

The FileClient software is used to talk to MapR-FS in the following three common scenarios:

  1. Submitting MapReduce/YARN jobs
  2. Running HBase client operations (such as the hbase shell)
  3. Running standalone Java applications built using JARs from the Maven repository


The RPC timeout for the FileClient can be specified in the following three ways:

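1. Setting the RPC timeout in a Java application
     /* For MapR-FS applications (value is in seconds) */
          // requires: import org.apache.hadoop.conf.Configuration;
          Configuration conf = new Configuration();
          conf.set("fs.mapr.rpc.timeout", "600");

     /* For MapR-DB applications (use instead of the snippet above) */
          // requires: import org.apache.hadoop.hbase.HBaseConfiguration;
          Configuration conf = HBaseConfiguration.create();
          conf.set("fs.mapr.rpc.timeout", "600");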


2. Setting the RPC timeout in C applications
          int err = hdfsSetRpcTimeout(600);

3. Alternatively, core-site.xml can be used to set the RPC timeout.
      <property> 
      <name>fs.mapr.rpc.timeout</name>
      <value>600</value> 
      </property> 


The RPC timeout set above is the overall timeout for the entire operation (including retries, after trying each replica). Internally, it is converted into a different per-RPC-call value as follows:

  /*set internal RpcTimeout assuming average 3 CLDBs and 3 replicas*/
            internal timeout  = (Given RPC timeout - 2)/3;

So a 300-second fs.mapr.rpc.timeout maps to a 99-second internal wire-level RPC timeout per RPC call. To get an internal wire-level RPC timeout of 200 seconds, set fs.mapr.rpc.timeout (in core-site.xml or through conf.set()) to 602 seconds. Note that this timeout cannot be set per RPC call, so changing the default value affects all RPC operations.
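
The arithmetic above can be checked with a small helper. This is only a sketch of the formula as stated in this article, not MapR source code: the class and method names are hypothetical, and integer division is an assumption chosen because it reproduces the 300 -> 99 example.

```java
/** Illustrative helper for the conversion described in this article (hypothetical names). */
final class RpcTimeoutMath {
    // Constants taken from the formula: internal timeout = (given timeout - 2) / 3
    private static final int DIVISOR = 3;
    private static final int OFFSET_SECONDS = 2;

    /** Per-call (wire-level) timeout in seconds for a given fs.mapr.rpc.timeout value. */
    static int perCallTimeout(int configuredSeconds) {
        // Integer division is assumed; it matches the 300 -> 99 example in the text.
        return (configuredSeconds - OFFSET_SECONDS) / DIVISOR;
    }

    /** Smallest fs.mapr.rpc.timeout value that yields the desired per-call timeout. */
    static int settingFor(int perCallSeconds) {
        return perCallSeconds * DIVISOR + OFFSET_SECONDS;
    }

    public static void main(String[] args) {
        System.out.println(perCallTimeout(300)); // prints 99
        System.out.println(settingFor(200));     // prints 602
    }
}
```

Working backwards with settingFor() avoids picking a value such as 600, which under the same assumption would give a per-call timeout of 199 seconds rather than 200.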
 
