How to troubleshoot issues listing disks for a node in MapR v3.1, in the MCS or via the 'maprcli disk list' API


Author: Jonathan Bubier

 

Original Publication Date: August 13, 2014

 

The disk information for a node in a MapR cluster can be viewed both in the MCS under 'Node details' and with the 'maprcli disk list' command.

 

In MapR v3.1 the mechanism used to obtain the disk information for either method has changed from earlier MapR versions.  In earlier versions the webserver used passwordless SSH to connect to the desired node and execute the /opt/mapr/server/disklist.sh script to obtain the disk information.  This required that all webserver nodes be able to SSH without a password to all nodes in the cluster as the MAPR_USER in order to provide the list of system disks and MapR disks.

 

In MapR v3.1 the requirement for passwordless SSH to obtain disk information has been removed.  To get the disk information from a node, the webserver now uses an RPC to communicate with the 'hoststats' process on the desired node.  This RPC instructs the 'hoststats' process to execute the /opt/mapr/server/disklist.sh script and report the output back to the webserver.  In some cases the command to obtain the disk information, whether through the MCS or using 'maprcli disk list', will fail with an error similar to the following:

 

ERROR (38) -  RPC to execute 'DISK_LIST' on node: hadoop-n1 returned no data.

 

Ex:

# maprcli disk list -host hadoop-n1

 

ERROR (38) -  RPC to execute 'DISK_LIST' on node: hadoop-n1 returned no data.
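As a preliminary sanity check, the /opt/mapr/server/disklist.sh script that 'hoststats' executes can be run directly on the target node.  The invocation below is only a sketch; the script may require arguments depending on the MapR version, so review the script header before running it:

Ex:

# /opt/mapr/server/disklist.sh

If the script fails or produces no output when run locally, the disk list RPC from the webserver will also return no data.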

 

To resolve this error, use the following steps to determine the root cause.

1.  Verify the 'hoststats' process is running on the node with the correct arguments

 

As mentioned above, in MapR v3.1 the disk information is obtained by the webserver communicating with the 'hoststats' process on the desired node.  If the 'hoststats' process is either not running or not running with the correct options, this communication will fail and the above error will be seen.  To verify the running 'hoststats' process, use the following:

 

$ ps -ef | grep hoststats

 

 

Ex:

$  ps -ef | grep hoststats

mapr     32048     1  0 Jun12 ?        00:52:09 /opt/mapr/server/hoststats 5660 /opt/mapr/logs/TaskTracker.stats -S 1

 

Note the '-S 1' option in the 'hoststats' arguments.  This option indicates that the 'hoststats' process can send and receive remote RPCs and is necessary for the disk list RPC to be processed successfully.  If 'hoststats' is running without this option, verify the following options and their corresponding values in /opt/mapr/conf/warden.conf (a quick check is shown after the list):

 

rpc.drop=false

hs.rpcon=true

hs.port=1111

hs.host=localhost


If these options are set as above, it is possible that the 'hoststats' process was started manually rather than by warden during normal startup.  To have warden start the process, terminate the currently running process and allow warden to restart it automatically.  Once the process is restarted, verify it has the correct command line arguments.
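
For example, using the PID (32048) from the 'ps' output above; the 30 second wait is an assumption, and warden's restart interval may vary:

Ex:

$ kill 32048

$ sleep 30; ps -ef | grep hoststats

The new 'hoststats' process should show the '-S 1' option in its arguments.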


2.  Verify network connectivity between the webserver and the cluster nodes

 

The webserver communicates with the 'hoststats' process on each node by sending an RPC to the node on TCP port 1111.  This is the port that 'hoststats' listens on by default. 

 

Ex:

 

$ netstat -anp | grep hoststats

 

tcp        0      0 0.0.0.0:1111                0.0.0.0:*                   LISTEN      18566/hoststats

 

Verify that the webserver node is able to connect to each cluster node on port 1111 using the 'telnet' utility.  From the console of the webserver node, launch 'telnet' with the node's hostname and port 1111.

 

Ex:

 

$ telnet hadoop-n1 1111

Trying 192.168.1.1...

Connected to hadoop-n1.

Escape character is '^]'.

 

The above indicates a successful connection and no network connectivity issue between the nodes.  If the 'telnet' utility is unable to establish a connection, verify that the remote node is listening on port 1111 using 'netstat' as shown earlier in this step.  If the remote node is listening on TCP port 1111, verify there is no firewall or other network filtering occurring between the nodes.  In some cases this can be caused by the 'iptables' service running on one or both nodes; the 'iptables -L' command can be used to list the current network filtering rules, as in the sketch below.
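
For example, a minimal check for filtering rules that reference port 1111; rule formats vary, so inspect the full 'iptables -L -n' output if the grep matches nothing:

Ex:

# iptables -L -n | grep 1111

Any DROP or REJECT rule matching 'dpt:1111' would block the disk list RPC.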

 

3.  Verify each node's hostname resolves to an IP covered by MAPR_SUBNETS (v3.1.0 only)

If the cluster nodes are multi-homed and the MAPR_SUBNETS environment variable is used to restrict the interfaces used by MapR, confirm that each node's hostname resolves to an IP covered by MAPR_SUBNETS.  As an example, the node hadoop-n1 has two interfaces, 192.168.1.1/24 and 10.10.1.1/24; the hostname hadoop-n1 resolves to 192.168.1.1; and MAPR_SUBNETS is set to 192.168.1.0/24.  In this example, if MAPR_SUBNETS were instead set to 10.10.1.0/24 on all nodes, the disk list RPC sent to 'hoststats' would fail with the same error seen above.  A quick check is shown below.
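
The following is a minimal sketch of this check, assuming MAPR_SUBNETS is set in /opt/mapr/conf/env.sh (it may be set elsewhere in a given environment) and using the hostname from the example above:

Ex:

$ grep MAPR_SUBNETS /opt/mapr/conf/env.sh

$ getent hosts hadoop-n1

Confirm that the IP returned for the hostname falls within one of the subnets listed in MAPR_SUBNETS.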

 

This requirement is a known issue (13636) that applies only to MapR v3.1.0; it is no longer needed in MapR v3.1.1 and later.  A reference to this issue can be found in the v3.1.1 release notes here: http://doc.mapr.com/display/RelNotes/Version+3.1.1+Release+Notes
