I have a 3-node cluster. It shows its health as degraded, and the services are not starting. Please find the attached screenshot below.
Hi prudhvi theddu,
Have you checked Mathieu's suggestion (https://community.mapr.com/message/58132-volume-low-data-replication-alarms)? It would also help to know more about your environment, such as your MapR version and configuration.
Hi prudhvi theddu,
I agree with Cathy Liu: we would need more information about your environment.
I noticed you have a "NodeManager down alarm" on 2 of the 3 nodes. Please check the corresponding NodeManager logs for more information about this failure. Another hint: your NMs might go into the UNHEALTHY state because of low free space on the local file system. The property "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" is 90% by default, so if a local disk has less than 10% free space left, the NM on that node will fall out of the cluster.
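If you want to check this quickly on each node, here is a minimal sketch (not an official MapR tool) that reports how full the file system behind each given directory is and flags anything above the 90% default. It assumes you pass the directories configured in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs on your install, and it only approximates the NM check by computing "used" as 100% minus the usable free space.

```python
import shutil
import sys

# Default of yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
MAX_DISK_UTILIZATION_PCT = 90.0


def used_percentage(total_bytes: int, free_bytes: int) -> float:
    """Percentage of the disk in use, derived from the space still usable."""
    if total_bytes <= 0:
        return 100.0
    return 100.0 - 100.0 * free_bytes / total_bytes


def over_limit(total_bytes: int, free_bytes: int,
               max_pct: float = MAX_DISK_UTILIZATION_PCT) -> bool:
    """True if this disk would exceed the NodeManager utilization cutoff."""
    pct = used_percentage(total_bytes, free_bytes)
    return pct > max_pct or pct >= 100.0


def check_dirs(dirs, max_pct: float = MAX_DISK_UTILIZATION_PCT) -> None:
    """Print the utilization of each directory's file system and flag risky ones."""
    for d in dirs:
        usage = shutil.disk_usage(d)  # free = space available to non-root users
        pct = used_percentage(usage.total, usage.free)
        status = "OVER LIMIT" if over_limit(usage.total, usage.free, max_pct) else "ok"
        print(f"{d}: {pct:.1f}% used ({status})")


if __name__ == "__main__":
    # Pass the NM local/log dirs as arguments; falls back to the current directory.
    check_dirs(sys.argv[1:] or ["."])
```

Any directory reported as OVER LIMIT is a good candidate for cleanup before restarting the NodeManager.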
Another observation: "Node Alarm Heartbeat Processing Slow" is raised on all 3 nodes. On 2 of them this could be explained by the NM services being completely down, but the one remaining active NM should still be able to send heartbeats to the ResourceManager in time. Please check your network for issues, or at least verify that you can ping the RM node from this NM node. Also check the /etc/hosts file on each node to make sure its configuration is correct.
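As a rough aid for the network and /etc/hosts checks, here is a minimal sketch under a few assumptions. It flags cluster hostnames that map to more than one IP or to a loopback address (a common cause of nodes not finding each other). Instead of ping it swaps in a plain TCP connect, which also confirms the port is open; port 8031 is the Apache Hadoop default for yarn.resourcemanager.resource-tracker.address, so verify the value in your yarn-site.xml. The hostnames on the command line are placeholders for your own nodes.

```python
import socket
import sys
from collections import defaultdict


def parse_hosts(text: str) -> dict:
    """Map each hostname/alias in /etc/hosts-style text to the set of IPs it appears with."""
    mapping = defaultdict(set)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name].add(ip)
    return dict(mapping)


def find_problems(mapping: dict, cluster_hosts) -> list:
    """Flag cluster hostnames that are ambiguous or bound to a loopback address."""
    problems = []
    for host in cluster_hosts:
        ips = mapping.get(host, set())
        if len(ips) > 1:
            problems.append(f"{host}: maps to several IPs {sorted(ips)}")
        elif any(ip.startswith("127.") or ip == "::1" for ip in ips):
            problems.append(f"{host}: maps to loopback {sorted(ips)}")
    return problems


def can_reach(host: str, port: int = 8031, timeout: float = 3.0) -> bool:
    """TCP-connect to host:port; used here as a stand-in for ping."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


if __name__ == "__main__":
    # Usage (placeholder names): python3 check_hosts.py RM_HOST NODE2 NODE3
    if len(sys.argv) >= 2:
        rm_host, nodes = sys.argv[1], sys.argv[1:]
        with open("/etc/hosts") as f:
            for problem in find_problems(parse_hosts(f.read()), nodes):
                print("hosts:", problem)
        print(f"RM {rm_host}:8031 reachable:", can_reach(rm_host))
```

Run it on the node with the active NM; a False from can_reach points at a network, firewall, or name-resolution problem rather than at YARN itself.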
But first, as I mentioned, check the NM logs on every machine to find out what issues you are facing.
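To speed up going through the NM logs on each machine, here is a minimal sketch that prints every ERROR or FATAL line from the files matching a glob pattern. It assumes the usual log4j layout of Hadoop daemon logs (timestamp, then level) and the default yarn-<user>-nodemanager-<host>.log naming; the log directory itself depends on your MapR/Hadoop install, so pass the pattern yourself.

```python
import glob
import re
import sys

# Log4j level token as it appears in Hadoop daemon logs, e.g.
# "2017-01-01 10:00:00,123 ERROR org.apache...: message"
LEVEL_RE = re.compile(r"\b(ERROR|FATAL)\b")


def scan_lines(lines):
    """Return (line_number, text) for every line logged at ERROR or FATAL."""
    hits = []
    for n, line in enumerate(lines, start=1):
        if LEVEL_RE.search(line):
            hits.append((n, line.rstrip("\n")))
    return hits


def scan_logs(pattern: str) -> None:
    """Scan every file matching the glob pattern and print its ERROR/FATAL lines."""
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as f:
            for n, text in scan_lines(f):
                print(f"{path}:{n}: {text}")


if __name__ == "__main__":
    # Example pattern (adjust the directory): "<log dir>/yarn-*-nodemanager-*.log"
    if len(sys.argv) == 2:
        scan_logs(sys.argv[1])
```

Stack traces usually follow the ERROR line without a level prefix, so open the file at the reported line number to see the full exception.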
Were you able to resolve the issue on your own? If you are still having the issue, please provide the requested information so we can help you.