What is the "Time Skew Alarm"?

Document created by mufeed on Feb 7, 2016
Version 1

Author: Mufeed Usman

 

Original Publication Date: April 29 2015

 

Scenario:

The maprcli alarm list command reports a TIME_SKEW_ALARM.

 

Goal:

To clear this alarm and bring the cluster into a healthy, time-synchronized state.

 

Solution:

It is highly recommended to keep the system clocks on cluster nodes synchronized. To aid administrators in achieving this goal, the MapR software monitors the time reported by all cluster nodes and generates alerts and raises alarms when the times on nodes are skewed by an excessive amount.

 

Every node in a MapR cluster sends a small heartbeat packet to the CLDB on a regular basis, generally once every second. Each heartbeat contains the current time on the sending node, which is compared to the time on the CLDB node. When the difference between the two exceeds 20 seconds, the TIME_SKEW_ALARM is raised and an alert is sent to all recipients subscribed to receive emails from the MapR cluster. As soon as the time skew falls below the threshold, the alarm is cleared.
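
The comparison described above can be illustrated with a short Python sketch. This is not MapR's implementation: the function name, the dictionary of node times, and the use of an absolute difference are assumptions made for illustration. Only the 20-second threshold comes from this article.

```python
# Illustrative sketch only; not MapR code.
# Assumption: skew is the absolute difference between a node's reported
# time and the CLDB node's time, both in seconds since the epoch.

SKEW_THRESHOLD_SECONDS = 20  # threshold stated in this article


def skewed_nodes(cldb_time, node_times, threshold=SKEW_THRESHOLD_SECONDS):
    """Return {node: skew_seconds} for nodes whose skew exceeds the threshold.

    cldb_time  -- time on the CLDB node (hypothetical input)
    node_times -- mapping of node name to the time reported in its heartbeat
    """
    return {
        node: reported - cldb_time
        for node, reported in node_times.items()
        if abs(reported - cldb_time) > threshold
    }


if __name__ == "__main__":
    # Hypothetical node names and times.
    heartbeats = {"node-a": 1005.0, "node-b": 1030.0, "node-c": 975.0}
    print(skewed_nodes(1000.0, heartbeats))
```

A node that is exactly 20 seconds off is not flagged here, matching the wording "exceeds 20 seconds".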

 

IMPORTANT: System time is used for critical ZooKeeper functionality. If the time skew alarm is raised for multiple nodes providing ZooKeeper services, adjust the time on only one ZooKeeper node at a time, pausing for several seconds before syncing the next node, until all ZooKeeper nodes have been updated.
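
The one-node-at-a-time procedure can be sketched as a small Python loop. Everything here is hypothetical: sync_node stands in for whatever mechanism is used to correct a node's clock (this article does not prescribe one), and the 10-second default pause is an assumed value, since the article only says to pause for several seconds.

```python
# Illustrative sketch only; the helper names and the pause length are assumptions.
import time


def sync_zookeeper_nodes_one_at_a_time(nodes, sync_node, pause_seconds=10,
                                       sleep=time.sleep):
    """Sync each ZooKeeper node in turn, pausing between nodes.

    nodes         -- ordered list of ZooKeeper node names (hypothetical)
    sync_node     -- caller-supplied callable that corrects one node's clock
    pause_seconds -- assumed pause between nodes; tune for your cluster
    sleep         -- injectable so the loop can be tested without waiting
    """
    synced = []
    for index, node in enumerate(nodes):
        sync_node(node)          # adjust the clock on exactly one node
        synced.append(node)
        if index < len(nodes) - 1:
            sleep(pause_seconds)  # wait before touching the next node
    return synced
```

Passing sync_node as a parameter keeps the loop independent of how the clock is actually corrected on each node.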
