
Monitoring Spark Worker with MapR Warden

Discussion created by soroka21 on Nov 30, 2016

When Apache Spark is installed in standalone mode, MapR provides a Warden configuration that actively monitors the Spark Master process; Warden can also restart it in case of failure. Unfortunately, by default, Spark Workers are not monitored that way, meaning there is no process that keeps the Spark cluster in working condition if a Worker fails.

We have created a simple Warden configuration file to fill this gap.

To set up process monitoring in Warden, we have to provide three commands for the Spark Worker:

  • start
  • stop
  • status

The stop and status commands are available from the Spark distribution and can be executed on the local node where the Worker is running. To stop the process we execute the following:

/opt/mapr/spark/spark-<spark version>/sbin/stop-slave.sh

To check the status of the local Spark Worker, the following command can be used:

/opt/mapr/spark/spark-<spark version>/sbin/spark-daemon.sh status org.apache.spark.deploy.worker.Worker 1

Starting the Spark Worker is a little trickier, since the Worker needs to know the IP address of the Spark Master node. The Spark Master node can be identified by running a maprcli command and parsing its output. The command line below can be used inside a shell script to identify the address of the Spark Master node:

SPARK_MASTER_IP=`maprcli node list -columns hostname,svc | grep spark-master | awk '{print $1}'`

The whole script implementing the command that starts the Spark Worker can be written as follows:

#!/bin/bash

# Find the node running the Spark Master service and build the master URL
SPARK_MASTER_IP=`maprcli node list -columns hostname,svc | grep spark-master | awk '{print $1}'`
SPARK_MASTER="spark://$SPARK_MASTER_IP:7077"

SPARK_HOME=/opt/mapr/spark/spark-<spark version>

# Start the local Spark Worker and register it with the master
$SPARK_HOME/sbin/start-slave.sh $SPARK_MASTER
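
Save this script (here assumed to be named start-worker.sh) on each Worker node in a location matching the start command we will reference in the Warden configuration below, and make it executable. A minimal sketch:

# Assumes the script above was saved locally as ./start-worker.sh
mkdir -p /opt/mapr/spark/spark-<spark version>/warden
cp start-worker.sh /opt/mapr/spark/spark-<spark version>/warden/start-worker.sh
chmod +x /opt/mapr/spark/spark-<spark version>/warden/start-worker.sh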

Now we can create the Warden configuration file and put it into the /opt/mapr/conf/conf.d directory under the name warden.SparkWorker.conf on each node where a Spark Worker is configured. A sample file can look like this:

# Name which will be displayed in MapR Web UI
service.displayname=SparkWorker
# We are giving Spark 8GB of memory
service.heapsize.min=8000
service.heapsize.max=8000
service.logs.location=/opt/mapr/spark/spark-<spark version>/logs
#
# Stop command
service.command.stop=/opt/mapr/spark/spark-<spark version>/sbin/stop-slave.sh
#
# Start command
# Our custom shell script starting the Spark Worker - we placed it in the warden subdirectory
#
service.command.start=/opt/mapr/spark/spark-<spark version>/warden/start-worker.sh
#
# Monitor command
service.command.monitorcommand=/opt/mapr/spark/spark-<spark version>/sbin/spark-daemon.sh status org.apache.spark.deploy.worker.Worker 1


Once the Warden configuration is reloaded with the /opt/mapr/server/configure.sh -R command, we should be able to see the Spark Worker in the MapR Web UI, and Warden will take care of restarting it in case of failure.
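
For reference, the sequence to apply and check the change on a Worker node might look like this (a minimal sketch; the second command is simply the same status command Warden uses for monitoring):

# Reload the Warden configuration so it picks up warden.SparkWorker.conf
/opt/mapr/server/configure.sh -R

# Verify the Worker is up using the same status command configured as monitorcommand
/opt/mapr/spark/spark-<spark version>/sbin/spark-daemon.sh status org.apache.spark.deploy.worker.Worker 1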
