Author: Sumesh Kurup
Original Publication Date: December 1, 2014
A Spark Standalone cluster follows a master-slave architecture, so a single standalone Master is a single point of failure. To remove this single point of failure, we will use one of the available HA options: Standby Masters with ZooKeeper.
ZooKeeper provides the leader-election mechanism: you can configure multiple Masters in the cluster for HA purposes, all connected to the same ZooKeeper instance. One instance takes the role of active Master while the others stay in standby mode. If the current Master dies, ZooKeeper elects one of the standby instances as the new Master, which recovers the old Master's state and then resumes scheduling. The process usually takes around 1-2 minutes. Since the Worker, Driver, and Application information has been persisted, the failover only delays the scheduling of new jobs; currently running jobs are not affected.
First we need an established ZooKeeper cluster. Start the Spark Master on multiple nodes and ensure that these nodes share the same ZooKeeper configuration (ZooKeeper URL and directory). Masters can be added or removed at any time. When the active Master fails over, the newly elected Master contacts all previously registered Applications and Workers to inform them that a new Master has been elected, and they re-register with it.
In order to enable this recovery mode, set SPARK_DAEMON_JAVA_OPTS in /opt/mapr/spark/spark-<version>/conf/spark-env.sh using the following properties:
- spark.deploy.recoveryMode: Set to ZOOKEEPER to enable standby Master recovery mode (default: NONE).
- spark.deploy.zookeeper.url: The ZooKeeper cluster URL (e.g., n1a:5181,n2a:5181,n3a:5181).
- spark.deploy.zookeeper.dir: The directory in ZooKeeper to store recovery state (default: /spark). This setting is optional.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=n1a:5181,n2a:5181,n3a:5181"
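If you also want to override the ZooKeeper recovery directory (optional; /spark is the default), the full spark-env.sh setting would look like the following sketch. The directory value here is just the default, shown for illustration:

```shell
# /opt/mapr/spark/spark-<version>/conf/spark-env.sh
# -Dspark.deploy.zookeeper.dir is optional; /spark is the default.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=n1a:5181,n2a:5181,n3a:5181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```

Remember that the same setting must be present on every node that runs a Master, or the Masters will not coordinate through the same ZooKeeper state.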
In our test environment we have configured two Masters: n1a is the current Master and n2a is the standby Master.
1. Ensure that the ZooKeeper cluster is up and running. This can be verified by running the following command on all ZooKeeper nodes to confirm all nodes are active in the quorum:
$ service mapr-zookeeper qstatus
2. Start the Spark Master services on all the master nodes as follows if not started already by Warden:
$ maprcli node services -name spark-master -action start -nodes `hostname -f`
3. Once the services are started, a Spark Master is elected via ZooKeeper. In our setup the host "n1a" was elected as Master.
4. Proceed to start the Spark workers. The workers can be started by running the following on the active master:
10.10.70.181: starting org.apache.spark.deploy.worker.Worker, logging to /opt/mapr/spark/spark-1.1.0/logs/spark-mapr-org.apache.spark.deploy.worker.Worker-1-n4a.mycluster2.com.out
10.10.70.180: starting org.apache.spark.deploy.worker.Worker, logging to /opt/mapr/spark/spark-1.1.0/logs/spark-mapr-org.apache.spark.deploy.worker.Worker-1-n3a.mycluster2.com.out
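The output above comes from Spark's standalone launch scripts, which read the worker hostnames from conf/slaves and SSH into each one. A minimal sketch of that file, assuming the Spark 1.1.0 layout and the worker hostnames from this test environment:

```shell
# /opt/mapr/spark/spark-1.1.0/conf/slaves -- one Worker host per line
# (hostnames below are the ones used in this test setup)
n3a.mycluster2.com
n4a.mycluster2.com
```

With this file in place, running sbin/start-slaves.sh on the active Master starts a Worker on each listed host, producing log lines like those shown above.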
The Master and standby configuration in the test setup looks as follows:
Current Master on n1a
Current Standby Master on n2a
5. To verify that failover works as expected, shut down the active Master by running the following command on the active Master node:
$ maprcli node services -name spark-master -action stop -nodes `hostname -f`
6. The standby Master becomes the active Master, and after a couple of minutes the Workers are re-registered with the new Master. For example:
Current Active Master on n2a
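A quick way to confirm which Master is currently active, assuming the default standalone Master web UI port (8080) is in use, is to query each Master's JSON status endpoint: the active Master reports a status of ALIVE, while a standby reports STANDBY. This is a sketch under that port assumption, using the hostnames from this test setup:

```shell
# Query each Master's web UI JSON endpoint. Port 8080 is the standalone
# default; adjust it if your deployment sets SPARK_MASTER_WEBUI_PORT.
for m in n1a n2a; do
  printf '%s: ' "$m"
  curl -s "http://$m:8080/json" | grep -o '"status" *: *"[A-Z]*"'
done
```

Running this before and after step 5 should show the ALIVE status moving from n1a to n2a.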