This document describes, step by step, how to deploy Mesos, Marathon, Docker and Spark on a MapR cluster, and how to use this deployment to run various jobs as well as Docker containers.
A short description of the components that we’re going to use:
- Mesos: an open-source cluster manager.
- Marathon: a cluster-wide init and control system.
- Spark: an open-source cluster computing framework.
- Docker: automates the deployment of applications inside software containers.
- MapR Converged Data Platform: integrates Hadoop and Spark with real-time database capabilities, global event streaming, and scalable enterprise storage to power a new generation of big data applications.
This tutorial assumes you already have a MapR 5.1.0 cluster up and running. For testing purposes, it can also be installed in a single-node environment. In this example, however, we will deploy Mesos on a 3-node MapR cluster:
- Mesos Master: MAPRNODE01
- Mesos Slave: MAPRNODE02, MAPRNODE03
Let's get started!
# Make sure Java 8 is installed on all the nodes in the cluster
java -version
# If Java 8 is not yet installed, install it and validate
yum install -y java-1.8.0-openjdk
# Set JAVA_HOME to Java 8 on all the nodes
# If JAVA_HOME isn't pointing to Java 8, fix it and test again
# Make sure that /usr/lib/jvm/java-1.8.0-* matches the directory of your Java 8 installation
# Load and validate the newly set JAVA_HOME
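The JAVA_HOME commands themselves are not reproduced above. As a minimal sketch, the hypothetical helper below (not part of MapR or Mesos) derives a JAVA_HOME candidate from the resolved path of the `java` binary, which on CentOS usually sits behind the `/etc/alternatives` symlink chain under `/usr/lib/jvm/java-1.8.0-*`. Where you persist the result (for example a profile script) depends on your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a JAVA_HOME candidate from the fully resolved
# path of a java binary, e.g. /usr/lib/jvm/java-1.8.0-openjdk-.../jre/bin/java
java_home_from_binary() {
  local home="${1%/bin/java}"   # drop the trailing /bin/java
  # A full JDK also has java under <jdk>/jre/bin; prefer <jdk> when it
  # provides bin/java itself, otherwise keep the jre directory.
  if [ "${home##*/}" = "jre" ] && [ -x "${home%/jre}/bin/java" ]; then
    home="${home%/jre}"
  fi
  printf '%s\n' "$home"
}

# Example usage on a node:
#   export JAVA_HOME="$(java_home_from_binary "$(readlink -f "$(command -v java)")")"
#   echo "$JAVA_HOME"; "$JAVA_HOME/bin/java" -version
```

Checking for `bin/java` before stripping `jre` keeps the result usable on nodes where only the JRE package is installed.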
Now you’re all set with the correct Java version. Let’s go ahead and install the Mesos repository from which to retrieve the binaries.
Install Mesos repository
Please make sure you install the correct Mesos repository matching your CentOS version.
# Validate your CentOS version
cat /etc/redhat-release
# for CentOS 6.x
rpm -Uvh http://repos.mesosphere.com/el/6/noarch/RPMS/mesosphere-el-repo-6-3.noarch.rpm
# for CentOS 7.x
rpm -Uvh http://repos.mesosphere.com/el/7/noarch/RPMS/mesosphere-el-repo-7-3.noarch.rpm
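Picking the right RPM can also be scripted. A minimal sketch, assuming `/etc/redhat-release` uses the usual `CentOS ... release X.Y ...` wording; both helper functions are hypothetical and only cover the two repository URLs listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the major release number from an
# /etc/redhat-release style string, e.g. "CentOS Linux release 7.3.1611 (Core)"
centos_major_version() {
  printf '%s\n' "$1" | sed -E 's/.*release ([0-9]+).*/\1/'
}

# Hypothetical helper: map a CentOS major version to the Mesosphere repo RPM
mesosphere_repo_url() {
  case "$1" in
    6|7) printf 'http://repos.mesosphere.com/el/%s/noarch/RPMS/mesosphere-el-repo-%s-3.noarch.rpm\n' "$1" "$1" ;;
    *)   echo "unsupported CentOS major version: $1" >&2; return 1 ;;
  esac
}

# Example usage:
#   rpm -Uvh "$(mesosphere_repo_url "$(centos_major_version "$(cat /etc/redhat-release)")")"
```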
Now that we have the Mesos repositories installed it is time to start installing Mesos and Marathon.
Install Mesos and Marathon
# On the node(s) that will run the Mesos Master (e.g. MAPRNODE01):
yum install mapr-mesos-master mapr-mesos-marathon
# On the nodes that will run the Mesos Slave (e.g. MAPRNODE02, MAPRNODE03):
yum install mapr-mesos-slave
# Run on all nodes to make the MapR cluster aware of the new services
/opt/mapr/server/configure.sh -R
# Open the Mesos web UI at http://MAPRNODE01:5050 and verify that the master and slaves are listed
Launch a Mesos job from the shell
# Launch a simple Mesos job from the terminal by executing:
MASTER=$(mesos-resolve `cat /etc/mesos/zk`)
mesos-execute --master=$MASTER --name="cluster-test" --command="sleep 5"
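`mesos-resolve` turns the ZooKeeper URL stored in `/etc/mesos/zk` into the address of the current leading master. That URL has the form `zk://host1:port,host2:port/path`; the `spark-shell` example later in this article uses `zk://MAPRNODE01:5181/mesos`, 5181 being MapR's ZooKeeper port. The hypothetical helper below only builds such a URL from a comma-separated host list, which can be handy when checking what `/etc/mesos/zk` should contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Mesos ZooKeeper URL such as
# zk://MAPRNODE01:5181,MAPRNODE02:5181/mesos from a comma-separated host list.
mesos_zk_url() {
  local hosts="$1" port="${2:-5181}" path="${3:-/mesos}" out="" h
  local IFS=','                        # split the host list on commas only
  for h in $hosts; do
    out="${out:+$out,}$h:$port"        # append host:port, comma-separated
  done
  printf 'zk://%s%s\n' "$out" "$path"
}

# Example usage on a node, to compare with the configured value:
#   mesos_zk_url MAPRNODE01; cat /etc/mesos/zk
```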
Besides the console output, which shows a task being created and changing status to TASK_RUNNING and then TASK_FINISHED, you should also see the cluster-test framework listed as terminated on the Frameworks page of the Mesos console UI: http://MAPRNODE01:5050
Launch a Mesos job using Marathon
Open Marathon by pointing your browser to http://MAPRNODE01:8080 and click on “Create Application”
# Create a simple app that echoes 'hello' to a file
ID: cluster-marathon-test
Disk space: 0
Command: echo "hello" >> /tmp/output.txt
# Click "Create Application"
Check the Marathon console (http://MAPRNODE01:8080) to see the application being deployed and started.
# On the Mesos Slave running the task, check the output; "hello" keeps being appended because Marathon restarts the command every time it exits
tail -f /tmp/output.txt
Check the active task in Mesos by pointing your browser to http://MAPRNODE01:5050.
Finally, destroy the application by opening the Marathon console (http://MAPRNODE01:8080), clicking the ‘cluster-marathon-test’ application and selecting ‘destroy’ from the config drop-down.
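The same test application can also be created and removed through Marathon's REST API (`/v2/apps`), the endpoint this article uses later for the Docker container. A sketch, assuming the CPU, memory and instance values shown (apart from the command and disk space, the form values are not given above); the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Marathon app definition equivalent to the
# form above. cpus, mem and instances are assumed values.
write_marathon_test_app() {
  cat > "$1" <<'EOF'
{
  "id": "cluster-marathon-test",
  "cmd": "echo \"hello\" >> /tmp/output.txt",
  "cpus": 0.1,
  "mem": 32,
  "disk": 0,
  "instances": 1
}
EOF
}

# Example usage:
#   write_marathon_test_app /tmp/marathon-test.json
#   curl -X POST -H "Content-Type: application/json" http://MAPRNODE01:8080/v2/apps -d@/tmp/marathon-test.json
#   curl -X DELETE http://MAPRNODE01:8080/v2/apps/cluster-marathon-test
```

The quoted heredoc delimiter (`'EOF'`) keeps the shell from expanding anything inside the JSON, so the escaped quotes reach Marathon unchanged.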
Launch Docker containers on Mesos
Now that we have Mesos running, it is easy to run Docker containers at scale. Simply install Docker on all nodes running the Mesos Slave and start launching containers.
Install Docker on all Mesos Slave nodes
# Download and install Docker on all Mesos Slave nodes
curl -fsSL https://get.docker.com/ | sh
# Start Docker
service docker start
chkconfig docker on
# Configure the Mesos Slaves to allow Docker containers
# On all Mesos Slaves, execute:
echo 'docker,mesos' > /etc/mesos-slave/containerizers
echo '5mins' > /etc/mesos-slave/executor_registration_timeout
# Restart the mesos-slave service on all nodes using the MapR MCS
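Each file written above holds the value of one `mesos-slave` command-line flag: `containerizers=docker,mesos` lets the slave launch Docker tasks with the Docker containerizer and everything else with the Mesos containerizer, and the longer `executor_registration_timeout` gives executors time to register while Docker pulls the image. The sketch below illustrates that file-to-flag mapping, assuming the MapR packaging follows the Mesosphere convention (file name = flag name, file content = value) and ignoring any special cases the real startup scripts may handle; it only prints flags and changes nothing. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (assumed Mesosphere-style convention): print the mesos-slave flags
# implied by a directory of flag files, one --name=value line per file.
flags_from_dir() {
  local dir="$1" f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue      # skip sub-directories and an unmatched glob
    printf -- '--%s=%s\n' "${f##*/}" "$(cat "$f")"
  done
}

# Example usage on a Mesos Slave:
#   flags_from_dir /etc/mesos-slave
```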
Now that Docker is installed, we will use Marathon to launch a simple Docker container; for this example, the httpd web server container.
# Create a JSON file with the Docker image details to be launched on Mesos using Marathon
# Add the following to the json file:
# Submit the docker container using the created docker.json file to Marathon from the terminal
curl -X POST -H "Content-Type: application/json" http://MAPRNODE01:8080/v2/apps -d@/tmp/docker.json
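The contents of `/tmp/docker.json` are not shown above. The following is a minimal sketch of a Marathon app definition for the official `httpd` image, assuming the Marathon 1.x JSON layout with `network` and `portMappings` inside the `docker` object (newer Marathon releases move networking to a top-level `networks` field, so this may need adjusting). The `id`, `cpus` and `mem` values and the helper name are assumptions; `"hostPort": 0` asks Marathon to pick a host port dynamically.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an assumed Marathon app definition for httpd.
# Container port 80 is mapped to a dynamically assigned host port (hostPort 0).
write_httpd_app() {
  cat > "${1:-/tmp/docker.json}" <<'EOF'
{
  "id": "httpd",
  "cpus": 0.5,
  "mem": 128,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "httpd",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 80, "hostPort": 0, "protocol": "tcp" }
      ]
    }
  }
}
EOF
}

# Example usage, then submit with the curl command above:
#   write_httpd_app /tmp/docker.json
```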
Point your browser to Marathon (http://MAPRNODE01:8080) and locate the httpd Docker container.
Underneath the ID field, Marathon exposes a hyperlink to the Docker container (note that the port will differ in your environment, as it is assigned dynamically). Click it to connect to the httpd container.
You've now successfully launched a Docker container on Mesos using Marathon. You can use the same approach to launch any kind of Docker container on the Mesos infrastructure. In addition, you can use MapR's unique NFS capabilities to connect the Docker container to any data on the MapR Converged Data Platform, without having to worry about which physical node the container is launched on. If you want to connect your Docker containers securely to MapR-FS, it is highly recommended to use the MapR POSIX Client; my community post "Connect Docker containers securely to MapR-FS using the MapR POSIX Client" describes how to achieve this.
With the ability to launch Docker containers on our Mesos cluster, let's move on and launch Spark jobs on the same infrastructure.
Install and launch Spark jobs on Mesos
# Install Spark on the MapR node (or nodes) from which you want to submit jobs
yum install -y mapr-spark-2.1.0*
# Create the Spark History Server folder on the cluster
hadoop fs -mkdir /apps/spark
hadoop fs -chmod 777 /apps/spark
# Tell the cluster that new packages have been installed
/opt/mapr/server/configure.sh -R
# Download Spark 2.1.0 - Pre-built for Hadoop 2.7 and later
# Deploy Spark 2.1.0 on the MapR Filesystem so Mesos can reach it from every MapR node
hadoop fs -put spark-2.1.0-bin-hadoop2.7.tgz /
# Set Spark to use Mesos as the execution framework
# Set the following parameters; make sure the libmesos version matches your installed version of Mesos
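The download command and the Spark parameters are not reproduced above. A minimal sketch, assuming the Apache archive layout for Spark releases and the two settings described in Spark's documentation for running on Mesos: `MESOS_NATIVE_JAVA_LIBRARY` in `spark-env.sh`, pointing at `libmesos`, and `spark.executor.uri` in `spark-defaults.conf`, pointing at the tarball uploaded to MapR-FS. The helper names, the `libmesos` path and the `maprfs:///` URI in the usage comment are assumptions to adapt to your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: URL of a pre-built Spark release on the Apache archive
spark_dist_url() {
  local spark="$1" hadoop="$2"
  printf 'https://archive.apache.org/dist/spark/spark-%s/spark-%s-bin-hadoop%s.tgz\n' \
    "$spark" "$spark" "$hadoop"
}

# Hypothetical helper: append the Mesos-related settings to a Spark conf dir.
#   $1 conf dir, $2 path to libmesos, $3 URI of the Spark tarball for executors
configure_spark_for_mesos() {
  local conf_dir="$1" libmesos="$2" executor_uri="$3"
  echo "export MESOS_NATIVE_JAVA_LIBRARY=$libmesos" >> "$conf_dir/spark-env.sh"
  echo "spark.executor.uri $executor_uri" >> "$conf_dir/spark-defaults.conf"
}

# Example usage (paths are assumptions, adjust to your installation):
#   curl -fLO "$(spark_dist_url 2.1.0 2.7)"
#   configure_spark_for_mesos /opt/mapr/spark/spark-2.1.0/conf \
#     /path/to/libmesos.so maprfs:///spark-2.1.0-bin-hadoop2.7.tgz
```

The executor URI tells each Mesos Slave where to fetch Spark from, which is why the tarball is placed on MapR-FS where every node can reach it.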
Launch a simple spark-shell command to test Spark on Mesos:
# Launch the Spark Shell job using Mesos as the execution framework
/opt/mapr/spark/spark-2.1.0/bin/spark-shell --master mesos://zk://MAPRNODE01:5181/mesos
# You should now see the Spark shell as an active framework in the Mesos UI
# Execute a simple Spark job using Mesos as the execution framework
val data = 1 to 100
Submit a Spark job to Mesos using spark-submit:
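# Run a Spark Submit example to test Spark on Mesos and MapR
# SPARK_EXAMPLES_JAR is a placeholder: set it to the Spark examples jar of your Spark 2.1.0 installation
/opt/mapr/spark/spark-2.1.0/bin/spark-submit \
--name SparkPiTestApp \
--master mesos://MAPRNODE01:5050 \
--driver-memory 1G \
--executor-memory 2G \
--total-executor-cores 4 \
--class org.apache.spark.examples.SparkPi \
"$SPARK_EXAMPLES_JAR"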
Troubleshooting components like Mesos, Marathon, Spark and Docker to find potential issues can be challenging given the number of components involved. Below are the top 5 troubleshooting tips:
# 1. Marathon port number 8080
Marathon listens on port 8080 by default, which can conflict with a Spark standalone Master, whose web UI uses the same port by default.
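Before starting Marathon on a node that may also host a Spark standalone Master, you can check whether something is already listening on port 8080. A minimal sketch using bash's `/dev/tcp` redirection (a bash feature, not a real device file); the function name is hypothetical, and for unreachable remote hosts the connection attempt may take a while to time out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Uses bash's built-in /dev/tcp redirection; the subshell closes fd 3 on exit.
port_open() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# Example usage on the Mesos Master node:
#   if port_open MAPRNODE01 8080; then echo "port 8080 is already in use"; fi
```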
# 2. Log information
The Mesos Master and Slave nodes write their log information to files on the respective nodes:
# 3. Marathon as well as some generic Mesos Master and Slave logging ends up in /var/log/messages
tail -f /var/log/messages
# 4. Enable extra console logging by executing the following export prior to running spark-submit on Mesos
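The export itself is not reproduced above. One option for more Mesos-side detail (an assumption, not necessarily the export this tip refers to) is glog's `GLOG_v` variable: Mesos logs through Google glog, and the native Mesos library loaded by `spark-submit` reads its settings from the environment.

```shell
#!/usr/bin/env bash
# Raise glog verbosity for the native Mesos library (0 = default, higher = noisier).
# Assumption: one possible logging tweak, exported in the same shell that
# afterwards runs spark-submit.
export GLOG_v=1
```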
# 5. Failed to recover the log: IO error
This error message may occur if you previously ran Mesos as the root user and are now trying to run it as a non-root user (for example, the mapr user).
# Full error message in /var/log/messages:
# Failed to recover the log: IO error /var/lib/mesos/replicated_log/LOCK: Permission denied
chown -R mapr:mapr /var/lib/mesos/replicated_log/
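After the `chown`, you can confirm that nothing under the replicated log directory is still owned by another user. A minimal sketch with `find`; the function name is hypothetical, and an empty result means the ownership fix is complete.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every path below $2 that is not owned by user $1.
not_owned_by() {
  local user="$1" dir="$2"
  find "$dir" ! -user "$user" -print
}

# Example usage on the Mesos Master node:
#   not_owned_by mapr /var/lib/mesos/replicated_log/
```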
In this article you’ve learned how to deploy Mesos, Marathon, Docker and Spark on top of the MapR Converged Data Platform. You’ve also submitted jobs from the shell and through Marathon, and launched Spark jobs as well as Docker containers.
If you want to securely connect Docker containers to the MapR Converged Data Platform, please read my community post:
Connect Docker containers securely to MapR-FS using the MapR POSIX Client
Please like or comment on this article to provide any feedback or questions.