Cisco and MapR have been longtime partners on the big data journey. The recently published MapR CVD (Cisco Validated Design) further strengthens the integration of our products, pairing UCS’s superb management capabilities with the award-winning, enterprise-grade big data platform, the MapR Distribution including Apache Hadoop.
With the advent of container technology like Docker and application resource management platforms such as Apache Mesos, enterprise customers are looking at these technologies very seriously, as they promise much shorter development cycles and highly scalable product deployment.
A common use case for Mesos deployments is scaling Apache web server instances dynamically. Normally, without utility-grade persistent storage such as that backed by MapR-FS, the storage allocated to a Docker container is ephemeral and is lost if the container crashes or is killed. With UCS and MapR, the web content stays consistent across the Docker containers, and logs are persisted to MapR-FS for later analysis with Hadoop.
Dockerizing the Web Server
With the Cisco UCS servers at the foundation, we effortlessly spun up a 10-node MapR cluster with Mesos installed. Using Docker, we created a web server container and then launched it with Marathon – a Mesos framework for launching and scaling long-running applications. We simply typed an arbitrary ID and the following string into the command section of the New Application form:
“docker run -d -v /mapr:/www my/webserver” (my/webserver is the Docker image name), and off it went. The web server spun up almost instantaneously. We then used the “Scale” button to spin up 5 more web containers within seconds. See the figure below:
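For reference, the same launch can also be expressed as a Marathon application definition and POSTed to Marathon’s /v2/apps REST endpoint instead of using the web form. The sketch below is illustrative only; the id, instances, cpus, and mem values are placeholders we chose, not values from the deployment above:

```json
{
  "id": "webserver",
  "cmd": "docker run -d -v /mapr:/www my/webserver",
  "instances": 6,
  "cpus": 0.5,
  "mem": 512
}
```

Scaling out is then just a matter of bumping the instances count – roughly what the “Scale” button does for you behind the scenes.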
Sharing Persistent Storage among Containers with MapR-FS
A container has its own storage, which is limited in size and cannot be shared with other containers. The MapR POSIX-compliant NFS gateway is a perfect solution: it allows the containers to tap into the robust, HA/DR-ready MapR-FS for big data analytics. Note that we had already NFS-mounted MapR-FS on the cluster nodes under /mapr. When we spun up the container, the -v option mapped the /mapr mount point on the host node to the /www mount point in the container. We then modified the DocumentRoot directive in httpd.conf to point to /www, which makes managing the web content much easier, with real-time synchronization across all the web containers. We also modified the CustomLog and ErrorLog directives to point to a log directory under /www, where each container writes its own set of log files tagged with a unique host ID. With the MapR NFS gateway, we can verify these log files simply by running the Unix ls command against the NFS mount point:
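The relevant httpd.conf changes might look like the following sketch. The ${HOSTNAME} interpolation is our assumption here – it relies on Apache 2.4-style variable expansion with the container’s hostname exported into httpd’s environment – and the exact log paths are illustrative:

```apache
# Serve content from the MapR-FS NFS mount mapped into the container via -v
DocumentRoot "/www"

# Per-container logs on shared MapR-FS storage, keyed by container hostname
# (assumes HOSTNAME is visible to httpd's configuration at startup)
ErrorLog  "/www/logs/${HOSTNAME}_error.log"
CustomLog "/www/logs/${HOSTNAME}_access.log" combined
```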
# ls /mapr/<MapR cluster name>/www/logs
2c4b95924357_access.log 64cc248c438b_error.log 869fff95a17a_access.log 871e141fc8e7_error.log
2c4b95924357_error.log 6b33974a3848_access.log 869fff95a17a_error.log 9631b85d9dd2_access.log
64cc248c438b_access.log 6b33974a3848_error.log 871e141fc8e7_access.log 9631b85d9dd2_error.log
This setup gives us a central log repository protected by MapR-FS with scheduled snapshots and mirrored volumes, and the logs can later be processed with SQL-on-Hadoop tools like Apache Drill to perform web clickstream analysis. Of course, this only serves to demonstrate the combined power and capabilities of MapR-FS, Docker, and Mesos. The sky is the limit when it comes to other big data applications.
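As a taste of what that analysis could look like, here is a hypothetical Drill query. It assumes the logs directory is reachable through Drill’s dfs storage plugin, that the workspace’s input format is configured as space-delimited text (so each line is exposed as a columns[] array), and that my.cluster.com stands in for the actual cluster name:

```sql
-- Sketch: count requests per HTTP status code across the access logs.
-- With the Apache combined log format split naively on spaces,
-- columns[8] is the status code (an assumption that holds only when
-- the request line contains no extra embedded spaces). A real query
-- would also restrict the input to the *_access.log files.
SELECT columns[8] AS status, COUNT(*) AS hits
FROM dfs.`/mapr/my.cluster.com/www/logs`
GROUP BY columns[8]
ORDER BY hits DESC;
```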
Project Myriad and Beyond
As you know, YARN manages Hadoop cluster resources, and Mesos manages cluster resources for applications. Unfortunately, the two do not communicate with each other, although each does a fairly good job in its own realm. Project Myriad was created to break down the wall between Mesos and YARN. We believe that with Cisco UCS as the hardware foundation delivering rock-solid performance with top-quality compute, network, and storage resources, the MapR Distribution with Myriad enables YARN and Mesos to share an aggregated pool of resources. This combination holds great promise for solving many operations and development challenges by achieving much better resource allocation while retaining agility at scale.