
Deploying the MapR Converged Data Platform on Azure Container Service with Kubernetes Orchestrator

Blog Post created by jsun Employee on Apr 10, 2017

Introduction

Big data developers and QA professionals need a robust big data platform where they can concentrate their efforts on software development and code testing before rolling out to production. However, getting access to test and staging environments can be challenging, as these are often not self-service and require IT assistance. Because of this gap, time-to-market can be adversely affected, and product life cycles become too long to adapt to today’s speed of business.

 

Fortunately, containerized services offer a solution to narrow this gap. In my previous post, I offered a way to spin up a mini containerized MapR cluster in a single virtual instance. That works fine for single-user environments but is not scalable. What if you have a team of developers who want to collaborate on the very same containerized MapR cluster? A single virtual instance will not be able to satisfy that need.

 

Enter Azure Container Service (ACS), an Azure service that makes it simpler to create, configure, and manage a cluster of virtual machines that are preconfigured to run containerized applications. It uses an optimized configuration of popular open-source scheduling and orchestration tools like Kubernetes, DC/OS, and Docker Swarm. As a result, there's no need to change your existing management practices and tools to move container workloads to Azure; you can keep using the tools you are used to. It is possible to deploy a full-blown MapR cluster in less than an hour, with no need to rely on ever-busy IT professionals for assistance and no need to consume a large hardware environment.

 

MapR has been working closely with ACS to make the deployment much easier. In this blog post, I will walk you through the necessary steps to deploy the MapR Converged Data Platform on ACS and demonstrate the capabilities of the MapR Persistent Application Client Container (PACC) for deploying your containerized applications that leverage the MapR Platform as a persistence tier. Note that the described configuration is not supported by MapR and thus should not be used for production deployments; it should only be used for test, demo, training, or development environments.

 

Prerequisites

Before you start, please set up an account on Azure. You can sign up for one here. Additionally, install Docker on your computer or a cloud instance by following these instructions here. That’s it: now you are ready to start deploying!

 

Step 1 – Download a pre-built container, start it, and log in to Azure

On the computer or cloud instance where you installed Docker, run the following command:

 

docker run --name azm --hostname azm -it maprazure/azm:latest /bin/bash

 

Now you are in the azm container, and at the prompt, you need to log in to your Azure account. This container already has Azure CLI 2.0 installed; you can find more information about it on the following documentation page: Get started with Azure CLI 2.0. Below is a quick summary of how to log in to your Azure account:

 

[root@azm /]# az login

 

Follow the instructions to complete the login process.

Example:

To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code BT376Q5W8 to authenticate.

 

After you log in successfully, you should get the prompt back in a short moment.

 

Step 2 – Deploy a Kubernetes cluster

 

To deploy a Kubernetes cluster, execute the “deploy-k8” command. It accepts various options; specify the “-h” option to view the help menu.

[root@azm ~]# deploy-k8 -h

 

Usage: deploy-k8 [options]

 

Options:

--version             show program's version number and exit

-h, --help           show this help message and exit

-g GNAME, --resource-group=GNAME

                       Azure Resource Group Name. default: myacs

-d DNS_PREFIX, --dns-prefix=DNS_PREFIX

                       DNS Prefix for Kubernetes Hosts.

-l LOC, --location=LOC

                       Azure Region, e.g. westus, eastus, etc. default:

                       eastus

-a APPNAME, --app-name=APPNAME

                       Azure Application Name. default: mykubecluster

-p APPPASSWORD, --app-password=APPPASSWORD

                        Azure Application Password

-s VMSIZE, --vm-size=VMSIZE

                       VM size of the Kubernetes agents. default:

                       Standard_D2_v2

-c AGENTCOUNT, --agent-count=AGENTCOUNT

                       Number of the Kubernetes agents. default: 3

-q, --quiet           don't print status messages to stdout

 

There are default values for each option. At a minimum, you should specify the password and DNS prefix when deploying a Kubernetes cluster; the password is a key the Kubernetes application uses to authenticate with the Azure infrastructure.

 

 

For example:

[root@azm ~]# deploy-k8 -p M@prtest1 -d myk8
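For illustration, the remaining options can be filled in the same way. The sketch below only composes and prints a fuller command: the values are placeholders, and deploy-k8 itself exists only inside the azm container, so nothing is actually deployed here.

```shell
# Illustrative values only; deploy-k8 runs inside the azm container.
# The finished command is echoed rather than executed.
RESOURCE_GROUP="myacs"
DNS_PREFIX="myk8"
LOCATION="eastus"
APP_PASSWORD='M@prtest1'
AGENT_COUNT=3
echo deploy-k8 -g "$RESOURCE_GROUP" -d "$DNS_PREFIX" -l "$LOCATION" \
  -p "$APP_PASSWORD" -c "$AGENT_COUNT"
```

Dropping the echo and running this inside azm would deploy a 3-agent cluster in the eastus region under the myacs resource group.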

In about 10 minutes, you will get the shell prompt back, which means the Kubernetes cluster is deployed. Now go to the Azure portal (http://portal.azure.com) and select Resource Groups; assuming you didn’t specify a resource group name, the default resource group myacs will be listed as below.

 

Select myacs and you will see that it includes quite a few resources: virtual machines, load balancers, storage, etc. By default, one Kubernetes master and 3 agents are created, and their VM size is Standard_D2_v2.

 

Step 3 - Login to Kubernetes master node and deploy MapR Converged Data Platform

 

On the azm container, issue the “ssh-master” command:

[root@azm ~]# ssh-master

 

This logs you in to the Kubernetes master node. To check whether the Kubernetes cluster is ready, issue this command at the prompt, for example:

root@k8s-master-C5E9779-0:~# kubectl get nodes

NAME                   STATUS                    AGE

k8s-agent-c5e9779-0   Ready                     5m

k8s-agent-c5e9779-1   Ready                     5m

k8s-agent-c5e9779-2   Ready                     5m

k8s-master-c5e9779-0   Ready,SchedulingDisabled   5m

 

The output indicates that one master and 3 agents are ready. Now you can move forward and deploy a MapR cluster by issuing the “deploy-mapr” command; again, the -h option gives you the help menu:

 

root@k8s-master-C5E9779-0:~# deploy-mapr -h

Usage: deploy-mapr [options]

 

Options:

--version             show program's version number and exit

-h, --help           show this help message and exit

--maprv=MAPRV, --mapr-version=MAPRV

                       MapR version. default: 520

--mep=MEP             MEP version. default: 2.0

-c CLNAME, --cluster-name=CLNAME

                       MapR cluster name. default: mapr520

-n NNODES, --mapr-nodes=NNODES

                       MapR cluster size. default: 3

-a ADMIN, --admin-user=ADMIN

                       MapR admin username. default: mapruser

-p PASSWD, --admin-password=PASSWD

                       MapR admin user password

-s MODE, --security-mode=MODE

                       MapR security mode: base, sec or kdc. default: base

-d LDAPUSER, --ldap-user=LDAPUSER

                     MapR ldap username. default: ldapuser

-q, --quiet           don't print status messages to stdout

 

At a minimum, you should provide an admin password for managing MapR; for example:

root@k8s-master-C5E9779-0:~# deploy-mapr -p M@prtest1

 

This will kick off the MapR installation. By default, 3 MapR containers are deployed, along with an LDAP container for user directory lookup, a Metastore container for Apache Hive, a MapR client container, a Squid proxy container, and a cockpit container used to visualize and manage the Kubernetes cluster.

 

About halfway through, you will see messages like the following, indicating that Kubernetes is configuring the Azure load balancer so you can access the cockpit portal from the internet; for example:

 

Waiting for load balancer to open up cockpit port – 5 seconds, est. 5 min

Waiting for load balancer to open up cockpit port – 10 seconds, est. 5 min

……..

Waiting for load balancer to open up cockpit port - 185 seconds, est. 5 min

Waiting for load balancer to open up cockpit port - 190 seconds, est. 5 min

Please point your browser's at http://13.64.77.133:9090 for cockpit access.
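Under the hood, the “Waiting for load balancer” loop above is simply polling a TCP port until it opens. A minimal bash sketch of that pattern (host and port are placeholders, and this assumes bash’s /dev/tcp support; it is not the deploy script’s actual implementation):

```shell
# Poll a TCP port until it opens or a timeout expires, echoing
# progress the way the deploy script does. Requires bash (/dev/tcp).
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-300} elapsed=0
  while ! (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    elapsed=$((elapsed + 5))
    if [ "$elapsed" -gt "$timeout" ]; then
      echo "Timed out waiting for $host:$port"
      return 1
    fi
    echo "Waiting for load balancer to open up port $port - $elapsed seconds"
    sleep 5
  done
  return 0
}
```

Calling `wait_for_port 13.64.77.133 9090` would block until the cockpit port answers or the default five-minute timeout expires, matching the “est. 5 min” estimate in the script’s output.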

 

Now point your browser at the URL. You should see the cockpit portal; log in as root with the password you provided for the MapR admin above.

 

 

Once you are logged in, you will see the console as below. Click the “Cluster” tab at the top, then click “Topology” in the left pane and select “mapr520” in the Project drop-down menu. See the graphs below:

 

 

Now you should see an animation of your MapR cluster being deployed on Kubernetes. The black circles are the VMs in the Kubernetes cluster; the blue circles are the MapR containers that are being spun up.

 

Wait until the MapR deployment is finished. You should see something similar to the messages below on your Kubernetes master console:

<snip>

Waiting for load balancer to open up proxy port - 40 seconds, est. 5 min

Waiting for load balancer to open up proxy port - 45 seconds, est. 5 min

Waiting for load balancer to open up proxy port - 50 seconds, est. 5 min

Waiting for load balancer to open up proxy port - 55 seconds, est. 5 min

All Done!!

===============================================

Please point your browser's at http://13.64.77.136:9090 for cockpit access.

 

Please configure your browser's proxy setting to IP: 13.64.116.25 and Port: 30128

and then point your browser at https://mapr520-node0:8443 to access MCS

You can also point your browser at http://mapr520-node0:8047 to access Apache Drill

===============================================

 

To get inside these containers, click on “Containers” in the left pane and highlight your desired container (e.g. mapr-client).

 

 

Once you are in the mapr-client container, you can issue commands such as:

  1. df: you will see a /posix mount point, which lets you interact with MapR-FS using your POSIX-compliant Linux commands.
  2. hadoop fs -ls: the all-too-familiar command for listing MapR-FS contents.
  3. id ldapuser: note that we have spun up an LDAP container for centralized username lookup; you should see the uid and gid of user ldapuser, even though the user is not in the local /etc/passwd.

 

 

Step 4 - Configure your browser to access MapR Control System (MCS)

We have also deployed a Squid proxy container that allows you to access the MapR cluster. To do this, open your browser’s proxy settings (Firefox is used as an example here, shown below), fill in the HTTP Proxy field with the proxy IP from step 3 above, and enter port 30128. Click OK. Examples are shown in the following two graphs:

 

Now point your browser at the MCS at https://mapr520-node0:8443, and log in as user mapr with the admin password you provided in step 3. You should be able to start managing the MapR cluster from the MCS portal. The cluster comes with a basic license that is sufficient to get you started; however, if you wish to explore the full features of the MapR Converged Data Platform, such as MapR-DB and MapR Streams, you will need to install the free 30-day unlimited trial license. You can do so by following the “Apply the Trial License” section in my previous blog post.

 

 

Apache Drill is also available by pointing your browser at http://mapr520-node0:8047.

 

 

Step 5 – Deploy PACC services

Tugdual Grall wrote a great blog post on getting started with the MapR PACC service. Basically, a sensor PACC collects its host’s performance stats (I/O, memory, CPU load, etc.) and publishes them to a MapR Streams topic; a webserver PACC then consumes these stream messages and serves them over HTTP, so you can view them in a browser.
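As a rough sketch of the kind of host stats such a sensor reads, the snippet below pulls two metrics from /proc. The field names are made up for illustration; the real sensor PACC publishes its readings to a MapR Streams topic rather than printing them.

```shell
# Read a couple of host metrics from /proc, the way a simple sensor
# might. Hypothetical output format; the real PACC sensor publishes
# these readings to a MapR Streams topic instead of printing them.
collect_stats() {
  printf 'load=%s mem_free_kb=%s\n' \
    "$(cut -d ' ' -f1 /proc/loadavg)" \
    "$(awk '/^MemFree:/ {print $2}' /proc/meminfo)"
}
collect_stats
```

Running this in a loop with a short sleep approximates the periodic sampling the sensor performs before each publish.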

 

I have prepared a script in /opt/pacc on the Kubernetes master node. Execute it as follows:

 

root@k8s-master-238E7C1E-0# cd /opt/pacc

root@k8s-master-238E7C1E-0:/opt/pacc# bash deploy_pacc

 

deployment "sensor-deploy" created

deployment "websr-deploy" created

service "mapr-pacc-svc" created

Waiting for load balancer to open up cockpit port - 5 seconds, est. 5 min

Waiting for load balancer to open up cockpit port - 10 seconds, est. 5 min

Waiting for load balancer to open up cockpit port - 15 seconds, est. 5 min

…….

Waiting for load balancer to open up cockpit port - 90 seconds, est. 5 min

Waiting for load balancer to open up cockpit port - 95 seconds, est. 5 min

PACC deployment Done...

point your browser at http://40.83.251.214

 

In the browser, you can see these stats refresh every 3 seconds as they are published and consumed in real time.

 

 

Step 6 – Scale your PACC deployment according to demand

One very nice feature of Kubernetes is the ability to scale your services up and down dynamically, according to demand, with no downtime required. In your cockpit window, one sensor container and two web server containers were spun up by the script in the previous step. You can see the web servers are spread across two VMs (black circles) for high availability. Both are attached to a service (orange circle) to achieve load balancing. The service is associated with a public internet IP address on the Azure load balancer to allow external access.

 

 

Now suppose demand suddenly peaks and you need more web server containers to serve the load. Issue the following command on the Kubernetes master node to scale the number of web servers from 2 to 8:

 

root@k8s-master-238E7C1E-0:/opt/pacc# kubectl --namespace=pacc scale deployment websr-deploy --replicas=8

 

In a few seconds, you should see 6 more web servers popping up in the cockpit window to handle the higher load. Kubernetes also makes sure these new containers are distributed as evenly as possible across the hosts (black circles). Note that the other window, displaying the stats, continues without any disruption.
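The same scale command works in either direction, so it can be wrapped in a small helper. The sketch below only prints the command it would run (drop the echo to execute it for real on the Kubernetes master node); the namespace and deployment name are taken from the step above.

```shell
# Print a kubectl scale command for a given replica count.
# The pacc namespace and websr-deploy name come from the PACC step;
# echo is used so this sketch is a dry run.
scale_websr() {
  echo kubectl --namespace=pacc scale deployment websr-deploy --replicas="$1"
}
scale_websr 8   # scale up for peak load
scale_websr 2   # scale back down when demand subsides
```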

 

 

Lastly, should you want to redeploy the cluster, first remove the existing one, e.g. kubectl delete namespace mapr520, and then run “deploy-mapr -p <password>” to redeploy.

 

Summary

We have demonstrated how to spin up a MapR cluster on the Azure Container Service platform and how to manage PACC with Kubernetes orchestration. The benefit of this solution is a faster software development cycle and quicker time-to-market: software developers can confidently test their code in this environment before releasing to production.

 

If this article is of interest to you, please find out more about MapR at www.mapr.com or ask technical questions in our community forum, Answers.

 
