MapR has worked closely with Azure to develop marketplace offerings that let users run a proof of concept or a production deployment of the MapR Converged Data Platform on Azure. These offerings come preloaded and preconfigured with the MapR software and the required operating system, and can be launched from the Azure Marketplace portal. By default, they are based on the MapR Converged Community Edition, which allows free and unlimited production use but leaves certain features disabled. If you want to use MapR Streams, MapR-DB, high availability (HA), and similar features, you need to purchase the corresponding licenses from MapR and enable them on the cluster. For more detail on the various MapR editions, visit https://www.mapr.com/products/mapr-distribution-editions. This blog post covers how to get a MapR cluster up and running on Azure in less than 30 minutes. Alternatively, developers who want a low-cost MapR environment to experiment with can check out this blog post on spinning up a MapR sandbox from the Azure Marketplace. The sandbox is based on a MapR community license that gives you unlimited access to MapR-DB and MapR Streams, as well as DR and HA features such as mirroring and snapshots.
Working with Azure MapR marketplace offerings
Before you start, we highly recommend checking the usage and quotas under your current Azure subscription. These quotas cover CPU, network, and storage. Make sure you have planned for enough resources before you launch the cluster. To check your quotas, go to the Azure portal, select the subscription where you want to launch the MapR cluster, then select “Usage + quotas.” See the screenshot below:
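Before launching, a quick bit of arithmetic tells you whether the cluster fits: it needs the number of nodes times the vCPUs per node in free cores, and Azure enforces both a total regional vCPU quota and a per-VM-family quota, so check both figures. The bash function below is a hypothetical helper (the name `fits_core_quota` is ours, not part of any tool); you supply the current usage and limit yourself, for example from “Usage + quotas” or, if you use the Azure CLI, from `az vm list-usage --location <region> --output table`.

```bash
# Hypothetical helper: succeeds (exit status 0) when NODES new VMs with
# VCPUS_PER_NODE cores each fit into the free part of a core quota.
# USED and LIMIT are the "current value" and "limit" you read from
# "Usage + quotas" (or from `az vm list-usage --location <region>`).
fits_core_quota() {
  local used=$1 limit=$2 nodes=$3 vcpus_per_node=$4
  local needed=$(( nodes * vcpus_per_node ))
  local free=$(( limit - used ))
  echo "needed=${needed} free=${free}"
  # The arithmetic test's status becomes the function's return status.
  (( needed <= free ))
}

# Example: 3 nodes x 4 cores with 20 of 100 regional cores already in use.
# Prints "needed=12 free=80" and returns 0.
# fits_core_quota 20 100 3 4
```

Run it once with the regional figures and once with the VM-family figures; both calls should succeed before you launch.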
Point your browser to https://portal.azure.com and log in to your Azure account. Click “+New” in the left pane, type MapR in the search box, and you will find several MapR Azure Marketplace offerings, as illustrated below.
Select the desired offering (e.g., MapR Converged Data Platform v5.2).
After you select an offering, you will see an introduction screen, as shown below. Click “Create.”
On this screen, you provide information such as the cluster name, the disk type for the cluster, the username and password of the admin who will manage the cluster, the subscription, and the resource group name (currently, only the option to create a new resource group is supported). After you have filled in the information, select “OK” to go to the next step.
In this step, you choose the cluster size and VM size, as well as the password for the unique “mapr” user, a reserved power user for the MapR cluster. When selecting the VM size, click “View all” in the top right corner of the screen to see more options. In this demo, we selected “D3 Standard.” Click “Select” to move forward. Pricing for the various VM sizes is listed at https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux.
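To get a feel for what a given cluster size costs, multiply the number of nodes by the hourly rate of the chosen VM size and by the number of hours the cluster runs. The bash sketch below does just that. The rate in the example is a made-up placeholder, not a real Azure price, so substitute the figure from the pricing page for your region. It covers compute only (no disks, networking, or MapR licenses), and the function name is hypothetical.

```bash
# Rough compute-only estimate: nodes x hourly rate x hours.
# HOURLY_RATE must come from the Azure pricing page for your VM size and
# region; disks, networking, and any MapR licenses are not included.
# HOURS defaults to 730, the average number of hours in a month.
estimate_compute_cost() {
  local nodes=$1 hourly_rate=$2 hours=${3:-730}
  # awk handles the floating-point math that bash arithmetic cannot.
  awk -v n="$nodes" -v r="$hourly_rate" -v h="$hours" \
    'BEGIN { printf "%.2f\n", n * r * h }'
}

# Example with a made-up rate of 0.50 per hour for a 3-node cluster:
# estimate_compute_cost 3 0.50        -> 1095.00 (one month)
# estimate_compute_cost 3 0.50 100    -> 150.00  (100 hours)
```

The gap between the two example figures is why it pays to deallocate the cluster when you are not using it.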
In this step, you configure the network settings for the cluster. You can create a new virtual network or select an existing one. In this demo, we chose to create a new network: fill in the network name, the address space, and the subnet address range, then click “OK.” Note that if a network security group is associated with an existing subnet, the cluster will inherit its settings. If you create a new network for this deployment, no security group is created by default. To learn how to create a security group that protects the cluster after it has been deployed, visit https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-nsg.
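If you go with a new network, one way to add that protection is with the Azure CLI: create a network security group (NSG) that admits only SSH (port 22) and the MCS web console (port 8443) from a trusted address range, then attach it to the cluster subnet. The sketch below assumes Azure CLI 2.x is installed and logged in; the function names, the rule name `allow-admin`, and the arguments are hypothetical placeholders. Azure's default NSG rules still allow traffic within the virtual network, so the cluster nodes keep talking to each other, but any other web UI you want to reach from outside (Drill, Grafana, and so on) needs its port added to the rule. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
# Print the command instead of executing it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical sketch: restrict inbound access to the cluster subnet.
# Arguments: resource group, VNet name, subnet name, new NSG name,
# and the trusted source range (e.g. your office's public CIDR).
lock_down_subnet() {
  local rg=$1 vnet=$2 subnet=$3 nsg=$4 trusted_cidr=$5
  run az network nsg create --resource-group "$rg" --name "$nsg"
  # Allow SSH and the MCS console (HTTPS on 8443) from the trusted range only.
  run az network nsg rule create --resource-group "$rg" --nsg-name "$nsg" \
    --name allow-admin --priority 100 --direction Inbound --access Allow \
    --protocol Tcp --source-address-prefixes "$trusted_cidr" \
    --destination-port-ranges 22 8443
  # Attach the NSG to the subnet the cluster was deployed into.
  run az network vnet subnet update --resource-group "$rg" \
    --vnet-name "$vnet" --name "$subnet" --network-security-group "$nsg"
}

# Example (prints the three commands):
# DRY_RUN=1 lock_down_subnet my-rg my-vnet my-subnet mapr-nsg 203.0.113.0/24
```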
In this step, you have an opportunity to review your cluster settings. Click “OK” if everything looks good.
In this step, you need to review the purchase agreement. Click “Purchase” to proceed.
If everything goes well, you should see a successful deployment in about 30 minutes, depending on the cluster size and VM types. To find the web addresses for connecting to the newly deployed cluster, click the template in the resource group on the portal.
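If you prefer the command line, the same information should be available from the deployment record in the resource group. This is a minimal sketch that assumes Azure CLI 2.x and that the marketplace template publishes the web addresses as deployment outputs, which is what the portal displays when you click the template. The resource group and deployment names are placeholders, and `DRY_RUN=1` prints the commands instead of running them.

```bash
# Print the command instead of executing it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# List the deployments recorded in the cluster's resource group.
list_deployments() {
  run az deployment group list --resource-group "$1" \
    --query "[].name" --output tsv
}

# Show the outputs (such as the cluster's web addresses) of one deployment.
show_outputs() {
  run az deployment group show --resource-group "$1" --name "$2" \
    --query properties.outputs --output json
}

# Example:
# list_deployments my-mapr-rg
# show_outputs my-mapr-rg <deployment-name-from-the-list>
```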
Once you have copied the IP addresses of the web links, paste them into your browser’s address bar. For example, you can go to the MapR Control System (MCS) web console and log in with the credentials you provided when creating the cluster. Note that if you selected an SSH public key as your authentication method earlier, you should log in as the power user ‘mapr’ with the password you assigned to it. If you selected the “Password” authentication option, you can log in either as the sysadmin user or as the power user ‘mapr’, using the respective passwords you assigned.
MapR 5.2 also comes with MapR Monitoring installed; it is based on Kibana, Grafana, Elasticsearch, and OpenTSDB. It gives you real-time visualization of cluster performance and the ability to analyze the various cluster logs. To learn more about MapR Monitoring, visit https://www.mapr.com/resources/mapr-monitoring.
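For quick reference, the hypothetical helper below prints the web addresses for a given node's public IP. It assumes the default ports: 8443 (HTTPS) for MCS, 3000 for Grafana, and 5601 for Kibana. The marketplace template may install Grafana and Kibana on particular nodes only, so confirm against the web addresses listed for your deployment.

```bash
# Hypothetical helper: print the usual web UI addresses for one node.
# Ports are assumed defaults; adjust them if your deployment differs.
mapr_web_urls() {
  local ip=$1
  printf 'MCS: https://%s:8443\n' "$ip"
  printf 'Grafana: http://%s:3000\n' "$ip"
  printf 'Kibana: http://%s:5601\n' "$ip"
}

# Example:
# mapr_web_urls 203.0.113.10
```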
You are also encouraged to try Apache Drill (http://drill.apache.org), the SQL-on-Hadoop tool that is gaining a lot of momentum in the community. Simply point your browser at the Drill address among the web links described above. Don’t forget to check out the Drill tutorials, which walk you through querying data stored in a MapR cluster with Drill, then extracting insights and visualizing the findings with Tableau, a leading visualization tool from the broader ecosystem.
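Besides the web UI, Drill accepts SQL over its REST API, which is handy for scripting: you POST a JSON body with `queryType` and `query` fields to `/query.json` on the Drill web port (8047 by default). Here is a minimal bash sketch, assuming curl is installed and the port is reachable from your machine. The function name is hypothetical, and the sketch does no JSON escaping, so keep double quotes and backslashes out of the query. The example queries the `employee.json` sample that ships on Drill's classpath.

```bash
# Hypothetical sketch: send one SQL statement to Drill's REST API.
# DRY_RUN=1 prints the JSON request body instead of sending it.
drill_query() {
  local host=$1 sql=$2
  local body
  # No escaping is done: the SQL must not contain " or \ characters.
  body=$(printf '{"queryType": "SQL", "query": "%s"}' "$sql")
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$body"
    return
  fi
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$body" "http://${host}:8047/query.json"
}

# Example (single quotes keep the backticks literal):
# drill_query 203.0.113.10 'SELECT full_name FROM cp.`employee.json` LIMIT 3'
```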
If you are interested in real-time processing with Apache Spark or MapR Streams, or in a NoSQL database such as MapR-DB, the MapR Converged Data Platform comes with everything included; you just need to purchase the respective licenses to enable these modules. There is no need to install extra layers of applications, such as Kafka or HBase.
1. MapR Azure Marketplace offerings use dynamic external IPs for the instances by default, because most users may not have access to static IPs. As a result, the external IPs change when the instances are rebooted. The cluster itself continues to function across reboots, because the internal IPs remain the same. However, if you plan to have external applications connect to the cluster, you will either have to point the clients to the new IP addresses of the cluster nodes, or switch to static IPs after you spin up the cluster for the first time. To switch to a static IP, pick the instance (usually the first node), select “Network interfaces,” then click the interface name, as shown in the screenshot below:
Then click “Overview” -> “Public IP address” -> “Configuration” -> “Static,” and click “Save” to apply the change. See the screenshot below:
2. You may want to shut down the cluster from time to time to save costs. To do that, ssh into the first node as root, then issue the following commands to shut down the MapR services:
clush -a service mapr-warden stop
clush -a service mapr-zookeeper stop
Once the MapR services are stopped, go to the Azure portal and stop your cluster nodes. Note: don’t simply issue an “init 0” or “halt” command inside the instances. These commands only shut down the guest operating system without deallocating the VMs' resources, so you will still be billed for VM usage.
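Both tips can also be scripted. In the sketch below, `stop_mapr_services` wraps the two clush commands above and is meant to run on the first node as root, while the other two functions use the Azure CLI (assumed installed and logged in) from your own workstation: `make_ip_static` does what the portal's “Static” setting does, and `deallocate_vms` releases the VMs so compute billing stops, like the portal's “Stop” button. All function names are hypothetical, the resource group, public IP, and VM names are placeholders, and `DRY_RUN=1` prints each command instead of running it.

```bash
# Print the command instead of executing it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Tip 1: switch a node's public IP resource from dynamic to static.
# Arguments: resource group, public IP resource name.
make_ip_static() {
  run az network public-ip update --resource-group "$1" --name "$2" \
    --allocation-method Static
}

# Tip 2a: on the first node, as root, stop warden before ZooKeeper.
stop_mapr_services() {
  run clush -a service mapr-warden stop
  run clush -a service mapr-zookeeper stop
}

# Tip 2b: from your workstation, deallocate every listed VM.
# Arguments: resource group, then one or more VM names.
deallocate_vms() {
  local rg=$1; shift
  local vm
  for vm in "$@"; do
    run az vm deallocate --resource-group "$rg" --name "$vm"
  done
}

# Example:
# DRY_RUN=1 deallocate_vms my-mapr-rg node0 node1 node2
```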
We have walked you through how to spin up a MapR cluster from the Azure Marketplace. To take full advantage of the MapR cluster for your big data analytics, I encourage you to check out the MapR community (http://community.mapr.com). There you can find plenty of resources to help you understand more about MapR and how it differs from other Hadoop distributions, as well as demo blog posts on using MapR for data analytics. You can also ask questions, which community members answer very quickly.