9 Steps to Deploying the MapR Converged Data Platform on AWS
by James Sun
If you’ve been keeping tabs on all the great product enhancements that have been coming out of MapR, you will know that the 5.2 version of the MapR Converged Data Platform went GA this summer. It takes a few cycles to make the platform available on the AWS Marketplace, largely due to the testing efforts required. We’re pleased to announce that version 5.2 can now be deployed from the AWS Marketplace.
MapR has worked closely with AWS to develop AMIs (Amazon Machine Images) that enable hourly usage of the MapR Converged Data Platform on AWS. These AMIs, which are preloaded and preconfigured with the MapR software and the required supporting operating system, can be launched using CloudFormation templates. The templates automate the provisioning of the resources required to form a MapR cluster and ensure that the MapR software is installed properly. This blog post covers how you can get a MapR cluster up and running in less than 20 minutes.
There are a few options available for installing the MapR Platform:
- Using scripts
- Using the installer
- Using a cloud-based installation
MapR has partnered with the AWS Marketplace to make the install extremely easy. This will be very beneficial to both new customers and the many MapR partners and customers who already have AWS accounts.
Working with AWS MapR Marketplace Offerings
Point your browser to https://aws.amazon.com/marketplace. In the search area, type in MapR and you will find a few MapR AWS Marketplace offerings as illustrated below.
Select the desired offering.
After you have selected an offering, you will see a screen as illustrated below. Go ahead and select your region and how many nodes you want in the MapR cluster. You can select either a single AMI or a multi-node cluster deployment. You can review the estimated price associated with your selections in this step.
Select “Continue” to the next step.
In this step, you will confirm the launch options such as the MapR version, single node or cluster deployment, AWS region, as well as the terms for pricing.
Once you are comfortable with your selections, click “Launch with CloudFormation Console” to move forward.
In this step, you are going to use a CloudFormation template to launch the cluster. You can either supply a custom template or use the AWS-tested template (the default). If you are not sure what to do, just leave everything as default.
Click “Next” to move forward.
In this step, you can specify the variables pertaining to your cluster, such as the number of nodes, the cluster name, etc. You can leave everything as default except KeyName, PersistentStorage, RemoteAccessCIDR and VpcSubnetId. Type in your desired KeyName if you already have a key pair for EC2. If not, you should generate one so that you can later SSH into the cluster nodes as user “ec2-user” without a password.
In the PersistentStorage field, you can specify the size (in GB) for each of the four EBS drives per instance. The RemoteAccessCIDR field restricts inbound traffic to the IP range you specify. If you are not sure what to do, type in 0.0.0.0/0 in this field, which allows traffic from any IP address.
Finally, in the VpcSubnetId field, you can specify the subnet you have with your EC2 account. If you don’t have one, go to https://console.aws.amazon.com/vpc and select “Subnets” on the left pane to create one.
Click “Next” to proceed.
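If you prefer to prepare these four values in a script before typing them into the console, a minimal Python sketch along the following lines may help. It only uses the standard library. The parameter names come from the form described above, while the function names and the sample key, size, CIDR and subnet values are made up for illustration.

```python
import ipaddress


def build_stack_parameters(key_name, storage_gb, remote_cidr, subnet_id):
    """Collect the four non-default values as a list of
    ParameterKey/ParameterValue dictionaries."""
    # ip_network() raises ValueError for malformed CIDRs and for CIDRs
    # with host bits set (for example 10.0.0.5/24).
    network = ipaddress.ip_network(remote_cidr)
    if storage_gb <= 0:
        raise ValueError("PersistentStorage must be a positive size in GB")
    values = {
        "KeyName": key_name,
        "PersistentStorage": str(storage_gb),
        "RemoteAccessCIDR": str(network),
        "VpcSubnetId": subnet_id,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


def is_open_to_world(remote_cidr):
    """Return True when the CIDR matches every address (prefix length 0)."""
    return ipaddress.ip_network(remote_cidr).prefixlen == 0


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    for item in build_stack_parameters(
        "my-key", 100, "203.0.113.0/24", "subnet-0123abcd"
    ):
        print(item)
    print("open to the world:", is_open_to_world("0.0.0.0/0"))
```

Checking the CIDR up front is mainly a convenience: a typo is caught before the stack launch starts, and `is_open_to_world` makes it obvious when the 0.0.0.0/0 fallback is in use.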
In this step, you can tag your resources in the stack. If you are not sure what to do, don’t type anything here and click “Next” to continue.
Note that, although it is highly unlikely, the deployment may occasionally fail due to network or resource contention in the cloud environment. It is a good idea to turn off rollback of the deployment so that you can troubleshoot further by logging into the instances.
In this step, you have a final chance to review all the variables before launching the cluster. Carefully review them and acknowledge the IAM capabilities at the bottom, then click “Create.”
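The same launch settings (rollback turned off, IAM capabilities acknowledged) can also be expressed outside the console. The sketch below assumes you use boto3, the AWS SDK for Python, which this post does not otherwise cover. The stack name and template URL are hypothetical placeholders, and whether `CAPABILITY_IAM` is sufficient for the MapR template is an assumption.

```python
def build_create_stack_request(stack_name, template_url, parameters):
    """Build keyword arguments for the CloudFormation create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": parameters,
        # Keep failed resources around so the instances can be inspected.
        "DisableRollback": True,
        # Equivalent of ticking the IAM acknowledgement box in the console
        # (assumption: the template needs no other capability).
        "Capabilities": ["CAPABILITY_IAM"],
    }


def launch(request):
    """Submit the request; needs boto3 and configured AWS credentials."""
    import boto3  # third-party package, imported lazily on purpose

    client = boto3.client("cloudformation")
    return client.create_stack(**request)["StackId"]


if __name__ == "__main__":
    # Hypothetical values; the real template URL is shown in the console.
    request = build_create_stack_request(
        "mapr-demo", "https://example.com/mapr-template.json", []
    )
    print(request)
```

Building the request separately from sending it lets you review exactly what would be submitted, much like the final review screen in the console.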
After the CloudFormation launch, you will be able to see the progress of the cluster creation. Click on the “Events” tab to view the status.
After the installation has completed successfully, click on the “Outputs” tab. You will see login credentials and a URL to access the MCS console (highlighted with a red box in the screenshot below). Use those credentials to log in to MCS and start administering the cluster.
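The status and outputs shown in the console can also be read programmatically. Here is a rough sketch, again assuming boto3. The response shape follows the CloudFormation `describe_stacks` call, but the output key names in the sample are invented, since the actual keys depend on the MapR template.

```python
def summarize_stack(describe_stacks_response):
    """Return (status, outputs) for the first stack in the response."""
    stack = describe_stacks_response["Stacks"][0]
    # Outputs is absent until the stack has produced any.
    outputs = {
        item["OutputKey"]: item["OutputValue"]
        for item in stack.get("Outputs", [])
    }
    return stack["StackStatus"], outputs


def fetch_summary(stack_name):
    """Query AWS; needs boto3 and configured AWS credentials."""
    import boto3  # third-party package, imported lazily on purpose

    client = boto3.client("cloudformation")
    return summarize_stack(client.describe_stacks(StackName=stack_name))


if __name__ == "__main__":
    # Hand-written sample; the output keys are hypothetical.
    sample = {
        "Stacks": [
            {
                "StackName": "mapr-demo",
                "StackStatus": "CREATE_COMPLETE",
                "Outputs": [
                    {"OutputKey": "ExampleConsoleUrl",
                     "OutputValue": "https://203.0.113.10:8443"},
                ],
            }
        ]
    }
    status, outputs = summarize_stack(sample)
    print(status, outputs)
```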
- MapR AWS Marketplace offerings by default use dynamic external IPs for the instances because most users do not have access to Elastic IPs, which are static IP addresses. Therefore, the external IPs will change across reboots. That said, the cluster will continue to function across reboots because the internal IPs remain the same. However, if you plan to have external applications connecting to the cluster, you will either have to modify the client to point to the new IP addresses of the cluster nodes, or switch to Elastic IPs after you spin up the cluster the first time. Please refer to this link for more information on how to work with Elastic IPs: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html
- Some users may want to shut down the cluster to save costs. This requires a modification to the “Auto-scaling” policy. By default, the “Auto-scaling” feature in EC2 will automatically spin up replacements for nodes that are down in order to meet the policy’s minimum number of nodes. Because your intention is to shut down the whole cluster, you will need to modify the policy to allow all nodes to be shut down. To do that, navigate to “EC2” on the AWS dashboard portal as described below.
Then click “Auto Scaling Groups” on the left pane, select your group, and click “Edit.”
Now in the “Suspended Processes” menu, add “Launch” and “Terminate,” then click “Save.”
Now you can log in to every cluster node and issue “service mapr-warden stop” and “service mapr-zookeeper stop” commands to shut down MapR. After that, you can safely shut down your instances.
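For clusters with more than a handful of nodes, the shutdown steps above can be scripted. The following is a sketch only, assuming boto3 for the Auto Scaling call and plain `ssh` for the service commands. The group name, host addresses and key path are placeholders, and the use of `sudo` is an assumption about how “ec2-user” gains root privileges.

```python
def suspend_scaling(autoscaling_client, group_name):
    """Suspend the Launch and Terminate processes of one Auto Scaling group.

    autoscaling_client is expected to be boto3.client("autoscaling").
    """
    autoscaling_client.suspend_processes(
        AutoScalingGroupName=group_name,
        ScalingProcesses=["Launch", "Terminate"],
    )


def shutdown_commands(hosts, key_path):
    """Build one ssh command per node and service, warden first."""
    services = ["mapr-warden", "mapr-zookeeper"]
    return [
        # sudo is an assumption; adjust if ec2-user is set up differently.
        ["ssh", "-i", key_path, f"ec2-user@{host}",
         "sudo", "service", service, "stop"]
        for host in hosts
        for service in services
    ]


if __name__ == "__main__":
    # Placeholder addresses and key file; nothing is executed here.
    for command in shutdown_commands(["10.0.0.11", "10.0.0.12"], "my-key.pem"):
        print(" ".join(command))
```

Each command list can be handed to `subprocess.run` once you have reviewed it. Keeping the command construction separate from execution makes a dry run like the one above possible.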
We have walked you through how to spin up a MapR cluster on AWS Marketplace. In order to take full advantage of the MapR cluster for your big data analytics, we recommend that you try Apache Drill, a schema-less open source SQL engine that lets you perform interactive SQL queries on any type of data.
Check out the test drive that includes Apache Drill along with the MapR Converged Data Platform. The test drive includes a tutorial that walks you through the steps required to query data stored in a MapR cluster using Drill, and extract insights and visualize the findings using Tableau, a leading visualization tool that is part of the broader ecosystem.
This content was originally posted on the MapR Converge Blog.