I'd like to install a MapR cluster that has two node types: master nodes and data nodes. Is it possible to have MapR nodes without any MapR-FS disks?
Cluster configuration is a very important topic, and it's not very well documented.
For the purpose of discussion, the cluster design in question is for production. You should start with this model and then modify it based on your constraints if you're planning to create a test or development (dev) cluster.
With respect to Apache Hadoop distributions, you need to think of your cluster as having three distinct types of nodes: Control Nodes, Data Nodes and Edge Nodes.
In production you will usually see a minimum of three to five Control Nodes and five or more Data Nodes, often growing to hundreds of nodes. (Outside of Federation, I believe you can use ZK to coordinate failover to a standby NameNode, whereas the original Secondary NameNode was more of a checkpointing node than a true backup.)
With respect to MapR, the same model applies. However, because MapR uses the CLDB (Container Location Database), which removes the NameNode's single point of failure, you will run the CLDB across the control nodes, along with the ZK instances, the HBase Master (HMaster), the MySQL instance for Hive, etc.
While you can build out different hardware configurations for the different types of nodes, you may end up with a single configuration for all of them. With Apache Hadoop, however, you would need to RAID the drives (RAID 10) on the control nodes for redundancy, while the drives on the DNs are JBOD. With MapR, you can use the same configuration and disk layout on every node, because the CLDB keeps its data in MapR-FS, separate from the rest of the cluster's data. And of course you would want to spread the control nodes across racks to reduce the risk of losing them all to a single rack failure.
With respect to your configuration:
Since you're running Community Edition, you lose some of the HA features found in the supported version, so you can run a single machine as your control node. Note that you will also be running a single ZK instance as your quorum (the ensemble size should be an odd number). You can then run your second machine as a data node only.
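The odd-number advice comes from how ZooKeeper forms a quorum: a strict majority of the ensemble must be up for it to keep serving. A minimal Python sketch of that arithmetic (the function names are illustrative only, not part of any MapR or ZooKeeper API) shows why an even-sized ensemble buys no extra fault tolerance:

```python
def zk_majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a ZooKeeper quorum (strict majority)."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def zk_tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return ensemble_size - zk_majority(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} ZK server(s): quorum = {zk_majority(n)}, "
              f"tolerates {zk_tolerated_failures(n)} failure(s)")
```

For example, 3 and 4 servers both tolerate a single failure, and a single ZK instance, as in the two-machine setup described here, tolerates none.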
(See Deborah's response.)
Personally, I would either run a single machine as your cluster or bump it up to three machines, though I'd recommend four (1 CN, 3 DN). Note: since you're running Community Edition, you don't have any HA, so you should be prepared to lose the cluster in the event of a major failure, and on a single machine you can't take advantage of the distributed features. The reason I'd go with four is that you then have more machines dedicated to running jobs and services, so there is less competition for resources. You can run a two-machine cluster, but it doesn't work well and isn't efficient.
Of course YMMV and I'll be the first to admit I'm fairly conservative when it comes to cluster design.
Hi Juler John Sepnio,
Have you checked out "Planning the Cluster" in the MapR 5.0 Documentation (doc.mapr.com)? Please share your ultimate goal for this cluster; it would help us guide you better.
Yes, I did check out the "Planning the Cluster" link. My ultimate goal is to create a MapR cluster with the HDP ecosystem and separate my cluster nodes into Master and Worker roles.
Master nodes run the master services, such as the HBase Master, the YARN ResourceManager, etc.
Worker nodes run my HBase RegionServers, YARN NodeManagers and MapR FileServers.
I don't want to dedicate any disks on the master nodes to MapR-FS, but I cannot achieve this using the MapR Installer.
Thank you for the additional details. I'd like to invite Deborah, our cluster admin expert, to join the discussion and offer her recommendation.
Some services (like ZooKeeper) can run without storage in the cluster, but they still need some disk space in the local file system. I don't know off the top of my head which others can run without MapR-FS installed. You are correct that the MapR Installer will not work for what you are trying to do: it assumes that all nodes you are installing are using the same disk names for cluster storage.
To install nodes with different disks, you would need to do a manual install and then run disksetup on each node separately. Here is some information on performing a manual install:
Installing without the MapR Installer
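Because the disks differ from node to node, one way to prepare a manual install is to keep a per-node plan and generate a separate disk list for each node that will contribute MapR-FS storage. The Python sketch below is only an illustration: the hostnames and device names are hypothetical placeholders, and it assumes the one-device-per-line file that `disksetup -F` reads in the MapR manual install steps. Master nodes with no disks get no file, so disksetup is simply never run on them. Check the manual install documentation for your version before relying on the exact command.

```python
from pathlib import Path

# Hypothetical cluster plan: hostnames, roles and device names are
# placeholders, not values from this thread. Each node gets its own list
# because the disks are not the same on every node.
CLUSTER_PLAN = {
    "master1": {"role": "master", "disks": []},
    "worker1": {"role": "worker", "disks": ["/dev/sdb", "/dev/sdc"]},
    "worker2": {"role": "worker", "disks": ["/dev/sdb", "/dev/sdc", "/dev/sdd"]},
    "worker3": {"role": "worker", "disks": ["/dev/nvme0n1"]},
}


def write_disk_lists(plan: dict, out_dir: Path) -> dict:
    """Write one disk list per node that contributes MapR-FS storage.

    Returns a mapping of hostname -> path of the file written. Nodes with
    no disks (the master nodes here) get no file.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for host, spec in plan.items():
        if not spec["disks"]:
            continue  # no MapR-FS disks on this node, so skip disksetup
        path = out_dir / f"disks-{host}.txt"
        # Assumed format: one device per line.
        path.write_text("\n".join(spec["disks"]) + "\n")
        written[host] = path
    return written


if __name__ == "__main__":
    files = write_disk_lists(CLUSTER_PLAN, Path("disk-lists"))
    for host, path in files.items():
        # Copy the file to the node as /tmp/disks.txt, then run there:
        print(f"{host}: /opt/mapr/server/disksetup -F /tmp/disks.txt  (from {path})")
```

Which services can actually run on a node without MapR-FS storage is still an open question in this thread, so treat the master entries as a plan to verify rather than a known-good layout.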