RAID is popular with RDBMS deployments, where it is used to replicate data for reliability.
The MapR platform automatically replicates and distributes data. We do not recommend setting RAID on the cluster disks, since that will incur unnecessary overhead and will impact cluster performance.
The main use of RAID is to provide redundancy so that data survives a disk failure. However, HDFS is already fault tolerant, so there is no need to use RAID on data nodes: data is stored in multiple copies across different data nodes. RAID can still be useful on name nodes, since a name node could fail.
You can, but it would be a waste to configure RAID on cluster nodes. The primary reason is that HDFS provides its own replication mechanism (remember the replication factor).
It is worth mentioning here that around 30-40% of the disk space should be reserved for intermediate data, e.g. MapReduce intermediate outputs and other OS-related temporary activity.
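To make the arithmetic concrete, here is a rough sketch of how the replication factor and the reserved space together cut into raw capacity. The function name and the example hardware (10 nodes with 12 x 4 TB disks) are made up for illustration; the replication factor of 3 is the HDFS default, and 0.35 is just the midpoint of the 30-40% range mentioned above.

```python
def usable_capacity_tb(raw_tb, replication_factor=3, reserved_fraction=0.35):
    """Estimate how much unique data fits on a cluster (hypothetical helper).

    raw_tb             -- total raw disk capacity across all data nodes, in TB
    replication_factor -- number of copies HDFS keeps of each block
    reserved_fraction  -- share of disk kept free for intermediate/temp data
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if not 0 <= reserved_fraction < 1:
        raise ValueError("reserved_fraction must be in [0, 1)")

    # Space left for HDFS blocks after setting aside the reserve.
    available_for_hdfs = raw_tb * (1 - reserved_fraction)

    # Every block is stored replication_factor times.
    return available_for_hdfs / replication_factor


if __name__ == "__main__":
    # Assumed example cluster: 10 nodes x 12 disks x 4 TB each.
    raw = 10 * 12 * 4
    print(f"raw: {raw} TB, usable: {usable_capacity_tb(raw):.1f} TB")
```

Adding RAID on top of this would take a further slice out of the raw capacity without adding protection that replication does not already give you.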
For DataNodes: RAID is not required, as Hadoop provides fault tolerance itself and replicates the data in a locality-aware fashion.
For NameNodes: the NameNode can be a single point of failure, since it keeps the metadata in memory and also routinely writes it to disk. If the NameNode goes down and its disk crashes, we may be in serious trouble. To mitigate this risk, we can configure RAID on the NameNodes.
Summary: RAID is a good thing and can be leveraged on NameNodes, but on DataNodes it would be overkill, slow, and expensive.
Hi, gonna take a bit of a different tack...
The answer is yes and no...
When configuring a system, you will want to mirror your OS drives. This is for fault tolerance and better uptime.
Depending on the drive type(s) and sizes, you may also want to add a local drive that is not part of your MapR-FS. This is the drive space where you will allow apps to spill/swap to local disk. You can also run your MariaDB/MySQL on local disk or on MapR-FS. Because you're using ZK to synchronize your HiveServer2 DB servers, you may want to use local drives.
That said, for drives being used by MapR-FS, you do not need to configure RAID. (Which everyone above is already pointing out.)
My point is that you need to consider your OS and local drives as part of your configuration, so you do need to consider RAID there (just not RAID 5; use either RAID 1 (mirroring) or RAID 10).
Just my $0.02 worth.