MapR with JBOD vs RAID on good servers

Question asked by jid1 on Dec 18, 2014
Latest reply on Dec 18, 2014 by jid1
I've been trying to understand why RAID is discouraged when using MapR. From what I've read so far, it boils down to the following:

 * MapR reconstruction is faster than a RAID rebuild, mainly because a RAID rebuild also copies unused space
 * MapR provides greater flexibility on HDD allocation by using storage pools
 * MapR does not suffer from the 'slow disk' problem (i.e. speed = speed of the slowest disk) and hence can also support non-homogeneous HDDs
 * MapR provides a more fine-grained unit of failure (by failing/degrading a storage pool)
 * MapR is faster than software RAID
 * MapR can provide a larger cache than HW RAID, as it uses RAM
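
The first point can be put in rough numbers. The sketch below is only a back-of-the-envelope model, not a measurement of MapR or of any RAID controller. The 4 TB disk size, the 30% fill level, and the 100 MB/s sustained copy rate are all hypothetical values picked for illustration. It compares copying the whole disk with copying only the used data.

```python
def rebuild_hours(bytes_to_copy: float, rate_bytes_per_s: float) -> float:
    """Time to copy the given number of bytes at a sustained rate, in hours."""
    return bytes_to_copy / rate_bytes_per_s / 3600


TB = 10**12
MB = 10**6

disk_size = 4 * TB      # assumed disk capacity (hypothetical)
used_fraction = 0.30    # assumed fill level (hypothetical)
copy_rate = 100 * MB    # assumed sustained copy rate (hypothetical)

# Block-level rebuild: every sector is copied, used or not.
full_copy = rebuild_hours(disk_size, copy_rate)
# Data-aware rebuild: only the bytes actually in use are copied.
used_only = rebuild_hours(disk_size * used_fraction, copy_rate)

print(f"whole disk: {full_copy:.1f} h, used data only: {used_only:.1f} h")
```

Under these assumptions the gap is simply the inverse of the fill level, so it shrinks as the disks fill up.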

PS. *I've also read that MapR provides better locality, but I am not sure I understand how this is achieved. What is the difference between storing 3 copies on RAID (primary, on-rack, off-rack) and 3 copies on MapR? Was this meant for RAID without MapR?*

Have I misunderstood any of the above or missed something?

So, getting to the point: assume the following specs/requirements:

 * My cluster consists of homogeneous high-performance servers (CiscoM4)
 * My RAID controller provides 6Gbs throughput and 200GB flash memory
 * I have 22 HDDs per server
 * My app is read latency critical and heavily memory and CPU oriented (Spark)
 * I have a mixture of normal (20GB files) and small (1GB) files

So, trying to compare the two options, intuitively I can infer that RAID 5 will be a much more efficient solution because:

 * Less CPU required when reading/writing
 * Small-file throughput will be much faster because I'll be reading from all disks vs 4 MapR blocks
 * Large-file throughput will be much faster because I'll be reading from all disks vs a 3-4 disk pool
 * Higher available memory and higher RAID memory, as I won't be using any RAM for caching
 * In case of a disk failure, restoration will be slower, but I will still be using the remaining HDDs (all but one) in a RAID 5 setup, and the rebuild will be automatic as I can have a spare. (I can also have two separate RAID arrays.)
 * RAID 5 will take a small performance hit on writes due to parity calculations, but I think the RAID controller is beefy enough to cope with it. Also, write velocity is typically not as critical as read velocity (i.e. if we can't deal with the write velocity, we would probably run out of disk very quickly)
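
The striping argument in the list above can be sketched as a simple bandwidth estimate. This is an idealized upper bound under stated assumptions, not a benchmark. The 150 MB/s per-disk rate is a hypothetical figure, and the function name `ideal_read_mb_s` and its optional `shared_cap_mb_s` parameter are made up for this illustration. Parity blocks, seeks, caching, and concurrent readers are all ignored.

```python
from typing import Optional


def ideal_read_mb_s(n_disks: int, per_disk_mb_s: float,
                    shared_cap_mb_s: Optional[float] = None) -> float:
    """Upper bound on sequential read bandwidth for one file striped evenly
    over n_disks. A shared bottleneck (e.g. a controller) caps the total."""
    total = n_disks * per_disk_mb_s
    return total if shared_cap_mb_s is None else min(total, shared_cap_mb_s)


PER_DISK = 150.0  # assumed sequential MB/s per HDD (hypothetical)

stripe_22 = ideal_read_mb_s(22, PER_DISK)  # one array striped over all 22 disks
pool_4 = ideal_read_mb_s(4, PER_DISK)      # one 4-disk storage pool

print(f"22-disk stripe: {stripe_22:.0f} MB/s, 4-disk pool: {pool_4:.0f} MB/s")
```

The model covers a single reader on a single file. Passing a value for `shared_cap_mb_s` shows how a controller limit would flatten the 22-disk figure, and a node running many tasks in parallel may spread its reads over several pools at once, so the per-node totals can look different from the per-file ones.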

Based on the above, I do not see any benefit in using MapR with JBOD. Am I missing some low-level details?