
Recommended settings for distributed filesystem

Question asked by cristofer on Jul 22, 2013
Latest reply on Jul 22, 2013 by cristofer
Hi there! Like many others, I'm evaluating MapR to handle a large number of small files, both transient and persistent. We have hit some filesystem limitations, and while we evaluate architectural improvements we want to quickly move from our NFS+SAN storage to a scalable replacement; MapR seems to fit very well.

After following MapR's online documentation I built a small cluster of 4 nodes, each with 3 disk partitions dedicated to MapR-FS:

 - 1 fs + zk
 - 1 fs + zk + cldb
 - 1 fs + zk + ws
 - 1 fs + nfs
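
For reference, the disks on each node were handed over to MapR-FS roughly like this (the device names below are just examples, not necessarily the exact partitions I used):

    # /tmp/disks.txt lists the partitions to give to MapR-FS
    /dev/sdb1 /dev/sdb2 /dev/sdb3

    # format the listed partitions and register them with the fileserver
    /opt/mapr/server/disksetup -F /tmp/disks.txt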

And I liked the results, despite some CLDB crashes during NFS benchmarks. I created two different volumes for temporary and persistent files, each with 2 replicas, and I turned compression off for the directories where these two volumes are mounted.
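
In case it matters for the advice, this is roughly how the two volumes were set up (volume names and mount paths are placeholders, and I may be misusing the options):

    # temporary and persistent volumes, 2 replicas each
    maprcli volume create -name vol.tmp -path /tmp-files -replication 2
    maprcli volume create -name vol.pers -path /pers-files -replication 2

    # compression turned off on the directories where they are mounted
    hadoop mfs -setcompression off /tmp-files
    hadoop mfs -setcompression off /pers-files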

How much memory should I provide for these processes? Would it be better to put the CLDB on a separate node from the fileserver? Or, under heavy load, is it better to run 2 CLDBs?
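
So far the only memory knobs I've spotted are in /opt/mapr/conf/warden.conf, something like the lines below; I'm not sure these are the right ones to touch or what sensible values would be (I'm assuming these parameter names from what I see in the file, and the values are guesses):

    # warden.conf excerpt (assumed names/values, please correct me)
    service.command.cldb.heapsize.percent=8
    service.command.cldb.heapsize.max=4000
    service.command.mfs.heapsize.percent=20
    service.command.mfs.heapsize.min=512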

Now I'm looking for other applicable tuning, but I still haven't found anything that matches my scenario. Most of what we store falls in the 2 KB-64 KB range, and temporary files tend to be smaller than persistent ones, with a similar number of files in each category. If MapR-FS has some kind of caching, the use cases that deal with these files would benefit. Persistent files could benefit too, but as they age we will move them to other volumes with compression enabled to save some space.
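
The aging plan would be roughly the following (the archive volume name, cluster name, and paths are hypothetical):

    # archive volume with compression left on (the default)
    maprcli volume create -name vol.archive -path /archive -replication 2

    # move aged persistent files over the cluster's NFS mount
    mkdir -p /mapr/my.cluster.com/archive/2013-01
    mv /mapr/my.cluster.com/pers-files/2013-01/* /mapr/my.cluster.com/archive/2013-01/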

And how does MapR-FS deal with files written through NFS? Will one chunk of each file land on the node where the NFS gateway runs, because of locality? Or does the NFS gateway distribute writes across the fileservers to keep them balanced?

Thanks a lot for your help!

Best regards,
Cristofer
