Improving MaprFS read-performance for hot files

Question asked by heathbar on Aug 9, 2012
Latest reply on Aug 9, 2012 by heathbar
I'm having a MaprFS performance tuning problem...

Problem: Each mapper loads the same 2GB file from MaprFS. With just one mapper, the load takes 23 seconds. With 50 mappers, it takes on average 40 minutes for all the mappers to load the same file. I would like to understand and fix this bottleneck.
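To put those numbers side by side, here is a rough back-of-the-envelope calculation. It assumes "2GB" means 2 GiB and that the 40 minutes is the wall-clock time for all 50 mappers to finish; both are assumptions, not measurements.

```python
# Rough throughput arithmetic for the figures quoted above.
# Assumptions: "2GB" is 2 GiB, and 40 minutes is wall-clock time
# for all 50 mappers to finish loading the file.

FILE_BYTES = 2 * 1024**3   # 2 GiB
MAPPERS = 50


def throughput_mb_s(total_bytes, seconds):
    """Return throughput in MB/s (1 MB = 10**6 bytes)."""
    return total_bytes / seconds / 1e6


# One mapper reading the file alone in 23 seconds.
single = throughput_mb_s(FILE_BYTES, 23)

# Fifty mappers together moving 50 copies of the file in 40 minutes.
aggregate = throughput_mb_s(MAPPERS * FILE_BYTES, 40 * 60)

# Average share of that aggregate seen by each mapper.
per_mapper = aggregate / MAPPERS

print(f"single reader : {single:.1f} MB/s")
print(f"50 readers    : {aggregate:.1f} MB/s aggregate, {per_mapper:.2f} MB/s each")
```

Under these assumptions a lone reader gets roughly 93 MB/s, while the 50 readers together get only about 45 MB/s. The aggregate going down rather than up would be consistent with a single shared bottleneck, though the arithmetic alone cannot say where it is.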

Here's what I've tried.
Attempt 1 - Create a "cache" volume with 6x replication to serve the frequently read file.  I expected a 6x speedup, but it still takes 30 minutes.
Attempt 2 - Manually increase the replication by duplicating the file 10x (also on the 6x replication volume) and have each mapper read the copy numbered task_id mod 10... but it still takes 30 minutes.
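For clarity, the copy selection in Attempt 2 works roughly like the sketch below. The base path and the `.N` suffix naming are hypothetical placeholders, not the actual paths on the cluster.

```python
from collections import Counter

N_COPIES = 10  # number of duplicate files created in Attempt 2


def copy_path_for_task(task_id, base="/cache/hotfile", n_copies=N_COPIES):
    """Pick one of the duplicate files based on the mapper's task id.

    `base` and the ".<index>" suffix are hypothetical naming choices.
    """
    return f"{base}.{task_id % n_copies}"


# With 50 mappers numbered 0..49, check how the readers spread over the copies.
readers_per_copy = Counter(copy_path_for_task(t) for t in range(50))

for path in sorted(readers_per_copy):
    print(path, readers_per_copy[path])
```

With 50 consecutive task ids and 10 copies, each copy ends up with 5 readers, so no single copy should be hotter than the others.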

I am wondering whether the bottleneck is in the NFS gateway server or the CLDB server.  What strategy would you suggest for diagnosing the problem?
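One simple starting point would be to time the same sequential read through each access path and compare the numbers. The sketch below is a generic timing helper, not a MapR tool; the paths passed on the command line are whatever mount points or local copies you want to compare.

```python
import sys
import time


def timed_read(path, chunk_size=8 * 1024 * 1024):
    """Read `path` sequentially; return (bytes_read, seconds, MB/s)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    mb_per_s = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, elapsed, mb_per_s


if __name__ == "__main__":
    # Usage: python timed_read.py <path> [<path> ...]
    for path in sys.argv[1:]:
        nbytes, seconds, rate = timed_read(path)
        print(f"{path}: {nbytes} bytes in {seconds:.1f} s ({rate:.1f} MB/s)")
```

Running this from 1, 5, 10, and 50 clients at once and watching where the aggregate rate stops growing would show at what concurrency the bottleneck appears. Note that repeated reads on the same client may be served from the local page cache, which can make later runs look faster than they really are.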
