FileSystem.globStatus performance issues

Question asked by chriscurtin on Apr 27, 2012
Latest reply on Apr 27, 2012 by chriscurtin
Hi,

We're evaluating MapR for an application that currently runs on Cloudera. Calls to globStatus on a FileSystem object take 5+ seconds to return on MapR, while on Cloudera they return in under a second.

Everything else seems to work okay. Example of a path being globbed:

"/offlined/5_10016/2012_04/3998447_41956861_*"

The directory has 5300 files in it and the glob will match 6.

Running 'hadoop fs -ls ...' takes 6-7 seconds to return a listing, both with the * pattern and for the bare directory.
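
Since the glob and the plain listing take about the same time, the cost is probably in enumerating the directory rather than in the pattern match: a wildcard in the last path component means every entry in the parent directory has to be read and then filtered. Below is a minimal sketch of that shape (5300 entries, 6 matches). It is only a local stand-in: it uses java.nio on a temp directory, not Hadoop's FileSystem, so it shows the structure of the comparison and says nothing about MapR or Cloudera timings. The class and method names (GlobVsListing, populate, list) and the "other_" filler prefix are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class GlobVsListing {
    // Create `total` empty files in a fresh temp directory; the first
    // `matching` of them share the prefix used in the glob from the post.
    static Path populate(int total, int matching) {
        try {
            Path dir = Files.createTempDirectory("glob_demo");
            for (int i = 0; i < total; i++) {
                String prefix = i < matching ? "3998447_41956861_" : "other_";
                Files.createFile(dir.resolve(prefix + i));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Return the entries of `dir` whose file name matches `glob`.
    // The whole directory is still enumerated; the glob only filters.
    static List<Path> list(Path dir, String glob) {
        List<Path> out = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                out.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        Path dir = populate(5300, 6);

        long t0 = System.nanoTime();
        int all = list(dir, "*").size();                     // full listing
        long t1 = System.nanoTime();
        int matched = list(dir, "3998447_41956861_*").size(); // narrow glob
        long t2 = System.nanoTime();

        System.out.printf("full listing: %d entries, %d ms%n", all, (t1 - t0) / 1_000_000);
        System.out.printf("glob:         %d entries, %d ms%n", matched, (t2 - t1) / 1_000_000);
    }
}
```

If the two timings track each other on the real cluster as well, the thing to look at would be the directory listing time for a 5300-entry directory rather than globStatus itself.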

Any idea what is wrong? The main reason we are considering MapR is that we hit NameNode limits on Cloudera with 11MM+ files across 9000+ directories.

Thanks,

Chris