FileSystem.globStatus performance issues

Question asked by chriscurtin on Apr 27, 2012
We're evaluating MapR for an application that currently runs on Cloudera. Calls to globStatus on a FileSystem object take 5+ seconds to return on MapR, while on Cloudera they are sub-second.

Everything else seems to work okay. Example of the path being globbed:


The directory contains 5,300 files, and the glob matches 6 of them.

Running 'hadoop fs -ls ...' takes 6-7 seconds to return a listing, both with the * wildcard and for the bare directory.
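
For anyone who wants to reproduce the shape of this measurement without a cluster, here is a minimal sketch that times a glob against a full listing on a local directory. It is not the Hadoop API: java.nio.file is swapped in as a local stand-in, and the class name, the "part-"/"other-" file names, and the temp directory are all hypothetical. If the glob and the full listing take about the same time, as in the numbers above, the cost is probably in enumerating the directory rather than in matching the pattern.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

class GlobTiming {
    // Create `total` empty files; the first `matching` get a "part-" prefix
    // (hypothetical names, chosen only so the glob below matches a few of them).
    static void populate(Path dir, int total, int matching) throws IOException {
        for (int i = 0; i < total; i++) {
            String prefix = i < matching ? "part-" : "other-";
            Files.createFile(dir.resolve(prefix + i + ".dat"));
        }
    }

    // Count the entries in `dir` whose names match `glob`; "*" lists everything.
    static int count(Path dir, String glob) throws IOException {
        int n = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path ignored : stream) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // Local analog of the case above: 5,300 files, 6 of which match the glob.
        Path dir = Files.createTempDirectory("glob-timing");
        populate(dir, 5300, 6);

        long t0 = System.nanoTime();
        int matched = count(dir, "part-*");
        long t1 = System.nanoTime();
        int all = count(dir, "*");
        long t2 = System.nanoTime();

        System.out.printf("glob: %d entries in %d ms%n", matched, (t1 - t0) / 1_000_000);
        System.out.printf("list: %d entries in %d ms%n", all, (t2 - t1) / 1_000_000);
    }
}
```

The same two-step timing (pattern first, then the whole directory) can be applied to the real FileSystem calls on each cluster to see where the seconds go.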

Any idea what is wrong? The main reason we are considering MapR is that we hit NameNode limits on Cloudera with 11MM+ files across 9,000+ directories.