
Benchmarking process constraints for a Hadoop cluster

Question asked by prachi on Oct 10, 2012
Latest reply on Oct 10, 2012 by prachi
We have set up a 9-node Seismic Hadoop cluster. We processed a 4 GB SegY file, and the processing time for different node counts was:

    No. of nodes   Time required
    ------------   -------------
         9         12 min 44 sec
         7         16 min 49 sec
         6         17 min 15 sec
         4         21 min  2 sec
         3          7 min 58 sec

To be frank, 'benchmarking' in our case has so far meant just comparing the processing times.
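As a rough illustration of what can be derived from processing times alone, here is a minimal sketch that converts the timings above into speedup and parallel efficiency. The choice of the 4-node run as the baseline and the names `RUNS` and `scaling_report` are assumptions made for this example, not part of any Hadoop tooling.

```python
# Wall-clock timings from the table above, converted to seconds.
RUNS = {
    9: 12 * 60 + 44,
    7: 16 * 60 + 49,
    6: 17 * 60 + 15,
    4: 21 * 60 + 2,
    3: 7 * 60 + 58,
}


def scaling_report(runs, baseline_nodes):
    """Return speedup and efficiency of each run relative to a baseline run.

    speedup    = baseline time / run time
    efficiency = speedup / (nodes / baseline nodes); 1.0 means linear scaling.
    """
    base_time = runs[baseline_nodes]
    report = {}
    for nodes, seconds in sorted(runs.items()):
        speedup = base_time / seconds
        ideal = nodes / baseline_nodes
        report[nodes] = {
            "seconds": seconds,
            "speedup": speedup,
            "efficiency": speedup / ideal,
        }
    return report


if __name__ == "__main__":
    # Baseline of 4 nodes is an arbitrary choice for this sketch.
    for nodes, row in scaling_report(RUNS, baseline_nodes=4).items():
        print(f"{nodes} nodes: {row['seconds']} s, "
              f"speedup {row['speedup']:.2f}, efficiency {row['efficiency']:.2f}")
```

An efficiency well above 1.0, as the 3-node run shows against this baseline, usually suggests that the runs were not measured under comparable conditions and that the measurement should be repeated.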

 - We are unsure which other parameters should be considered and monitored: the I/O overhead, the space that must be given to HDFS versus reserved on the local file system, the 'optimal' LAN speed, etc.
 - Once a list of benchmarking parameters is established, which tools should be used to monitor and record them? Apache Ambari and Cloudera Manager are a few names, but we don't want to dive in until we are clear about what exactly we are looking for.

*At the end of the day, we need to establish the 'predictability' of a cluster: how much time and how many machines will be required, given a file size and a level of computational complexity.* Our current SegY file processing exercise is a first step towards gaining such insight into the strengths and weaknesses of Hadoop clusters.
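To make 'predictability' concrete, one possible starting point is a simple Amdahl-style model, T(n) = serial + parallel / n, fitted to the measured timings by least squares. This is only a sketch under stated assumptions: the model form is our guess, it covers a single file size and job type, the 3-node run is left out as a presumed outlier, and `fit_inverse_model` and `predict_seconds` are hypothetical helper names.

```python
def fit_inverse_model(runs):
    """Least-squares fit of T(n) = serial + parallel / n.

    `runs` maps node count -> seconds. Returns (serial, parallel).
    This is ordinary linear regression of time against x = 1 / n.
    """
    xs = [1.0 / nodes for nodes in runs]
    ys = [float(seconds) for seconds in runs.values()]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    parallel = sxy / sxx
    serial = mean_y - parallel * mean_x
    return serial, parallel


def predict_seconds(model, nodes):
    """Predicted processing time in seconds for a given node count."""
    serial, parallel = model
    return serial + parallel / nodes


if __name__ == "__main__":
    # Timings in seconds for the 4 GB file; the 3-node run is excluded
    # here as an assumed outlier, which is a judgment call for this sketch.
    observed = {9: 764, 7: 1009, 6: 1035, 4: 1262}
    model = fit_inverse_model(observed)
    print("serial part (s):", round(model[0], 1))
    print("parallel part (s):", round(model[1], 1))
    print("predicted time on 12 nodes (s):", round(predict_seconds(model, 12), 1))
```

With only four data points the fit is indicative at best; repeating each run several times, and varying the file size, would be needed before trusting any prediction from it.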

Any concrete pointers are welcome!

Thanks and Regards.