
Hadoop using lots of I/O doing du -sk.  How to fix?

Question asked by communityadmin on Jun 22, 2011
Latest reply on Jun 23, 2011 by Ted Dunning
I have a stock 0.20.2 cluster. Our nodes with 2 TB disks waste a great deal of disk I/O running 'du -sk' on each data directory. How is this going to scale to 4 TB and 8 TB disks and beyond? It seems like used and free disk space could be calculated a better way.
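For context, here is a minimal sketch in Java (Hadoop's own language) of the cheaper alternative the question hints at. java.io.File can answer a statfs-style query per volume without walking the directory tree the way du does. Note the caveat: this reports partition-level numbers, so it only matches a 'du -sk' of the data directory when the disk holds nothing but DFS data. The class and output format below are illustrative, not Hadoop code.

    import java.io.File;

    public class SpaceCheck {
        public static void main(String[] args) {
            for (String path : args) {
                File vol = new File(path);
                // One statfs-style call per volume; no directory walk, no inode churn.
                long total = vol.getTotalSpace();   // size of the partition, in bytes
                long free  = vol.getUsableSpace();  // bytes available to this JVM
                // Only matches 'du -sk' of the data dir if the disk holds nothing else.
                long used  = total - free;
                System.out.printf("%s: used=%d free=%d%n", path, used, free);
            }
        }
    }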

This is especially bad when a datanode has more than a few hundred thousand blocks. I also note that Hadoop issues the du -sk commands for all of the disks at the same time, which blows away the inode cache and causes all kinds of problems.
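As a thought experiment only (stock 0.20.2 offers no such knob, and the class name and intervals here are hypothetical), staggering the per-disk scans with a random initial offset would at least keep them from running concurrently and evicting one another's inode cache entries:

    import java.util.Random;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class StaggeredDu {
        public static void main(String[] args) {
            ScheduledExecutorService pool = Executors.newScheduledThreadPool(args.length);
            Random jitter = new Random();
            for (String disk : args) {
                // Random 0-10 minute offset so the per-disk scans never line up,
                // then rescan each disk on a 10-minute period (matching Hadoop's
                // default DU refresh interval).
                long offset = jitter.nextInt(600);
                pool.scheduleAtFixedRate(() -> scan(disk), offset, 600, TimeUnit.SECONDS);
            }
        }

        static void scan(String disk) {
            try {
                // Same command the datanode runs, but one disk at a time.
                new ProcessBuilder("du", "-sk", disk).start().waitFor();
            } catch (Exception e) {
                System.err.println("du failed on " + disk + ": " + e);
            }
        }
    }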
