What is the best way to find the space utilization on an NFS mount, since du -sh /nfsmount/* runs forever or hangs?
Please let me know if you have any other alternatives for finding the space utilization.
Probably the best thing on MapR would be to use volumes to organize your data, and then query the volume for its size. You can create volumes pretty liberally, as we support many thousands of them.
$ maprcli volume info -name benchmarks -columns volumename,mountdir,logicalUsed,used
mountdir     used  logicalUsed  volumename
/benchmarks  2597  20503        benchmarks
In the output above, used is the space consumed on disk (after compression), and logicalUsed is the data size (before compression). Units are MiB.
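Since used is the post-compression figure and logicalUsed is the pre-compression figure, the compression ratio falls out directly from those two numbers. A minimal sketch using awk on the sample values from the output above (2597 and 20503 MiB):

```shell
# Compression ratio = used / logicalUsed (both reported in MiB).
# 2597 and 20503 are the sample values from the output above.
awk -v used=2597 -v logical=20503 \
    'BEGIN { printf "ratio: %.2f (%.0f%% saved)\n", used/logical, (1 - used/logical) * 100 }'
# → ratio: 0.13 (87% saved)
```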
# hadoop fs -df -h /tmp
Filesystem Size Used Available Use%
maprfs:/// 369.0 G 211.9 G 157.1 G 57%
Thanks Vince and Hao. So what are the options to calculate the space utilization per volume with replication included (total size with replication), just like a hierarchical view of space usage by directory on the mount?
To check the space utilization for each volume, why not check the MCS UI page?
I believe it shows nicer output than the command line.
maprcli volume info will show you the physical and logical space utilization exclusive of replication, and before/after compression:
$ maprcli volume info -name home.ec2-user -json | grep -i used
    "logicalUsed":"10015",
    "used":"1269",
    "snapshotused":"0",
    "totalused":"1269",
logicalUsed is the actual size of your data before compression, and totalused is the size of your data after compression.
Replication is not included in the output. Refer to Mufeed Usman's link in another comment for information on that, but you'd simply multiply totalused by the replication factor to get the space inclusive of replication. Obviously this does not account for the possibility that your data is under-replicated at the moment you run these commands.
$ maprcli volume info -name home.ec2-user -json | jq '(.data["numreplicas"]|tonumber) * (.data["totalused"]|tonumber)'
3807
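If jq isn't available on the box, the same arithmetic works in plain shell: take totalused from the -json output and multiply it by the replication factor. A sketch using the sample values above (totalused of 1269 MiB, 3 replicas):

```shell
# Estimate the on-disk footprint including replicas.
# Values are taken from the sample maprcli output above.
totalused=1269   # MiB, post-compression, single copy
numreplicas=3    # volume replication factor
echo $(( totalused * numreplicas ))   # → 3807
```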
I think MCS displays Total Volume Size without replication factor.
I haven't tried it, but it looks like a good tool for getting a graphical view of usage.
It's mentioned in this thread: nfs - Determining disk space usage & file counts for extremely large data sets (any tricks?) - Server Fault
bgajjela Not exactly an answer to your query, but I thought I'd share the following https://community.mapr.com/message/40153#comment-40153 for your reference.
Thanks for the update. I assume ncdu has to read the whole directory tree to calculate the sizes, which will run forever on a large NFS mount.
bgajjela Please let us know if the concerns raised in this thread have been resolved.