
Chunk allocation strategy with compressed files

Question asked by phubert on Jul 9, 2015
Latest reply on Jul 20, 2015 by phubert
We are trying to understand how many chunks are used for a given file.

1) I create a 1 GB file that is highly compressible, since it is based on zeros:
[root@bmx10000 /latency1]# dd if=/dev/zero of=smallfile bs=1024k count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.50531 s, 419 MB/s
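A quick sanity check on the size dd reports (this arithmetic is mine, not from the original post): 1000 blocks of 1024k comes out to exactly the logical size shown above.

# 1000 MiB expressed in bytes, matching the 1048576000 bytes reported by dd
echo $(( 1000 * 1024 * 1024 ))
# prints 1048576000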

2) Now, looking at it from the chunk perspective, we see 4 chunks (4 × 256 MB):
[pphubert@bhp60002 ~]$ hadoop mfs -ls /projects/latency1
Found 1 items
-rw-r--r-- Z U   1 root root 1048576000 2015-07-09 09:39  268435456 /projects/latency1/smallfile
               p 2385.37.262522  bhp60002.os.amadeus.net:5660
               0 2406.62.131428  bhp60002.os.amadeus.net:5660
               1 2407.60.262618  bhp60002.os.amadeus.net:5660
               2 2404.61.262588  bhp60002.os.amadeus.net:5660
               3 2405.59.262666  bhp60002.os.amadeus.net:5660
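Just to double-check the arithmetic (a sketch on the numbers above, assuming chunks are allocated against the logical size): 4 chunks is exactly what you get from the logical file size divided by the 268435456-byte chunk size shown in the listing.

# ceil(logical size / chunk size) using integer arithmetic in the shell
echo $(( (1048576000 + 268435456 - 1) / 268435456 ))
# prints 4, matching the 4 chunks listed above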

3) But since compression is on, only 125 MB is actually used:
[pphubert@bhp60002 ~]$ maprcli volume list -columns volumename,logicalUsed,used
used    logicalUsed  volumename
125     1000         latency1
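Whereas if allocation were based on the physical (compressed) size, a single 256 MB chunk would be enough. A rough sketch, assuming the used/logicalUsed columns above are in MB (consistent with the 1000 MB logical size of the file):

# ceil(physical size / chunk size), with the 125 MB used figure expressed in bytes
echo $(( (125 * 1024 * 1024 + 268435456 - 1) / 268435456 ))
# prints 1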

So why does MapR allocate 4 chunks and not just 1?
