
How do containers / chunks / blocks work?

Question asked by dimamah on Jun 4, 2013
Latest reply on Jul 3, 2013 by dimamah
I've read a lot about containers and chunks in MapR, but I'm still missing some crucial information.
I hope you can answer my questions; please correct me where I'm wrong.

1) **Reading Data**: when the client needs to read a file (a rough sketch of this flow follows the list):

 - It asks the local fileserver service for the file.
 - The fileserver accesses MFS and gets the information about which container(s) the file's chunks are stored in, and passes it to the client. [not sure about this step]
 - The client then checks its cache for the locations of those containers; if they are not cached, the CLDB is asked (**directly by the client?**) for the locations of the containers.
 - At this point the client knows all the needed containers and their locations, and it asks the relevant node(s) for the file's chunks.
 - On each node the fileserver (**is it?**), after receiving a request containing a container ID and a filename, accesses the container (**how?**) and fetches the relevant chunks.
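To describe the flow I have in mind unambiguously, here it is as pseudocode. All the names (`local_fileserver`, `lookup_chunks`, `locate_container`, etc.) are invented for illustration; this is my mental model, not MapR's actual API.

```python
def read_file(client, path):
    """Sketch of the read path described above (hypothetical names)."""
    # Steps 1-2: the local fileserver resolves the file into
    # (container_id, chunk_id) pairs using MFS metadata.
    chunk_refs = client.local_fileserver.lookup_chunks(path)

    data = b""
    for container_id, chunk_id in chunk_refs:
        # Step 3: resolve container -> node locations, checking the
        # client-side cache first and falling back to the CLDB.
        locations = client.location_cache.get(container_id)
        if locations is None:
            locations = client.cldb.locate_container(container_id)
            client.location_cache[container_id] = locations

        # Steps 4-5: ask a node that holds the container for the chunk.
        node = locations[0]
        data += node.fileserver.read_chunk(container_id, chunk_id)
    return data
```

In particular I'm unsure whether the client talks to the CLDB directly (step 3), or whether the fileserver does that on its behalf.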


2) **Writing Data**: when the client needs to write data (a rough sketch of this flow follows the list):

 - It asks the local fileserver service to write a chunk.
 - The fileserver checks for available containers; if one is found, it is passed to the client, otherwise the fileserver asks the CLDB for a container:
  - The CLDB can pass an existing container to the fileserver (how does it know whether that container has free space?)
  - Or it can create a new container and pass it to the fileserver.
 - **What happens next? What does the fileserver do before passing the container to the client, and how and where is the new metadata updated?**
 - **Are the above flows accurate?**
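Again as pseudocode, with invented names (`find_writable_container`, `get_or_create_container`); this is just my understanding, not MapR's actual API:

```python
def write_chunk(client, path, chunk_data):
    """Sketch of the write path described above (hypothetical names)."""
    # Step 1: the client asks its local fileserver to write a chunk.
    fs = client.local_fileserver

    # Step 2: the fileserver looks for a container with free space.
    container = fs.find_writable_container()
    if container is None:
        # Steps 2a/2b: fall back to the CLDB, which either returns an
        # existing container (how does it track free space?) or
        # creates a new one.
        container = fs.cldb.get_or_create_container()

    # Step 3: the part I'm unsure about -- what happens between the
    # fileserver obtaining the container and the client writing to it,
    # and how/where is the new metadata recorded?
    container.write_chunk(path, chunk_data)
```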

3) **Writing data via NFS**: how are the containers updated when writing data via NFS?

4) What happens if I write a file that is smaller than 256 MB? `hadoop mfs -lss` always shows 268435456 under {chunk size}, but the {size} is smaller, and the size calculated by multiplying the number of disk blocks by 8192 is smaller still.
So my question is: **what really happens on the disk, and what does each of the above sizes mean?**
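To make the numbers concrete, here is the arithmetic I'm doing. The 268435456 and 8192 come straight from the `hadoop mfs -lss` output; the file size and disk-block count below are hypothetical examples, not real output:

```python
CHUNK_SIZE = 256 * 1024 * 1024  # 268435456 -- the {chunk size} column
DISK_BLOCK = 8192               # the unit I multiply {disk blocks} by

logical_size = 100 * 1024 * 1024         # hypothetical {size}: a ~100 MB file
disk_blocks  = 12500                     # hypothetical {disk blocks} value
physical     = disk_blocks * DISK_BLOCK  # 102400000, smaller than logical_size

print(CHUNK_SIZE, logical_size, physical)  # 268435456 104857600 102400000
```

So I see three different sizes for the same file: the fixed 256 MB chunk size, the logical size, and an even smaller on-disk size derived from the block count.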

5) What is the "disk blocks" value in the output of `hadoop mfs -lss` actually used for?

Thanks in advance,
Dima.
