If I have a container ID, how can I list all the FIDs associated with that container?
I checked with engineering, and they confirmed that there is no easy way to do this. The fsck utility can do it, but it requires the storage pool to be offline, and it will not produce a simple list of FIDs (you'd have to turn on debug logging and parse the output).
Are you seeing problems related to the unavailable container that need to be resolved? Everything is replicated and self-healing, so an unavailable container should not result in data loss.
When a file is written, it's split into chunks, and the chunks are written to containers and replicated. So any file larger than the chunk size (256 MB by default) will have its pieces spread across different containers.
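As a quick worked example of that arithmetic, the sketch below computes how many chunks a file of a given size is split into, using ceiling division by the chunk size. It assumes the 256 MB default (268435456 bytes) unless you pass another size. The function name `chunk_count` is hypothetical, and it only does arithmetic; it does not query the cluster.

```shell
#!/bin/sh
# Hypothetical helper: number of chunks a file of SIZE bytes is split into.
# Assumes a 256 MB chunk size (268435456 bytes) unless a second argument
# overrides it. Pure arithmetic; nothing here talks to the cluster.
chunk_count() {
    size=$1
    chunk=${2:-268435456}   # 256 * 1024 * 1024 bytes
    # Ceiling division: a partial final chunk still counts as a whole chunk.
    echo $(( (size + chunk - 1) / chunk ))
}

# A 1 GiB file with the default chunk size is split into 4 chunks.
chunk_count 1073741824
# prints: 4
```

Each of those chunks may be placed in a different container, which is why losing one container can affect part of a large file.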
What are you trying to do?
Thanks, Deborah Littlefield.
I am trying to find out which pieces (by FID) are in a container.
There are maprcli commands that let you get information on components identified by a FID, maprcli fid dump and maprcli fid stat: https://maprdocs.mapr.com/52/ReferenceGuide/fid.html
That might get you what you want.
Thanks, Deborah Littlefield.
I need to know the FID to run maprcli fid stat and fid dump, and that is exactly what I am trying to find out.
This command will list the FID of a specific file:
hadoop mfs -ls <path to file>
If you want more information about the file, you can use:
hadoop mfs -stat <path to file>
If you need this for many files, you could write a script to get them all. I'm not aware of an easier way (but there might be one...).
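A minimal sketch of such a script, assuming you already have the paths of interest in a text file, one per line. The function name `fids_for_paths` and the file names in the usage comment are hypothetical; it simply runs `hadoop mfs -ls` once per path, so it costs one command per file.

```shell
#!/bin/sh
# Sketch: for every path read from standard input (one per line), print a
# header line and then the `hadoop mfs -ls` output, which includes the FID.
# The function name is hypothetical.
fids_for_paths() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue              # skip blank lines
        echo "== $path"
        # Redirect stdin so the command cannot consume the remaining paths.
        hadoop mfs -ls "$path" < /dev/null
    done
}

# Example usage (hypothetical file names):
#   fids_for_paths < paths.txt > fids.txt
```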
Deborah Littlefield, all I have is a container ID. From the container ID, I am trying to get the FID so that I can identify the file.
I'm not sure of an easy way to do what you're asking, though I will see if I can figure it out.
I'm not trying to be annoying, but I'm afraid that if we find the answer to your question, you'll just end up with another question. For example, it's likely that the container ID you have contains parts of many different FIDs. If you get a list of those FIDs, will you be able to figure out which FID you're interested in? And if you do find the FID, will the fid dump or fid stat commands tell you what you want to know?
It would help us to understand the ultimate goal; there might be an easier way to get there.
Yes, it will. We recently had a container become unavailable, and we want to see which files are impacted.
Understood - thanks.
Here is one way to do it. If you have billions of files, this isn't the way to go, but it will work if the number of files is manageable.
When you use hadoop mfs -ls <path>, the output shows a three-part FID. The first part is the ID of the container where the file is stored, the second part is the inode of the file itself, and the third part is an internal version number.
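To make the three-part structure concrete, here is a small sketch that splits a FID on its dots using POSIX parameter expansion. The function name `split_fid` is hypothetical, and the labels follow the description above (container, inode, internal version).

```shell
#!/bin/sh
# Sketch: split a three-part FID (container.inode.version) into its parts.
# The function name is hypothetical; labels follow the description above.
split_fid() {
    fid=$1
    container=${fid%%.*}   # text before the first dot
    rest=${fid#*.}         # text after the first dot
    inode=${rest%%.*}      # text between the first and second dots
    version=${rest#*.}     # text after the second dot
    echo "container=$container inode=$inode version=$version"
}

split_fid 2049.73.10495598
# prints: container=2049 inode=73 version=10495598
```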
For example, the FID 2049.73.10495598 indicates container ID 2049. Since you have the container ID, you could grep for it:
hadoop mfs -lsr / | grep <ID>
The -lsr option lists recursively, so you may not want to run it from the root if you have billions of files (you might want to break it down by subdirectory).
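One caveat with a bare grep for the container ID is that it also matches the ID as a substring of unrelated numbers (for example 12049, or a file size). The sketch below uses awk to print only lines containing a token shaped like a three-part FID whose first part equals the container ID. It assumes FIDs appear as whitespace-separated tokens in the mfs output; the function name and the directory names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: print only lines of `hadoop mfs -ls` / `-lsr` output that contain a
# FID whose container part (first dot-separated field) equals the given ID.
# Assumes FIDs appear as whitespace-separated tokens such as 2049.73.10495598.
# The function name is hypothetical.
lines_in_container() {
    awk -v cid="$1" '
    {
        for (i = 1; i <= NF; i++) {
            # Only consider tokens shaped like N.N.N
            if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+$/) {
                split($i, part, ".")
                if (part[1] == cid) { print; next }
            }
        }
    }'
}

# Example: scan one subdirectory at a time (hypothetical directory names).
#   for dir in /data/dir1 /data/dir2; do
#       hadoop mfs -lsr "$dir" | lines_in_container 2049
#   done
```

Breaking the scan up by subdirectory also makes it easier to resume if a run is interrupted.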
I can keep looking for a better idea (or maybe someone who knows a more direct approach can chime in).
I actually thought about this, but I have billions of files on my cluster, so that would be a very hard route.
Yes, definitely a hard route with that many files. I'll keep digging.
Did you lose all three copies of this container? If you only lost one copy, it should be rebuilt automatically from the remaining replicas without impacting the files at all, assuming your replication factor is higher than 1.
The other available replicas had a lower epoch. We wanted to see whether there would be data loss, and which files would be impacted, if we were to promote a lower-epoch replica as master.
I think this will help... thanks!
Based on your description, listing all FIDs in a container is unlikely to be of any help.
If you use a command like "maprcli dump containerinfo -ids <#> -json", the output will contain the latest epoch for the container, plus the list of servers holding replicas of that container and their epochs as of the last time they reported to the CLDB service. If the container epoch is X and you have replicas at epoch X that are not active, something is preventing those replicas from coming online. That could be, for instance, because the storage pool or disks holding the replica are offline (for example, due to IO errors on one of those disks), or because the entire node is offline.
If the node was available, and the disks storing the most recent replica were available, then the CLDB would tell MFS on that node to bring it online. If the container couldn't be brought online, you should troubleshoot that issue, as opposed to reverting to an older/out-of-date replica of the container.
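The reasoning above boils down to a small decision rule, sketched here as a shell function. It assumes you have already read the container's latest epoch, a replica's epoch, and whether that replica is active from the `maprcli dump containerinfo` output yourself; the function name and its arguments are hypothetical, and it does not parse that JSON.

```shell
#!/bin/sh
# Sketch of the decision rule described above. Arguments (hypothetical):
#   $1  latest epoch recorded for the container
#   $2  epoch of one replica
#   $3  "yes" if that replica is currently active, anything else otherwise
# The values would come from `maprcli dump containerinfo`; this function does
# not parse that output itself.
classify_replica() {
    latest=$1
    epoch=$2
    active=$3
    if [ "$epoch" -lt "$latest" ]; then
        echo "out of date: promoting it may lose updates it missed"
    elif [ "$active" = yes ]; then
        echo "current and active"
    else
        echo "current but not active: troubleshoot the node or storage pool"
    fi
}
```

The point of the ordering is that a current-but-offline replica is a troubleshooting problem, and only replicas below the latest epoch carry a risk of missing data if promoted.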
I did get back one more piece of useful information for you. There is a way to list all the directory FIDs for a container, if that would help:
/opt/mapr/server/mrdirectorystats -cid <container ID>
I'm on the road at the moment and haven't had a chance to test it, so try it at your own risk...
We tested it, and it works only for namespace containers; it returns nothing for data containers.