
How to Recover from Failed OS drives when cluster drives are solid

Question asked by mandoskippy on Apr 4, 2013
Latest reply on Apr 8, 2013 by mandoskippy
I have an odd situation. I have a cluster with 8 nodes; it's a bit odd, but we call it frankencluster. We had an issue with Franken Cluster this weekend: 6 of the nodes dropped, all with OS failures. Three are likely recoverable (I can get them back up and running). Three OS disks are gone, kaput (corrupt). Here's the fun part: MapR was shut down at the time! It was a clean shutdown and I haven't tried to start it back up. The topology is below.

Basically, I'd like to know: for nodes 5 and 6, is it possible to take those disks, assign them to a different machine, and have MapR be able to use them? Specifically, to re-replicate containers so I don't lose data? In looking at things, the only potential issue is any volume set to a replication factor of 2 that had both copies on Node 5 and Node 6. Trouble is, I don't know if that's the case without starting MapR, and will it freak out at that point (to the degree I wouldn't be able to recover)? This is a weird situation, I'll grant that, but based on what I know of MapR this is not impossible... thoughts?

Rack 1 (SSD disks) - (don't care about these volumes, except CLDB)
 - Node 1 (two SSD drives, may be CLDB) - fine, no issues here
 - Node 2 (1 SSD drive - gone)
 - Node 3 (1 SSD drive - OS likely recoverable)

Rack 2 (spinny disks) - (I do care about the data here)
 - Node 4 (4 large drives) - fine, no issues here (may be CLDB)
 - Node 5 (1 drive) - OS data gone (MapR drive OK)
 - Node 6 (1 drive) - OS data gone (MapR drive OK)
 - Node 7 (1 drive) - OS OK, MapR OK
 - Node 8 (1 drive) - OS OK, MapR OK
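Once the CLDB is back up, one way to narrow down the worry about replication-factor-2 volumes is to dump the volume list as JSON and filter on the replication field. This is only a sketch: the exact command (`maprcli volume list -json`) and the field names (`volumename`, `numreplicas`) are assumptions that should be checked against your MapR version's output.

```python
import json

# Illustrative stand-in for `maprcli volume list -json` output.
# The "data" array and the field names below are assumptions, not
# verified against a live cluster -- adjust to what your version emits.
sample = """
{
  "status": "OK",
  "total": 3,
  "data": [
    {"volumename": "mapr.cldb.internal", "numreplicas": "3"},
    {"volumename": "projects",           "numreplicas": "2"},
    {"volumename": "users",              "numreplicas": "3"}
  ]
}
"""

def at_risk_volumes(maprcli_json, threshold=2):
    """Return names of volumes whose replication factor is <= threshold.

    These are the volumes that could plausibly have had every copy of a
    container on the two nodes that lost their OS drives.
    """
    doc = json.loads(maprcli_json)
    return [v["volumename"]
            for v in doc.get("data", [])
            if int(v["numreplicas"]) <= threshold]

print(at_risk_volumes(sample))  # -> ['projects']
```

Any volume this turns up would then need a closer look (e.g. container-level info) to see whether both replicas actually landed on Node 5 and Node 6; volumes at replication 3 should still have a live copy elsewhere in Rack 2.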

Outcomes