corruption in the cluster - using gfsck and more

Question asked by dimamah on Aug 10, 2014
Latest reply on Nov 20, 2014 by byrondover
Hi, 
We recently had data corruption in our cluster caused by the failure of 2 disks in different SPs (storage pools) on different servers, while the replication factor was 2.
We managed to isolate the corrupted data and want to run `gfsck` to repair the cluster. 
A few questions:

 1. Running `gfsck` on the root volume puts it into global-fsck mode, which, as I understand it, takes it offline. Is this correct?
 2. How long should a `gfsck` run take on a cluster with 50 TB of uncompressed data?
 3. Does the `-d` option also perform the repair, or does it only print the output?
 4. Running `hadoop fs -du` on `/` sums to ~20 TB of data. We are missing ~30 TB and suspect that this space has not been reclaimed because of the corruption. Is this true? Should `gfsck -r` reclaim that storage?
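
For reference, the ~20 TB total in question 4 can be reproduced by summing the first column of the `hadoop fs -du /` output. This is a minimal sketch, assuming each data line starts with a byte count (the exact `-du` layout differs between Hadoop versions) and that "TB" means TiB (1024^4 bytes). `sum_du_tib` is a hypothetical helper name, not a MapR or Hadoop tool:

```shell
#!/bin/sh
# sum_du_tib (hypothetical helper): reads `hadoop fs -du` output on stdin,
# sums the first column of every line that starts with a byte count,
# and prints the total in TiB (assumption: 1 TiB = 1024^4 bytes).
# Header lines such as "Found N items" are skipped because their first
# field is not a plain number.
sum_du_tib() {
  awk '$1 ~ /^[0-9]+$/ { total += $1 }
       END { printf "%.2f\n", total / (1024 ^ 4) }'
}
```

Usage on a cluster node: `hadoop fs -du / | sum_du_tib`. Comparing that figure with the ~50 TB the cluster reports is where the ~30 TB gap comes from.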

The second issue is that one of the volumes seems to be missing, and we are unable to recreate it.
Volume `mapr.osaka-01.local.logs` is unavailable. 
The UI shows this error : `Information for column 'actualreplication' is not yet available for volume 'mapr.osaka-01.local.logs'. Please try again.` 
We tried shutting down Warden and removing the volume, but the removal fails, and the volume is not recreated after a restart either.

While Warden is down:

    maprcli volume remove -name mapr.osaka-01.local.logs
    ERROR (2) -  Volume Remove: No such file or directory
    
    maprcli volume remove -name mapr.osaka-01.local.logs -force 1
    ERROR (2) -  Volume Remove: No such file or directory
    
    hadoop fs -mkdir /var/mapr/local/osaka-01/logs
    14/08/10 22:02:24 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    14/08/10 22:02:24 INFO security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
    2014-08-10 22:02:24,7761 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1305 Thread: 139942240532224 Lookup of volume mapr.osaka-01.local.logs failed, error No such file or directory(2) vtype 0 wantMirror 0 , CLDB: 10.20.40.83:7222
    2014-08-10 22:02:24,7794 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1305 Thread: 139942240532224 Lookup of volume mapr.osaka-01.local.logs failed, error No such file or directory(2) vtype 0 wantMirror 0 , CLDB: 10.20.40.83:7222
    2014-08-10 22:02:24,7815 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1305 Thread: 139942240532224 Lookup of volume mapr.osaka-01.local.logs failed, error No such file or directory(2) vtype 0 wantMirror 0 , CLDB: 10.20.40.83:7222
    2014-08-10 22:02:24,7815 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:1775 Thread: 139942240532224 mkdirs failed for /var/mapr/local/osaka-01/logs, error 2
    mkdir: Error: No such file or directory(2), file: logs
    
    hadoop fs -mv /var/mapr/local/osaka-01/logs1 /var/mapr/local/osaka-01/logs
    14/08/10 22:02:47 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    14/08/10 22:02:47 INFO security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
    2014-08-10 22:02:48,1785 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1305 Thread: 139656005490432 Lookup of volume mapr.osaka-01.local.logs failed, error No such file or directory(2) vtype 0 wantMirror 0 , CLDB: 10.20.40.83:7222
    mv: Error: Directory not empty
    
    createsystemvolumes.log:
    14/08/10 22:31:14 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    14/08/10 22:31:14 INFO security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
    2014-08-10 22:31:14,6459 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1305 Thread: 140048996775680 Lookup of volume mapr.osaka-01.local.logs failed, error No such file or directory(2) vtype 0 wantMirror 0 , CLDB: 10.20.40.83:7222
    stat: cannot stat `/var/mapr/local/osaka-01/logs': No such file or directory
    14/08/10 22:31:18 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    14/08/10 22:31:18 INFO security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
    2014-08-10 22:31:18,4604 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1305 Thread: 140474492356352 Lookup of volume mapr.osaka-01.local.logs failed, error No such file or directory(2) vtype 0 wantMirror 0 , CLDB: 10.20.40.83:7222
    stat: cannot stat `/var/mapr/local/osaka-01/logs': No such file or directory
    --- Sun Aug 10 22:31:21 IDT 2014 --- Failed to detect if volume mapr.osaka-01.local.logs is mounted




Thanks