Crashed CLDB won't start anymore

Question asked by mvince on May 19, 2016
Latest reply on May 30, 2016 by mvince

Hi

I'm playing around with the MapR distribution on a 3-node cluster running MapR 5.0. Today my cluster crashed, and now it seems I can't start CLDB anymore. The CLDB logs (attached) contain warnings about a Jetty tmp file and an invalid topology:

2016-05-19 13:03:09,106 WARN log [main]: Can't reuse /tmp/Jetty_0_0_0_0_7221_cldb____qb58s0, using /tmp/Jetty_0_0_0_0_7221_cldb____qb58s0_2134555231849579909
2016-05-19 13:03:09,257 INFO log [main]: Started SelectChannelConnector@0.0.0.0:7221
2016-05-19 13:03:09,772 INFO CLDBServer [RPC-2]: FSRegister: Request  FSID: 5457457659252197970 FSNetworkLocation:  FSHost:Port: 10.0.3.103- FSHostName: dwh-mapr-dev-01 StoragePools 8d64ee892f3f8f2000573b60ce0c3e66-0b3bd35fd7c1770900573c0d1607c017- Capacity: 681560 Available: 460799 Used: 220760 Role: 0 isDCA: false uniq: 3b15339b6a280a3a-573d98e50ac340 Received registration request
2016-05-19 13:03:09,773 INFO CLDBServer [RPC-2]: Cluster uuid is -5976214098725966258-1982579319679765506
2016-05-19 13:03:09,773 WARN Topology [RPC-2]: FileSever on dwh-mapr-dev-01 reported an invalid topology . Ignoring reported topology

CLDB then shuts down after 7 minutes because the local mfs did not become master:

2016-05-19 13:10:12,252 INFO CLDBServer [Lookup-1]: Rejecting RPC 2345.5 from 10.0.3.104:47253 with status 3 as CLDB is waiting for local kvstore to become master.
2016-05-19 13:11:12,254 INFO CLDBServer [Lookup-4]: Rejecting RPC 2345.5 from 10.0.3.104:47253 with status 3 as CLDB is waiting for local kvstore to become master.
2016-05-19 13:12:12,932 INFO CLDBServer [Lookup-2]: Rejecting RPC 2345.5 from 10.0.3.103:55115 with status 3 as CLDB is waiting for local kvstore to become master.
2016-05-19 13:13:13,651 INFO CLDBServer [Lookup-8]: Rejecting RPC 2345.5 from 10.0.3.104:1111 with status 3 as CLDB is waiting for local kvstore to become master.
2016-05-19 13:14:14,086 INFO CLDBServer [Lookup-8]: Rejecting RPC 2345.5 from 10.0.3.104:44898 with status 3 as CLDB is waiting for local kvstore to become master.
2016-05-19 13:15:07,446 FATAL CLDB [WaitForLocalKvstore Thread]: CLDBShutdown: CLDB had master lock and was waiting for its local mfs to become Master.Waited for 7 (minutes) but mfs did not become Master. Shutting down CLDB to release master lock.

I can't find anything useful in the mfs logs, so I'm pretty much stuck on what happened and how to fix it. I'd be grateful for any ideas.

 

thanks

 

EDIT:

I've been digging through the logs and finally found something interesting in mfsinit.log: it looks like there are errors mounting one of the disks.

Disabling disk cache and set max io size on mapr disks

/opt/mapr/server/maprexecute Disabling the disk cache on mapr disk: /dev/vdc
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(setcache) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device

/dev/vdc:
 setting drive write-caching to 0 (off)
unable to access /dev/vdc, ATA disk?
Disabling the disk cache on mapr disk: /dev/vdc: Failed
Setting max_sectors_kb for mapr disk: vdc
Set max_sectors_kb to 1024 on mapr disk: vdc
/opt/mapr/server/maprexecute Disabling the disk cache on mapr disk: /dev/vde
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(setcache) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device

 

EDIT2:

When I try to run fsck on that disk, it gets killed instantly:

 

mapr@dwh-mapr-dev-01:/opt/mapr/zkdata$ /opt/mapr/server/mrconfig sp list
ListSPs resp: status 0:2
No. of SPs (2), totalsize 716799 MB, totalfree 0 MB

SP 0: name SP6, Online, size 204799 MB, free 0 MB, path /dev/vdc
SP 1: name SP5, Online, size 511999 MB, free 0 MB, path /dev/vde
mapr@dwh-mapr-dev-01:/opt/mapr/zkdata$ /opt/mapr/server/mrconfig sp offline all
mapr@dwh-mapr-dev-01:/opt/mapr/zkdata$ /opt/mapr/server/fsck /dev/vdc 
Using logfile /opt/mapr/logs/fsck.log.2016-05-19.18:45:22.21065
tcmalloc: large alloc 11245977600 bytes == 0x3730000 @  0x8d3140 0x7914d7
tcmalloc: large alloc 11245985792 bytes == 0x2a24c0000 @  0x8d2def 0x754b57
Killed
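
To put the two tcmalloc lines above into perspective, here is a minimal Python sketch that converts the requested sizes to MiB and GiB. The helper name `large_allocs_mib` is made up for illustration and is not part of any MapR tooling. Each request works out to roughly 10.5 GiB, so one possible explanation for the `Killed` line is the kernel's out-of-memory killer on a node with less memory than that; this is only an assumption, not something the logs above confirm.

```python
import re

# The two tcmalloc lines copied verbatim from the fsck output above.
FSCK_OUTPUT = """\
tcmalloc: large alloc 11245977600 bytes == 0x3730000 @  0x8d3140 0x7914d7
tcmalloc: large alloc 11245985792 bytes == 0x2a24c0000 @  0x8d2def 0x754b57
"""


def large_allocs_mib(text):
    """Return the size in MiB of every 'tcmalloc: large alloc' line in text.

    Hypothetical helper for illustration only.
    """
    sizes = re.findall(r"tcmalloc: large alloc (\d+) bytes", text)
    return [int(size) / (1024 * 1024) for size in sizes]


if __name__ == "__main__":
    for mib in large_allocs_mib(FSCK_OUTPUT):
        # Print each allocation request in MiB and GiB.
        print(f"{mib:.0f} MiB ({mib / 1024:.2f} GiB)")
```

Comparing these numbers with the memory actually available on the node (for example, the output of `free -m`) would show whether a request of this size could ever succeed.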
