
mfs core dump when creating a directory with a particular name

Question asked by wei_dong on Jul 24, 2013
Latest reply on Jul 24, 2013 by nabeel

Hi everyone,

My MapR version is v. 2.1.2.18401.GA.

I've been seeing repeatable mfs crashes when I try to create a directory with one particular name. Putting data and creating directories with other names both succeed, provided I first restart the mfs service on the node that crashed. Below is my command-line output:

<pre>
wdong@washtenaw:/mnt$ hadoop fs -mkdir /user/wdong/ktv2/国语流行DVD/S.H.E_周定纬
2013-07-24 16:29:15,1225 ERROR Client fs/client/fileclient/cc/client.cc:3439 Thread: 140595574146816 rpc err Connection reset by peer(104) 28.124 to 192.168.100.7:5660, fid 2055.9312.117878, upd 1, failed err 17
2013-07-24 16:30:53,8551 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:451 Thread: 140595574146816 ContainerLookup failed, No master found for cid 2055, CLDB:
2013-07-24 16:30:53,8552 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:508 Thread: 140595574146816 GetBinding failed, could not allocate entry for cid 2055, in cidcache
2013-07-24 16:30:53,8552 ERROR Client fs/client/fileclient/cc/client.cc:989 Thread: 140595574146816 Rpc failed, 28.12, no server found for cid 2055
2013-07-24 16:30:53,8568 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:464 Thread: 140595574146816 ContainerLookup failed, cldb returned empty list, no servers found for cid 2055, CLDB: 192.168.100.6:7222
2013-07-24 16:30:53,8569 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:508 Thread: 140595574146816 GetBinding failed, could not allocate entry for cid 2055, in cidcache
2013-07-24 16:30:53,8569 ERROR Client fs/client/fileclient/cc/client.cc:1523 Thread: 140595574146816 Rpc failed, 28.21, no server found for cid 2055
2013-07-24 16:30:53,8569 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:1636 Thread: 140595574146816 mkdirs failed for /user/wdong/ktv2/国语流行DVD/S.H.E_周定纬, error 2
mkdir: Error: No such file or directory(2), file: S.H.E_周定纬
</pre>

I loaded the core file in GDB. There are six threads: five are blocked, and the sixth produced the following stack trace:

<pre>
Core was generated by `/opt/mapr/server/mfs -b -p 5660 -m 3190 -O /opt/mapr/conf/mapr-clusters.conf -i'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f36245d33ec in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f36245d33ec in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x000000000073f557 in Match (ext=<optimized out>, this=<optimized out>, len=<optimized out>)
    at fs/server/mapserver/mapserver.h:167
#2  CheckExtension (len=<optimized out>, extension=<optimized out>, this=<optimized out>) at fs/server/mapserver/mapserver.h:229
#3  mapr::fs::MapServer::CheckInNoCompressList (this=<optimized out>, fileName=<optimized out>, nameLength=<optimized out>)
    at fs/server/mapserver/mapserver.cc:245
#4  0x0000000000788cfd in mapr::fs::MapServer::CreateCheckParent (arg=0x2d7ed10, err=<optimized out>)
    at fs/server/mapserver/create.cc:906
#5  0x0000000000657b02 in mapr::fs::CacheMgr::Read (this=0x277cd40, wa=0x2d7ed20) at fs/server/cache/cachemgr.cc:880
#6  0x0000000000783f51 in mapr::fs::MapServer::CreateGetParentInode (arg=<optimized out>, err=<optimized out>)
    at fs/server/mapserver/create.cc:847
#7  0x0000000000784a69 in FidLock (wa=<optimized out>, cbarg=<optimized out>, cb=<optimized out>, isTry=<optimized out>,
    lockMode=<optimized out>, inumber=<optimized out>, this=<optimized out>) at fs/server/mapserver/locks.h:57
#8  mapr::fs::MapServer::CreateLockParent (wa=<optimized out>) at fs/server/mapserver/create.cc:830
#9  0x0000000000788e4e in mapr::fs::MapServer::CreateCowInodesDone (arg=0x0, err=73072812) at fs/server/mapserver/create.cc:813
#10 0x000000000070d1a0 in mapr::fs::Container::CowInodeNext (arg=0x45b00b0, err=67156064) at fs/server/container/cow.cc:67
#11 0x0000000000784093 in mapr::fs::MapServer::CreateCowInodes (arg=0x2d7ed10, err=<optimized out>)
    at fs/server/mapserver/create.cc:793
#12 0x00000000006e6f07 in mapr::fs::Container::IReserve (this=0x396eba0, agroupp=<optimized out>,
    cb=0x783f90 <mapr::fs::MapServer::CreateCowInodes(void*, int)>, cbarg=0x2d7ed10, wa=<optimized out>)
    at fs/server/container/container.cc:152
#13 0x0000000000787cef in mapr::fs::MapServer::CreateReserveChildInode (arg=0x2d7ed10, err=0) at fs/server/mapserver/create.cc:745
#14 0x00000000007882d7 in mapr::fs::MapServer::CreateReserveOrphanSlot (arg=0x2d7ed10, err=0) at fs/server/mapserver/create.cc:721
#15 0x0000000000745c9d in mapr::fs::ContainerTable::GetContainerFirstWriteDone (arg=0x2d7ed20, err=0)
    at fs/server/mapserver/ctable.cc:477
#16 0x00000000007461af in mapr::fs::ContainerTable::GetContainerFirstWrite (arg=0x2d7ed20, err=0) at fs/server/mapserver/ctable.cc:457
#17 0x00000000006e9260 in mapr::fs::Container::InitOnFirstUpdate (this=0x396eba0,
    cb=0x746120 <mapr::fs::ContainerTable::GetContainerFirstWrite(void*, int)>, cbarg=0x2d7ed20, wa=0x2c)
    at fs/server/container/container.cc:2235
#18 0x0000000000746068 in mapr::fs::ContainerTable::GetContainerUpdate (arg=0x2d7ed20, err=0) at fs/server/mapserver/ctable.cc:430
#19 0x00000000006e9a50 in mapr::fs::Container::InitOnFirstAccess (this=0x396eba0,
    cb=0x745fc0 <mapr::fs::ContainerTable::GetContainerUpdate(void*, int)>, cbarg=0x2d7ed20, wa=0x2c)
    at fs/server/container/container.cc:2048
#20 0x0000000000789d3f in mapr::fs::MapServer::DoCreate (this=0x2784670, cid=<optimized out>, parent=<optimized out>,
    verify=<optimized out>, verifyOffset=<optimized out>, inChild=<optimized out>, isDanglingChild=false,
    fname=0x45b00a8 "S.H.E_周定纬", nmlen=15, linkname=0x0, linknmlen=0, isWeakVolLink=false, sattr=0x45b09a0, vn=0, creds=
    0x45acbd0, childType=2, childSubType=mapr::fs::FSTInval, major=0, minor=0, retChild=0x2d7ed04, dontTakeParentFidLock=false,
    fromFsck=172, fromGfsck=11, isSetUidEnabled=142, preParentAttr=0x45acad0, postParentAttr=0x45b1680, childAttr=0x45b1220,
    containerSizep=0x0, cb=0x888430 <mapr::fs::FileServer::MkdirRespond(void*, int)>, cbarg=0x2d7ec70, wa=0x2d7ed10)
    at fs/server/mapserver/create.cc:641
#21 0x000000000078ab81 in mapr::fs::MapServer::Mkdir (this=<optimized out>, parent=0x2d7ecf8, inChild=<optimized out>, dname=0x0,
    nmlen=15, sattr=0x45b09a0, vn=0, creds=0x45acbd0, needsReplication=<optimized out>, onReplica=false, upstreamFs=0,
    fromGfsck=<optimized out>, reqProcId=124, retChild=0x2d7ed04, preParentAttr=0x45acad0, postParentAttr=0x45b1680,
    childAttr=0x45b1220, containerSizep=0x0, cb=0x888430 <mapr::fs::FileServer::MkdirRespond(void*, int)>, cbarg=0x2d7ec70,
    wa=0x2d7ed10) at fs/server/mapserver/create.cc:140
#22 0x000000000089b390 in mapr::fs::FileServer::MkdirServe (this=<optimized out>, handle=<optimized out>, reqProcId=124,
    hdrLen=<optimized out>, hdr=<optimized out>) at fs/server/mapserver/fileserver.cc:1784
#23 0x000000000089cb51 in mapr::fs::FileServer::RequestArrivedCommon (this=0x7fff26d4fc80, r=0x7f355c023110, ctx=0x7f355c033750,
    procedureId=<optimized out>, hdrLen=74, hdr=0x7f355c03389a) at fs/server/mapserver/fileserver.cc:243
#24 0x0000000000640231 in mapr::fs::DispatchThreaded::Dispatch (this=0xc865c0) at fs/rpc/dispatchThr.cc:104
#25 0x0000000000645418 in mapr::fs::RpcServer::Run (this=0x7fff26d770f0, forever=<optimized out>) at fs/rpc/rpcserver-epoll.cc:270
#26 0x000000000087b618 in main (argc=<optimized out>, argv=<optimized out>) at fs/server/mapserver/mapfs.cc:975
</pre>
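For reference, the name that triggers the crash is 15 bytes in UTF-8 (6 ASCII bytes for `S.H.E_` plus three 3-byte CJK characters), which matches `nmlen=15` in frames #20 and #21. The small C++ sketch below checks this and shows what an extension check would most likely be handed: everything after the last '.', i.e. `E_周定纬`, which is 11 bytes, 9 of them with the high bit set. The helper names (`kName`, `ExtensionOf`, `HighBitBytes`) are just for illustration, and "extension = text after the last '.'" is my assumption about what `CheckExtension` looks at.

```cpp
#include <cstddef>
#include <string>

// "S.H.E_周定纬" written as explicit UTF-8 bytes, so the result does not
// depend on the compiler's source or execution character set.
const std::string kName = "S.H.E_\xE5\x91\xA8\xE5\xAE\x9A\xE7\xBA\xAC";

// Hypothetical helper: returns everything after the last '.', or an empty
// string if there is none (assumed to be what an extension check examines).
std::string ExtensionOf(const std::string& name) {
    const std::size_t dot = name.rfind('.');
    return dot == std::string::npos ? std::string() : name.substr(dot + 1);
}

// Hypothetical helper: counts bytes with the high bit set. Stored in a plain
// char (signed on x86-64 Linux), each of these bytes has a negative value.
int HighBitBytes(const std::string& s) {
    int n = 0;
    for (unsigned char c : s) {
        if (c >= 0x80) ++n;
    }
    return n;
}
```

With these, `kName.size()` is 15 and `ExtensionOf(kName)` is 11 bytes long, 9 of which are non-ASCII, so any code that indexes a table with a signed `char` from this extension would use negative indices.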

I've also seen a different stack trace:

<pre>
Core was generated by `/opt/mapr/server/mfs -b -p 5660 -m 3190 -O /opt/mapr/conf/mapr-clusters.conf -i'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f93800463ec in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f93800463ec in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x000000000073f557 in Match (ext=<optimized out>, this=<optimized out>, len=<optimized out>)
    at fs/server/mapserver/mapserver.h:167
#2  CheckExtension (len=<optimized out>, extension=<optimized out>, this=<optimized out>) at fs/server/mapserver/mapserver.h:229
#3  mapr::fs::MapServer::CheckInNoCompressList (this=<optimized out>, fileName=<optimized out>, nameLength=<optimized out>)
    at fs/server/mapserver/mapserver.cc:245
#4  0x0000000000788cfd in mapr::fs::MapServer::CreateCheckParent (arg=0x17f7d70, err=<optimized out>)
    at fs/server/mapserver/create.cc:906
#5  0x0000000000657b02 in mapr::fs::CacheMgr::Read (this=0x10a2d40, wa=0x17f7d80) at fs/server/cache/cachemgr.cc:880
#6  0x0000000000659833 in mapr::fs::CacheMgr::ProcessCacheRequest (arg=0x24d622c, err=<optimized out>)
    at fs/server/cache/cachemgr.cc:2170
#7  0x0000000000640231 in mapr::fs::DispatchThreaded::Dispatch (this=0xc865c0) at fs/rpc/dispatchThr.cc:104
#8  0x0000000000645418 in mapr::fs::RpcServer::Run (this=0x7fffc27b9270, forever=<optimized out>) at fs/rpc/rpcserver-epoll.cc:270
#9  0x000000000087b618 in main (argc=<optimized out>, argv=<optimized out>) at fs/server/mapserver/mapfs.cc:975
</pre>
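Both backtraces fault in libc under `Match` → `CheckExtension` → `MapServer::CheckInNoCompressList` → `CreateCheckParent` → `CacheMgr::Read`; only the callers below that differ. So the crash seems tied to checking the new name against what is presumably the list of extensions that are not compressed, rather than to the dispatch path. I don't have the MapR source, so the following is only a sketch of what a bounds-safe check could look like, not MapR's code. It assumes the check compares the bytes after the last '.' with a list of extensions; `HasListedExtension`, `AsciiLower`, and the example extension list are hypothetical. It never reads outside `nameLength` bytes and only folds ASCII case, so UTF-8 bytes such as those in `周定纬` pass through unchanged instead of being used as (possibly negative) table indices.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// ASCII-only lowercase: bytes >= 0x80 (UTF-8 lead/continuation bytes) are
// returned unchanged, and the argument is unsigned, so never negative.
unsigned char AsciiLower(unsigned char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c - 'A' + 'a') : c;
}

// Hypothetical check: does the name (nameLength bytes, not necessarily
// NUL-terminated) end in '.' followed by one of the given extensions?
// Every index stays inside [0, nameLength).
bool HasListedExtension(const char* name, std::size_t nameLength,
                        const std::vector<std::string>& extensions) {
    // Find the last '.' within the given length only.
    std::size_t dot = nameLength;
    for (std::size_t i = nameLength; i > 0; --i) {
        if (name[i - 1] == '.') {
            dot = i - 1;
            break;
        }
    }
    if (dot == nameLength) return false;  // no '.' at all

    const std::size_t extLen = nameLength - dot - 1;
    for (const std::string& ext : extensions) {
        if (ext.size() != extLen) continue;  // lengths must match exactly
        bool same = true;
        for (std::size_t i = 0; i < extLen; ++i) {
            const unsigned char a = static_cast<unsigned char>(name[dot + 1 + i]);
            const unsigned char b = static_cast<unsigned char>(ext[i]);
            if (AsciiLower(a) != AsciiLower(b)) {
                same = false;
                break;
            }
        }
        if (same) return true;
    }
    return false;
}
```

With a list like `{"gz", "zip", "mp3"}`, `song.MP3` matches, while `S.H.E_周定纬` (extension of 11 bytes) simply fails the length comparison without touching any lookup table.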

Does anyone have an idea of what's happening? I can provide more information or the core files if needed.

Thank you.

- Wei
