MapR client errors while writing to MapR Streams

Question asked by ani.desh1512 on Oct 30, 2017
Latest reply on Nov 3, 2017 by ani.desh1512

We have the following setup:

  • A 5-node MapR cluster [5.2.2] with 3 CLDB nodes
  • The disk balancer and role balancer are enabled on the MapR cluster
  • We have 7 clients set up with mapr-client [5.2.2.44680.GA-1] and mapr-kafka [0.9.0.201707250007-1]
  • We have written a service (which uses the MapR Streams Java API library) that, using these clients, provides a REST API for posting messages to MapR Streams
  • This service handles a fairly high volume of incoming messages (around 5k messages per second)

While these clients were handling and posting these messages to MapR Streams, we saw the following errors on the client side:

Oct 28 13:52:41 java[21692]: 2017-10-28 13:52:41,6290 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27231 rpc err No such process(3) 35.1 to 10.101.16.180:5660, fid 2182.20066.176702, upd 1
Oct 28 16:38:04 java[21692]: 2017-10-28 16:38:04,7506 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27185 rpc err No such device(19) 35.1 to 10.101.19.217:5660, fid 2176.7799.174220, upd 1
Oct 28 20:37:38 java[21692]: 2017-10-28 20:37:38,3487 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27227 rpc err No such device(19) 35.1 to 10.101.19.217:5660, fid 2206.24035.184472, upd 1
Oct 29 03:31:48 java[21692]: 2017-10-29 03:31:48,9432 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27227 rpc err No such device(19) 35.1 to 10.101.19.64:5692, fid 2169.32.131202, upd 1
Oct 29 06:11:14 java[21692]: 2017-10-29 06:11:14,8343 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27207 rpc err No such device(19) 35.1 to 10.101.18.70:5660, fid 2178.21279.173770, upd 1
Oct 29 09:15:43 java[21692]: 2017-10-29 09:15:43,2127 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27231 rpc err No such device(19) 35.1 to 10.101.16.180:5692, fid 2177.11943.175214, upd 1
Oct 29 09:15:46 java[21692]: 2017-10-29 09:15:46,2132 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27187 rpc err No such process(3) 35.1 to 10.101.19.64:5660, fid 2178.21279.173770, upd 1
Oct 29 12:15:11 java[21692]: 2017-10-29 12:15:11,4060 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27206 rpc err No such device(19) 35.1 to 10.101.19.206:5660, fid 2198.23551.309544, upd 1
Oct 29 13:38:26 java[21692]: 2017-10-29 13:38:26,7389 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27200 rpc err No such device(19) 35.1 to 10.101.16.180:5692, fid 2201.20153.177076, upd 1
Oct 29 14:00:26 java[21692]: 2017-10-29 14:00:26,1180 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27224 rpc err No such process(3) 35.1 to 10.101.19.64:5692, fid 2177.11943.175214, upd 1
Oct 29 17:32:46 java[21692]: 2017-10-29 17:32:46,6226 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27187 rpc err No such process(3) 35.1 to 10.101.19.64:5692, fid 2183.28375.188194, upd 1
Oct 29 17:40:13 java[21692]: 2017-10-29 17:40:13,9224 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27216 rpc err No such process(3) 35.1 to 10.101.19.206:5660, fid 2178.21279.173770, upd 1
Oct 29 18:04:35 java[21692]: 2017-10-29 18:04:35,8941 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27198 rpc err No such device(19) 35.1 to 10.101.19.217:5660, fid 2328.14216.159754, upd 1
Oct 29 21:05:41 java[21692]: 2017-10-29 21:05:41,6441 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27185 rpc err No such device(19) 35.1 to 10.101.19.64:5692, fid 2201.20153.177076, upd 1
Oct 29 21:05:41 java[21692]: 2017-10-29 21:05:41,6442 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27237 rpc err No such device(19) 35.1 to 10.101.18.70:5692, fid 2177.11943.175214, upd 1
Oct 29 23:55:40 java[21692]: 2017-10-29 23:55:40,9387 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27204 rpc err No such device(19) 35.1 to 10.101.16.180:5660, fid 2182.20066.176702, upd 1
Oct 30 02:56:16 java[21692]: 2017-10-30 02:56:16,6590 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27231 rpc err No such process(3) 35.1 to 10.101.18.70:5692, fid 2179.17836.454986, upd 1
Oct 30 02:56:16 java[21692]: 2017-10-30 02:56:16,6591 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27185 rpc err No such process(3) 35.1 to 10.101.18.70:5692, fid 2179.17836.454986, upd 1
Oct 30 04:22:02 java[21692]: 2017-10-30 04:22:02,0870 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27229 rpc err No such process(3) 35.1 to 10.101.19.64:5692, fid 2179.17836.454986, upd 1
Oct 30 04:55:19 java[21692]: 2017-10-30 04:55:19,9189 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27228 rpc err No such device(19) 35.1 to 10.101.19.217:5692, fid 2331.24701.180872, upd 1
Oct 30 06:19:50 java[21692]: 2017-10-30 06:19:50,6669 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 15996 rpc err No such process(3) 35.3 to 10.101.16.180:5660, fid 2166.32.131268, upd 1
Oct 30 06:19:50 java[21692]: 2017-10-30 06:19:50,6669 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 15705 rpc err No such process(3) 35.3 to 10.101.16.180:5660, fid 2166.32.131268, upd 1
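To get an overview of these client errors, something like the following Python sketch could tally them by errno, by target server, and by the first component of the fid. The field layout in the regular expression is inferred only from the lines above, and treating the first fid component as a container ID is my assumption, not something the log itself confirms. The names `parse_client_errors` and `summarize` are made up for this sketch.

```python
import errno
import re
from collections import Counter
from datetime import datetime

# Layout inferred from the client log lines in this post (an assumption,
# not taken from MapR documentation).
CLIENT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ ERROR Client \S+ "
    r"Thread: \d+ rpc err (?P<msg>.+?)\((?P<errno>\d+)\) \S+ to "
    r"(?P<host>[\d.]+):(?P<port>\d+), fid (?P<fid>[\d.]+),"
)

# Three lines copied verbatim from the client log above.
SAMPLE_LINES = [
    "Oct 28 13:52:41 java[21692]: 2017-10-28 13:52:41,6290 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27231 rpc err No such process(3) 35.1 to 10.101.16.180:5660, fid 2182.20066.176702, upd 1",
    "Oct 28 16:38:04 java[21692]: 2017-10-28 16:38:04,7506 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27185 rpc err No such device(19) 35.1 to 10.101.19.217:5660, fid 2176.7799.174220, upd 1",
    "Oct 28 20:37:38 java[21692]: 2017-10-28 20:37:38,3487 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27227 rpc err No such device(19) 35.1 to 10.101.19.217:5660, fid 2206.24035.184472, upd 1",
]


def parse_client_errors(lines):
    """Turn matching log lines into dicts; lines that do not match are skipped."""
    events = []
    for line in lines:
        m = CLIENT_RE.search(line)
        if m is None:
            continue
        code = int(m["errno"])
        events.append({
            "time": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
            "errno": code,
            # Symbolic name from the standard errno table (3 -> ESRCH, 19 -> ENODEV).
            "errno_name": errno.errorcode.get(code, "unknown"),
            "message": m["msg"],
            "server": f'{m["host"]}:{m["port"]}',
            "fid": m["fid"],
        })
    return events


def summarize(events):
    """Count errors per errno, per server, and per leading fid component."""
    return {
        "by_errno": Counter(e["errno"] for e in events),
        "by_server": Counter(e["server"] for e in events),
        # Assumption: the first fid component identifies the container.
        "by_container": Counter(e["fid"].split(".")[0] for e in events),
    }


if __name__ == "__main__":
    for key, counts in summarize(parse_client_errors(SAMPLE_LINES)).items():
        print(key, dict(counts))
```

Feeding the full client log through `parse_client_errors` instead of `SAMPLE_LINES` would show whether the errors cluster on particular servers or fids.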

 

When I checked mfs.log-5 on the cluster, I saw errors at times roughly corresponding to the client-side timestamps above. The errors on the MapR server were as follows:

2017-10-29 13:38:26,7071 ERROR DB db/localreq.cc:177 resp error 19 for proc 64 tablet 2201.20153.177076, set error on tablet
2017-10-29 13:38:26,7071 ERROR DB db/tablet.cc:478 TabletLoad 2201.20153.177076 : ContainerStatProc failed 19
2017-10-29 13:38:26,7071 INFO DB db/tablet.cc:4129 Unloading tablet 2201.20153.177076 with error 19
2017-10-29 13:38:26,7326 ERROR DB db/localreq.cc:177 resp error 19 for proc 64 tablet 2201.20153.177076, set error on tablet
2017-10-29 13:38:26,7327 ERROR DB db/tablet.cc:478 TabletLoad 2201.20153.177076 : ContainerStatProc failed 19
2017-10-29 13:38:26,7327 INFO DB db/tablet.cc:4129 Unloading tablet 2201.20153.177076 with error 19
2017-10-29 13:38:26,7343 ERROR DB db/localreq.cc:177 resp error 19 for proc 64 tablet 2201.20153.177076, set error on tablet
2017-10-29 13:38:26,7343 ERROR DB db/tablet.cc:478 TabletLoad 2201.20153.177076 : ContainerStatProc failed 19
2017-10-29 13:38:26,7343 INFO DB db/tablet.cc:4129 Unloading tablet 2201.20153.177076 with error 19
2017-10-29 13:38:26,7363 ERROR DB db/localreq.cc:177 resp error 19 for proc 64 tablet 2201.20153.177076, set error on tablet
2017-10-29 13:38:26,7363 ERROR DB db/tablet.cc:478 TabletLoad 2201.20153.177076 : ContainerStatProc failed 19
2017-10-29 13:38:26,7364 INFO DB db/tablet.cc:4129 Unloading tablet 2201.20153.177076 with error 19
2017-10-29 13:38:26,7400 ERROR DB db/localreq.cc:177 resp error 19 for proc 64 tablet 2201.20153.177076, set error on tablet
2017-10-29 13:38:26,7400 ERROR DB db/tablet.cc:478 TabletLoad 2201.20153.177076 : ContainerStatProc failed 19
2017-10-29 13:38:26,7400 INFO DB db/tablet.cc:4129 Unloading tablet 2201.20153.177076 with error 19
2017-10-29 13:38:26,7505 ERROR DB db/localreq.cc:177 resp error 19 for proc 64 tablet 2201.20153.177076, set error on tablet
2017-10-29 13:38:26,7506 ERROR DB db/tablet.cc:478 TabletLoad 2201.20153.177076 : ContainerStatProc failed 19
2017-10-29 13:38:26,7506 INFO DB db/tablet.cc:4129 Unloading tablet 2201.20153.177076 with error 19
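The tablet ID in these server lines (2201.20153.177076) is the same as the fid in the client error logged at 2017-10-29 13:38:26. A rough way to check that kind of match across whole log files is sketched below in Python. It assumes client and server clocks are in sync and in the same time zone, and that the line layouts are as inferred from the excerpts in this post; `correlate` and the 5-second window are hypothetical choices for illustration.

```python
import re
from datetime import datetime, timedelta

# Both patterns are inferred from the excerpts in this post (assumptions).
SERVER_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ INFO DB \S+ "
    r"Unloading tablet (?P<tablet>[\d.]+) with error (?P<errno>\d+)"
)
CLIENT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ ERROR Client .*?"
    r"rpc err .+?\((?P<errno>\d+)\).*? fid (?P<fid>[\d.]+),"
)


def _parse_ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def correlate(client_lines, server_lines, window=timedelta(seconds=5)):
    """For each client error, count server-side tablet unloads with the same
    ID and error code within `window`. Returns (fid, errno, count) tuples
    for client errors that have at least one matching unload."""
    unloads = []
    for line in server_lines:
        m = SERVER_RE.search(line)
        if m:
            unloads.append((_parse_ts(m["ts"]), m["tablet"], int(m["errno"])))

    matches = []
    for line in client_lines:
        m = CLIENT_RE.search(line)
        if not m:
            continue
        when, fid, code = _parse_ts(m["ts"]), m["fid"], int(m["errno"])
        hits = [u for u in unloads
                if u[1] == fid and u[2] == code and abs(u[0] - when) <= window]
        if hits:
            matches.append((fid, code, len(hits)))
    return matches


# Lines copied verbatim from the client and server excerpts in this post.
CLIENT_SAMPLE = [
    "Oct 29 12:15:11 java[21692]: 2017-10-29 12:15:11,4060 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27206 rpc err No such device(19) 35.1 to 10.101.19.206:5660, fid 2198.23551.309544, upd 1",
    "Oct 29 13:38:26 java[21692]: 2017-10-29 13:38:26,7389 ERROR Client fs/client/fileclient/cc/client.cc:6513 Thread: 27200 rpc err No such device(19) 35.1 to 10.101.16.180:5692, fid 2201.20153.177076, upd 1",
]
SERVER_SAMPLE = [
    "2017-10-29 13:38:26,7071 INFO DB db/tablet.cc:4129 Unloading tablet 2201.20153.177076 with error 19",
    "2017-10-29 13:38:26,7327 INFO DB db/tablet.cc:4129 Unloading tablet 2201.20153.177076 with error 19",
    "2017-10-29 13:38:26,7343 INFO DB db/tablet.cc:4129 Unloading tablet 2201.20153.177076 with error 19",
]

if __name__ == "__main__":
    print(correlate(CLIENT_SAMPLE, SERVER_SAMPLE))
```

Only the "Unloading tablet" lines are matched here; the accompanying "resp error" and "TabletLoad" lines carry the same tablet ID and error code, so counting them as well would only repeat the same event.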

 

 

So, what exactly are these errors, and why are we seeing them? My current conjecture is that they *might* be caused by disk balancing, since we only see them periodically.
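One way to put a number on "periodically" is to compute the gaps between consecutive client errors and compare them with whatever record there is of balancer activity. The Python sketch below does only the first half, on the first four timestamps from the client log above; `gaps_in_minutes` is a made-up helper, and regular gaps alone would not prove the balancer is responsible.

```python
from datetime import datetime


def gaps_in_minutes(timestamps):
    """Return the gaps, in minutes, between consecutive timestamps.

    Timestamps are strings like "2017-10-28 13:52:41" and are sorted
    before the gaps are computed.
    """
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(times, times[1:])]


# First four client-side error timestamps from the log above.
SAMPLE_TIMESTAMPS = [
    "2017-10-28 13:52:41",
    "2017-10-28 16:38:04",
    "2017-10-28 20:37:38",
    "2017-10-29 03:31:48",
]

if __name__ == "__main__":
    for gap in gaps_in_minutes(SAMPLE_TIMESTAMPS):
        print(f"{gap:.1f} min")
```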

Thanks in advance
