
What does "No XENIX semaphores available" mean?

Question asked by Rocket4u on Dec 26, 2017
Latest reply on Jan 2, 2018 by cathy

Hi All,

In our M7 cluster, container 8488 switched its master node, which is normal. However, the logs also show "No XENIX semaphores available". Could you please explain what this error means?

The CLDB logs show the master switch for container 8488:

2017-12-26 01:49:56,570 INFO Containers [RBal]: Container ID:8488 vol:VVVV Servers: XX.XX.XX.XX-BM XX.XX.XX.XX XX.XX.XX.XX-R Epoch:65 Ctx Switching master for Cid 8488 from Storage Pool XXXX to Storage Pool XXXX
2017-12-26 01:49:57,984 INFO Containers [ACR-125]: Container ID:8488 vol:VVVV Servers: XX.XX.XX.XX XX.XX.XX.XX XX.XX.XX.XX-R Epoch:65 Ctx XX.XX.XX.XX master response
2017-12-26 01:50:14,558 INFO Containers [ACR-41]: Container ID:8488 vol:VVVV Servers: XX.XX.XX.XX XX.XX.XX.XX XX.XX.XX.XX Epoch:65 Ctx XX.XX.XX.XX resync response


The mfs log snippets from the old master and the new master both show "No XENIX semaphores available" errors:

Old master:
2017-12-26 01:49:57,5241 WARN ServerCommand servercommand.cc:1018 Container 8488, is asked to restore from XX.XX.XX.XX:5660 when it is not stale. Marking it stale now.
2017-12-26 01:50:00,9356 ERROR Replication nodefailure.cc:374 Op failed with No such device (19) on replica FSID 161951689496 10.22.47.248:5660 for operation of type 39 and version 230413709 on container 8488
2017-12-26 01:50:00,9362 INFO Replication nodefailure.cc:272 Reporting failure of XX.XX.XX.XX:5660 to CLDB for 8488. CLDB asked to retry, attempt #1
2017-12-26 01:50:04,4370 ERROR Replication nodefailure.cc:299 CLDB responded with err No XENIX semaphores available.(119) while reporting failure of XX.XX.XX.XX:5660 to CLDB XX.XX.XX.XX:7222 for 8488.
2017-12-26 01:50:04,4370 INFO Replication nodefailure.cc:309 DHL: Reporting failure of XX.XX.XX.XX:5660 for container 8488 took 3501 ms
2017-12-26 01:50:04,4370 ERROR Replication nodefailure.cc:418 CLDB returned error No XENIX semaphores available (119) while reporting failure of node XX.XX.XX.XX:5660 as part of replicating ops for container 8488

New master:
2017-12-26 01:49:56,9817 INFO Replication nodefailure.cc:1235 Container 8488, CLDB asked to become master BM, ifClean=1
2017-12-26 01:49:56,9833 INFO Replication nodefailure.cc:1605 BM Become master completed successfully for container 8488 at txn:230413708-230413708, write:230413708-230413708, snap:12844-12844
2017-12-26 01:50:00,9356 INFO Replication replicateops.cc:3410 Bulk replicated op with 1 ops from XX.XX.XX.XX:5660 with version (230413709) on container (8488) failed on replica with error (19)
2017-12-26 01:50:00,9356 ERROR MapServerDir create.cc:1483 ContainerStat 8488 : GetContainer for update failed 119
2017-12-26 01:50:04,4492 ERROR Replication nodefailure.cc:168 Updating epoch failed with error No XENIX semaphores available.(119) cid(8488). CLDB XX.XX.XX.XX:7222 failed the request.
2017-12-26 01:50:04,4492 INFO Replication nodefailure.cc:176 DHL: UpdateEpoch for container 8488 took 3502 ms
2017-12-26 01:50:07,9839 ERROR Replication nodefailure.cc:168 Updating epoch failed with error No XENIX semaphores available.(119) cid(8488). CLDB XX.XX.XX.XX:7222 failed the request.
2017-12-26 01:50:07,9839 INFO Replication nodefailure.cc:176 DHL: UpdateEpoch for container 8488 took 3504 ms
2017-12-26 01:50:07,9839 ERROR Replication firstwrite.cc:152 First write on container (8488) error (119) in updating epoch
2017-12-26 01:50:11,5465 ERROR Replication nodefailure.cc:168 Updating epoch failed with error No XENIX semaphores available.(119) cid(8488). CLDB XX.XX.XX.XX:7222 failed the request.
2017-12-26 01:50:15,1486 INFO Replication firstwrite.cc:422 First write for spid XXXX,container (8488) - Txn VN Hole from 230413708-231462284, Snap VN hole from 12844-12844 uniq XXXX size 44
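
For anyone triaging similar logs: the message text matches the standard Linux description of errno 119 (ENAVAIL), and "No such device" is errno 19 (ENODEV). Below is a minimal sketch, assuming the mfs log layout shown above, that pulls the error codes out of each line so they can be counted per source location. The helper names parse_line and count_errnos are hypothetical, not part of any MapR tool, and these logs do not confirm whether CLDB attaches its own meaning to code 119.

```python
import re
from collections import Counter

# Linux errno names for the two codes seen in the logs above.
# Assumption: mfs prints the plain strerror() text for these codes.
LINUX_ERRNO = {19: "ENODEV", 119: "ENAVAIL"}

# mfs line layout as seen above: timestamp, level, module, file:line, message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+) "
    r"(?P<level>[A-Z]+) (?P<module>\S+) (?P<src>\S+:\d+) (?P<msg>.*)$"
)

# Only match a number in parentheses when it follows a known error string,
# so "container (8488)" or "version (230413709)" are not mistaken for codes.
ERR_RE = re.compile(
    r"(No XENIX semaphores available|No such device)\.?\s*\((\d+)\)"
)


def parse_line(line):
    """Return a dict for one mfs log line, or None if the layout differs."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    err = ERR_RE.search(rec["msg"])
    rec["errno"] = int(err.group(2)) if err else None
    rec["errname"] = LINUX_ERRNO.get(rec["errno"])
    return rec


def count_errnos(lines):
    """Count (source location, errno name) pairs across ERROR lines."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] == "ERROR" and rec["errno"] is not None:
            counts[(rec["src"], rec["errname"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2017-12-26 01:50:04,4492 ERROR Replication nodefailure.cc:168 "
        "Updating epoch failed with error No XENIX semaphores available.(119) "
        "cid(8488). CLDB XX.XX.XX.XX:7222 failed the request.",
    ]
    for key, n in count_errnos(sample).items():
        print(key, n)
```

Lines that report the code without the error text, such as "GetContainer for update failed 119", are parsed but left with no errno in this sketch, and CLDB log lines use a different layout and are skipped.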
