Failed to connect to external shuffle server

Question asked by ezamboursky on Dec 8, 2017
Latest reply on Dec 8, 2017 by cathy

Hello all!


Recently we ran into a problem connecting to the external shuffle service.

Here are some logs:

nodeXX.XXXXX.local_35758:17/11/16 09:16:03 ERROR storage.BlockManager: Failed to connect to external shuffle server, will retry 2 more times after waiting 5 seconds...
nodeXX.XXXXX.local_35758:17/11/16 09:16:08 ERROR client.TransportClient: Failed to send RPC XXXX64831619992XXXX to nodeXX.XXXXX.local/ Broken pipe


But at the same time, the service on port 7337 was alive and accepted a telnet connection from another node.

What could be the reason for this problem?


P.S.: How can we connect to the external shuffle service from spark-shell to run a test that monitors its availability? Is that possible?
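
To make the P.S. concrete: the telnet check mentioned above could be scripted as a plain TCP probe, roughly like the Python sketch below. The function name `shuffle_port_is_open` and the timeout value are placeholders of mine, and 7337 is simply the port from this post. It only shows that the port accepts connections; it does not speak the shuffle service's RPC protocol, so it would not catch the "Broken pipe" case from the logs. That is why I am asking whether a real check from spark-shell is possible.

```python
import socket


def shuffle_port_is_open(host, port=7337, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This is the scripted equivalent of the telnet test: it proves the port is
    listening, not that the shuffle service handles requests correctly.
    """
    try:
        # create_connection resolves the host and opens a TCP connection;
        # the `with` block closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (replace the masked host name from the logs with a real node):
# print(shuffle_port_is_open("nodeXX.XXXXX.local"))
```

Run periodically from another node, this would at least record whether the port was reachable at the time the executors logged the error.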


Thank you!