
Apache Spark : Prevent executor re-launching after an ExecutorLostFailure

Question asked by whantana on Oct 26, 2017
Latest reply on Oct 26, 2017 by madumoulin



There's a business case where, when I submit a #spark application to #YARN (client mode), a portion of the tasks at hand should run exactly once, with no task re-attempts in the presence of failures. If any failure occurs, the application (or job) should fail fast and loudly (exceptions & stack traces).


I found the "spark.task.maxFailures" configuration: setting it to 1 prevents any task re-attempts, and it does help with my business case. However, there are also cases in which an executor can get lost during application execution. This can happen for a variety of reasons, such as exceeding memory limits, bad local directories in YARN, job preemption by the YARN scheduler, etc.
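For reference, this is roughly how I submit the application; the class name, jar path, and resource sizes below are placeholders, not my exact values:

```shell
# Submit in YARN client mode with task retries disabled:
# spark.task.maxFailures=1 means a single task failure fails the stage.
# --conf applies the setting to this application only.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.task.maxFailures=1 \
  --class com.example.MyApp \
  myapp.jar
```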


Following the executor loss, a new executor is re-spawned in order to run the lost executor's tasks. Is there any way to stop Spark's scheduler from re-launching executors after an executor loss, and instead mark the application/job/stage as failed?
Can this be configured per application? Should this be configured on YARN's side?


Thank you for your time.
Dimitris Bousis