
How does Spark decide on partitioning when parallel applications are running?

Question asked by Velumani on Jul 13, 2016
Latest reply on Jul 28, 2016 by Velumani

Hi,

     I would like to understand how Spark decides on partitioning when parallel applications are running.

 

     Each stage contains multiple tasks, and each task runs on one partition, so I am assuming that the number of tasks equals the number of partitions and that each task requires a CPU core from the cluster (please correct me if my understanding is wrong). If parallel applications are running in my cluster, will Spark choose the number of partitions based on the CPU cores that are currently available, given that CPU utilisation will differ depending on how many applications are running?
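     For context, this is roughly how I have been checking what Spark decides (a minimal sketch in the Scala shell; the input path is just a placeholder):

     // Hypothetical input path, used only to illustrate the check
     val rdd = sc.textFile("/path/to/input")
     println(rdd.getNumPartitions)    // number of partitions Spark chose for this RDD
     println(sc.defaultParallelism)   // derived from cores/config, as far as I can tell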

 

     Also, when should I use repartition() and when should I use coalesce()? What are the scenarios where Spark's default partitioning is not ideal?
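     To make the second question concrete, this is the kind of usage I mean (a minimal sketch; the partition counts are arbitrary placeholders):

     // repartition() performs a full shuffle and can increase or decrease the partition count
     val wide = rdd.repartition(200)
     // coalesce() avoids a full shuffle and is typically used only to decrease the partition count
     val narrow = wide.coalesce(10)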
