I'd like to understand how Spark decides the number of partitions when multiple applications are running in parallel.
Each stage contains multiple tasks, and each task runs on one partition, so I assume the number of tasks equals the number of partitions and that each task requires one CPU core of the cluster (please correct me if my understanding is wrong). If several applications are running in parallel on my cluster, will Spark choose the number of partitions based on the CPU cores currently available, given that CPU utilisation varies with the number of running applications?
Also, when should I use repartition() and coalesce()? What are the scenarios where Spark's default partitioning is not ideal?