We have a cluster of 4 nodes with the characteristics above.
Our Spark jobs take a long time to process. How can we reduce this time, given that our jobs run from RStudio and a lot of memory is still unutilized?
Can you please look into the MapR doc below and see if it answers your query?
Best Practices for YARN Resource Management | MapR
This doc can be helpful for understanding how to estimate memory and CPU for Spark jobs; a rough sizing sketch follows the link below.
Spark Troubleshooting guide: Tuning Spark: Estimating Memory and CPU utilization for Spark jobs
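As a rough illustration of the kind of estimate that guide walks through, here is a minimal sizing sketch in R. The node specs are hypothetical (the actual cluster characteristics are referenced above but not shown here), so substitute your own numbers:

```r
# Rough executor sizing -- the node specs below are HYPOTHETICAL,
# since the actual cluster characteristics are not shown in this thread.
cores_per_node  <- 16   # assumed cores per node
mem_per_node_gb <- 64   # assumed RAM per node, in GB

# Reserve roughly 1 core and 4 GB per node for the OS and cluster daemons.
usable_cores  <- cores_per_node - 1
usable_mem_gb <- mem_per_node_gb - 4

# A common rule of thumb is ~5 cores per executor for good I/O throughput.
executors_per_node <- usable_cores %/% 5                    # -> 3
mem_per_executor   <- usable_mem_gb %/% executors_per_node  # -> 20 GB

# Leave ~7-10% of that for off-heap overhead
# (spark.yarn.executor.memoryOverhead), so request e.g. ~18g of heap.
cat(executors_per_node, "executors/node,",
    mem_per_executor, "GB each (before overhead)\n")
```

With 4 nodes that would suggest on the order of a dozen executors, keeping one core/slot free for the YARN ApplicationMaster.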
You can increase spark.executor.memory (the heap size of each Spark executor), spark.executor.cores (how many tasks an executor can run concurrently), and spark.executor.instances (the number of executors per application); see the sparklyr sketch below.
The documentation for these properties is available here: Spark Configuration.
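Since the jobs run from RStudio, a minimal sketch of setting these properties through sparklyr's connection config follows. The specific values and the "yarn-client" master are assumptions to adapt to your cluster, not prescribed settings:

```r
library(sparklyr)

# Hypothetical values -- tune them using the sizing estimate above.
conf <- spark_config()
conf$spark.executor.memory    <- "18g"  # heap per executor
conf$spark.executor.cores     <- 5      # concurrent tasks per executor
conf$spark.executor.instances <- 11     # executors across the 4 nodes,
                                        # leaving room for the YARN AM

# Assumes the jobs are submitted to YARN in client mode from RStudio.
sc <- spark_connect(master = "yarn-client", config = conf)
```

Any properties not set here fall back to the cluster defaults, so you can raise them incrementally while watching utilization in the YARN ResourceManager UI.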
Thank you, Artur Sukhenko.