
Making Apache Spark Multi-Tenant

Question asked by mikeengland on Dec 23, 2014
Latest reply on Dec 24, 2014 by mikeengland

Our MapR cluster is currently used by a single team running Spark in Standalone mode. However, we are looking to on-board multiple teams onto this cluster and scale it out horizontally. With MapReduce 1 on MapR you can use node labels to restrict placement: for example, jobs submitted to queue x may only run on nodes 1, 2 and 3, while jobs submitted to queue y may only run on nodes 4, 5 and 6. I am trying to find an equivalent for Spark Standalone (I understand Spark-on-YARN can use node labels).
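For reference, the closest thing I have found in Standalone mode is per-application capping via spark.cores.max and spark.executor.memory (these are real Spark properties; the values below are just illustrative), which limits how much one application can grab but does not pin jobs to particular nodes:

```
# spark-defaults.conf (illustrative values):
# spark.cores.max caps the total cores one application may claim cluster-wide;
# spark.executor.memory sets the memory requested per executor.
spark.cores.max        12
spark.executor.memory  4g
```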

I understand that Spark has a fair scheduler with job pools: you can submit jobs to a pool and specify a minimum share of the cluster. However, if one team submits 10 jobs, each configured to use the 'max cores' value, and they consume all of the cluster's memory, then another team's jobs will be stuck queued behind them. An option to set maximum cores and maximum memory at the pool level would therefore work well for a multi-tenant cluster. Is there anything like this in the Spark project?
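For concreteness, a pool definition looks like the sketch below (the pool names and numbers are mine). Note that these pools arbitrate between jobs running inside a single SparkContext, and as far as I can tell they only expose schedulingMode, weight and minShare, with no max-cores or max-memory cap:

```xml
<!-- conf/fairscheduler.xml: pools support schedulingMode, weight and minShare,
     but there is no maxCores or maxMemory setting per pool. -->
<allocations>
  <pool name="team_a">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>8</minShare>
  </pool>
  <pool name="team_b">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>4</minShare>
  </pool>
</allocations>
```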