
Is there a way to specify jobs/programs (parallelized or not) to run on specific nodes?

Question asked by reedv on Feb 7, 2018
Latest reply on Feb 13, 2018 by MichaelSegel

When running a process (one that may not be a good fit for a YARN distributed job, e.g. a program that simply counts prime numbers), is it possible to choose which Hadoop node(s) to run it on? A fuller example of what I have in mind is this:

 

Say I am running a sequential program written in some arbitrary language, say R, that starts from 0 and tries to find as many prime numbers as possible (and cannot be parallelized). The person running the program may be on a slow Windows machine. However, say we know that one of the nodes in the cluster has a very fast CPU, and we would like to let the user run their job on that node instead. Is it possible for us to let them use this node (in a way that can be monitored and subjected to quotas) simply for its better underlying hardware? Is there some workaround for this? Some of the MapR docs (Label-based Scheduling for YARN Applications, Submitting Jobs and Applications to the Cluster) seem to indicate that this is possible, but I have not worked enough with job submission on our cluster to picture how it would work. Thanks.
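To make the example concrete, the kind of job I mean looks roughly like the sketch below. It is written in Python rather than R purely for illustration, and the names `is_prime` and `count_primes` as well as the limit of 100000 are made up for this post. The point is only that it is a single-threaded, CPU-bound loop, so it gains nothing from being distributed and would benefit only from a faster core.

```python
import time


def is_prime(n: int) -> bool:
    """Trial division: check odd divisors up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def count_primes(limit: int) -> int:
    """Count the primes in [0, limit), one candidate at a time."""
    return sum(1 for n in range(limit) if is_prime(n))


if __name__ == "__main__":
    # Hypothetical workload size, chosen only so the run takes a moment.
    start = time.perf_counter()
    found = count_primes(100_000)
    elapsed = time.perf_counter() - start
    print(f"{found} primes below 100000 in {elapsed:.2f}s")
```

Run on the slow desktop and then on the fast node, the only difference I would expect is the elapsed time, which is exactly why I want to be able to pin this kind of job to a specific host.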

 

Full disclosure: we want to support users who need to run many intensive SAS (https://mapr.com/partners/partner/sas-realize-your-big-data-aspirations-mapr-and-sas/) computations (the exact details of which I do not know).
