
capacity-scheduler configuration

Question asked by Karthee on Nov 3, 2017
Latest reply on Nov 9, 2017 by cathy

Hi Team,

 

I have a 7-node MapR 5.2.2 cluster; each node has the following configuration:

 

CPU: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (16 physical cores, 32 hyper-threaded)
RAM: 256 GB (1.7 TB total across the cluster)
DISK: 1.2 TB x 13 (95 TB total across the cluster)

 

Here are some properties from my yarn-site.xml:

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>204800</value>
</property>

<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>204800</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>

 

Here are some properties from my capacity-scheduler.xml:

<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
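
Since DominantResourceCalculator takes CPU into account as well as memory, I assume the vcore settings matter here too. I have not set them explicitly, so they should be sitting at their defaults; if those are the stock Hadoop defaults (something I still need to verify on MapR), that would mean:

<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>8</value> <!-- stock Hadoop default; my nodes actually have 32 hyper-threaded cores -->
</property>

<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value> <!-- stock Hadoop default -->
</property>

I need to confirm the real values on my cluster, since under DRF the vcore capacity can cap the number of containers independently of memory.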

 

Here are some properties from my mapred-site.xml:

 

<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>

<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
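
For completeness: I have not listed the JVM heap settings. I assume the usual guidance of sizing the heap at roughly 80% of the container, i.e. something like this (illustrative values, not copied from my config):

<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1638m</value> <!-- assumed: ~80% of the 2048 MB map container -->
</property>

<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2458m</value> <!-- assumed: ~80% of the 3072 MB reduce container -->
</property>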

 

I am running a Hive job over a 6.8 TB dataset with the MR engine. It spawned 26,005 map tasks and 512 reduce tasks, but at any one time only 182 containers are running for map tasks and 60 containers for reduce tasks.

 

My questions are:

 

1. Why does the MR job run with the disks 100% utilized while about 1 TB of memory is left unused, even though the capacity scheduler is configured with DominantResourceCalculator? By my math, memory alone would allow 204800 / 2048 = 100 map containers per node, roughly 700 across the cluster, so something other than memory appears to be the bottleneck. Please see the attached screenshot.

 

2. How can I get more than 60 concurrent containers for reduce tasks? (A sketch of the knob I suspect is relevant follows below.)
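
To frame this question: the only knob I am aware of on the MapReduce side (purely my assumption, I have not confirmed it is the limiting factor here) is the reduce slow-start threshold, e.g. in mapred-site.xml:

<property>
<name>mapreduce.job.reduce.slowstart.completedmaps</name>
<value>0.80</value> <!-- example value only; the default is 0.05, i.e. reduces may launch once 5% of maps complete -->
</property>

Is this the right place to look, or is the 60-container ceiling coming from the scheduler?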

 

3. When a couple of jobs run concurrently, performance degrades badly. How can I sort this out? (See the queue sketch below for what I am considering.)
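
For context: everything currently runs in the single default queue. I am wondering whether splitting capacity across dedicated queues in capacity-scheduler.xml, roughly like the sketch below, is the right way to isolate concurrent jobs (the queue names are hypothetical):

<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>etl,adhoc</value> <!-- hypothetical queues replacing the single default queue -->
</property>

<property>
<name>yarn.scheduler.capacity.root.etl.capacity</name>
<value>70</value> <!-- 70% of cluster resources guaranteed to the etl queue -->
</property>

<property>
<name>yarn.scheduler.capacity.root.adhoc.capacity</name>
<value>30</value> <!-- remaining 30% for ad hoc jobs -->
</property>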

 

Please help me sort out these issues.

 

 

Thanks in advance,

Karthi
