Troubleshooting JobTracker Slowness Issues on MRv1


Author: Sumesh Kurup

Original Publication Date: February 17, 2016

 

Environment:

MapR 3.x, 4.x

Hadoop 0.20.x

 

Symptom: The general behavior is that the JobTracker is slow when a large number of jobs are running. Another known symptom is that, when logging into the JobTracker UI, the UI takes a very long time to refresh the page and show job status. A third symptom is that "hadoop job -list" takes much longer than expected to list the jobs running on the cluster. This is usually observed on a busy cluster (typically a large cluster with 100+ nodes and a large number of jobs running concurrently).

 

Analysis: The core issue is how the global JobTracker lock is handled by the JobTracker (JT) process. To load the main JT page, the JT takes the global JT lock while it reads information about running jobs from various internal data structures. The more jobs that are running, and the more often the page is requested, the longer the lock is held. This results in more waiting, both for users attempting to access the UI page and for operations such as submitting new jobs and getting/updating job stats.

 

To isolate this issue and gather diagnostics, perform the following steps.

1. Collect stack traces of the JobTracker process at regular intervals.

2. Enable DEBUG logging on the JobTracker (see the example commands after this list).

3. Collect the JobTracker logs covering the time window of the slowness.
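
For steps 2 and 3, the commands below are a hedged sketch of what is typically run; the JobTracker hostname, HTTP port (50030 is the MRv1 default), and log path are examples and should be adjusted for your cluster.

-----
# Temporarily raise the JobTracker log level to DEBUG without a restart.
# The change is lost when the JobTracker restarts; edit log4j.properties for a persistent change.
hadoop daemonlog -setlevel jt-host.example.com:50030 org.apache.hadoop.mapred.JobTracker DEBUG

# Copy the JobTracker logs covering the problem window.
# The path below is the typical MapR MRv1 location and may differ on your installation.
mkdir -p /tmp/jt-debug
cp /opt/mapr/hadoop/hadoop-0.20.2/logs/*jobtracker*.log* /tmp/jt-debug/
-----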

 

To collect stack traces of the JT process, we can use a small shell script that dumps the JT stack at regular intervals. The script would be something like the one shown below:

------ JT stack script ------

#!/bin/bash
if [ $# -eq 0 ]; then
     echo >&2 "Usage: jstackSeries <pid> <run_user> [ <count> [ <delay> ] ]"
     echo >&2 "    Defaults: count = 10, delay = 0.5 (seconds)"
     exit 1
fi
pid=$1          # required: JobTracker process ID
user=$2         # required: user the JobTracker runs as
count=${3:-10}  # defaults to 10 iterations
delay=${4:-0.5} # defaults to 0.5 seconds between dumps
while [ $count -gt 0 ]
do
     sudo -u $user jstack -l $pid > jstack.$pid.$(date +%H%M%S.%N)
     sleep $delay
     let count--
     echo -n "."
done

---------

Once saved, you can run the script as shown below:

 

# sh jstackSeries.sh [pid] [user] [count] [delay]

Here the user can be the admin user (for example, "mapr", which manages the cluster).

 

For example:

[root@yarn-fcs-1 ~]# sh /tmp/jstack.sh 22339 mapr 10 1

Here I am running the script as the "mapr" user against JT PID 22339, for 10 iterations with a 1-second delay between dumps.

 

In one of the issues we worked on with a customer, the problem was isolated to the "getMapCounters" and "getReduceCounters" method calls, which caused slowness because of waiting for the lock on ResourceBundles. The stack dumps collected above can be scanned for these methods, as in the sketch below. Detailed information about this issue, and the changes that were made on our end to effectively address it, follows.
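
As a hedged sketch, the suspect methods can be spotted in the collected dumps with something like the following; the jstack.22339.* file names follow the earlier example and will differ on your system.

-----
# Rank the most common JobTracker stack frames across all collected dumps
grep -h "at org.apache.hadoop" jstack.22339.* | sort | uniq -c | sort -rn | head -20

# Count how many dumps caught threads inside the suspect counter methods
grep -l "getMapCounters\|getReduceCounters" jstack.22339.* | wc -l
-----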

 

What is ResourceBundles in the JobTracker?

When a job is running with multiple map/reduce tasks, each task has 14 task counters by default. In each reporting period, every task updates its task counters. In parallel, when the JT UI page is refreshed, the UI needs to read the current value of each of the 14 task counters, for each task, of each running job. ResourceBundles is the in-memory key-value data structure that holds all the task counters for all the running tasks of all running jobs. Any read of a task counter by the UI, or any write of a task counter by a running task, needs a lock on the entire ResourceBundles structure.

 

Example: If 100 jobs are running, each with 1000 tasks, and each task has 14 task counters, a single UI refresh needs 100 * 1000 * 14 = 1,400,000 locks on ResourceBundles.

 

Why are Task Counters needed for the JT UI?

The JT UI front page shows Cumulative Map/Reduce CPU and Cumulative Map/Reduce Memory for each job. To calculate these, the UI needs to get the CPU counter value and memory counter value for every task of that job and sum them. So, just to show the CPU and memory figures for each job, the UI needs to take 1,400,000 locks on ResourceBundles in the example above.

 

More Analysis: "kill -3 <JT_PID>" or the JT stack script gives the stack trace of the running JobTracker process. The JobTracker has IPC server threads and HTTP threads. IPC server threads are responsible for accepting jobs from the JobClient, responding to TaskTracker heartbeats, and sending job status/job counters back to the JobClient. HTTP threads are responsible for serving the JT UI pages. By default, 30 IPC server threads and 20 HTTP threads are running. At the customer site, the collected stack traces showed that most of these 50 threads were WAITING on the ResourceBundles lock, while the one thread that had acquired the lock did the processing. The chance of an IPC server thread getting the lock on ResourceBundles and serving the JobClient was therefore very low, which is the root cause of jobs not being accepted immediately ("job -list" taking a long time). A quick way to summarize the collected dumps is shown below.
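
As a hedged sketch (the exact jstack strings can vary by JVM version, and the jstack.22339.* names follow the earlier example), the dumps can be summarized like this:

-----
# Summarize thread states in the collected dumps; in this issue most IPC/HTTP
# threads show up as BLOCKED/WAITING rather than RUNNABLE
grep -h "java.lang.Thread.State" jstack.22339.* | sort | uniq -c

# Group the monitors that threads are waiting to lock; a single monitor address
# dominating the output points at one heavily contended lock
grep -h "waiting to lock" jstack.22339.* | sort | uniq -c | sort -rn | head
-----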

 

Solution: To increase scalability by reducing the need for the lock on ResourceBundles, we introduced a parameter to hide CPU and memory on the JT UI, which eliminates the need for 1,400,000 locks on ResourceBundles for a single JT UI refresh. The parameter below needs to be updated in mapred-site.xml on the JobTracker node.

 

-----

Option: mapred.jobtracker.ui.showcounters
Description: Enables the CPU/memory counters for active jobs on the JobTracker node. Set the value to false to disable the CPU/memory counters. When disabled, the CPU/memory counters do not display in the JobTracker view of the MCS.
Default value: true

-----
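
For reference, the property would typically be added to mapred-site.xml on the JobTracker node in the standard Hadoop configuration format (a JobTracker restart is generally needed for the change to take effect; verify against the documentation for your MapR version):

-----
<!-- Disable the per-job CPU/memory counter display on the JobTracker UI -->
<property>
  <name>mapred.jobtracker.ui.showcounters</name>
  <value>false</value>
</property>
-----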

For other customers who still want these counters on the JT UI, we optimized the lock implementation: instead of reading all 14 task counters to find the CPU and memory counter values, it reads only those two counters, so it needs 100 * 1000 * 2 = 200,000 locks on ResourceBundles to show CPU and memory on the JT UI in the example above. By greatly reducing the need for the ResourceBundles lock in the HTTP threads, the 30 IPC server threads can now easily acquire the lock and serve requests from the JobClient immediately.
