Spark job not accepting any resources

Document created by wade on Feb 27, 2016

Author: Jitendra Yadav, last modified by Hao Zhu on May 8, 2015

 

Original Publication Date: May 1, 2015

 

Environment
Spark 1.2.1 (Standalone).

Symptom

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

Root Cause
This is one of the most common messages a user sees when submitting a job to the Spark master. It means that the application is requesting more resources (CPU cores or memory) than the cluster can currently provide. Note that for both of these resources the maximum is not your system's maximum but the maximum set by your Spark configuration. To see the current state of your cluster, including its free resources, check the master UI at SparkMasterIP:8080. Look at the Running Applications section to find out which application is consuming all the resources.
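
To make the condition concrete, the check can be modelled roughly as below. This is a simplified sketch, not Spark's scheduler code: it assumes that a Standalone worker can host an executor for a new application only when it has at least one free core and at least as much free memory as the executor requests. The function name worker_can_host and the sample numbers are hypothetical; the free values would be read off the master UI by hand.

```shell
# Simplified model (an assumption, not Spark's actual scheduler code):
# a worker can host an executor only if it has at least one free core
# and at least as much free memory as the executor asks for.
worker_can_host() {
  free_cores=$1        # free cores shown for the worker in the master UI
  free_mem_mb=$2       # free memory on the worker, in MB
  executor_mem_mb=$3   # value requested with --executor-memory, in MB
  if [ "$free_cores" -ge 1 ] && [ "$free_mem_mb" -ge "$executor_mem_mb" ]; then
    echo "yes"
  else
    echo "no"
  fi
}

# A worker with 4 free cores and 8 GB free cannot host a 20 GB executor.
worker_can_host 4 8192 20480   # prints "no"
```

Under this model, if no worker answers "yes", the application would receive no executors and the warning would keep repeating.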

Solution

The short-term solution is to make sure you aren't requesting more resources from your cluster than exist, or to shut down any applications that are using resources unnecessarily. If you need to run multiple Spark applications simultaneously, you will need to adjust the number of cores used by each application. For example, use the options below when running a Spark application to set its resources:

 

--executor-memory 20G
--total-executor-cores 100

 

Example of a complete spark-submit command (this one targets YARN rather than a Standalone master):

./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 10 \
lib/spark-examples*.jar \
10
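
For the Standalone environment this article describes, a comparable submission might look like the sketch below. The master address spark://SparkMasterIP:7077 (7077 being the default Standalone master port), the jar path, and the sizes 2G and 4 are placeholders, not values from the original case; choose numbers that fit within the free resources shown in the master UI.

```shell
# Placeholder values (assumptions): adjust the master host, sizes and jar path.
MASTER_URL="spark://SparkMasterIP:7077"

# Cap this application at 2 GB per executor and 4 cores in total,
# so that other applications can still obtain resources.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master "$MASTER_URL" \
  --executor-memory 2G \
  --total-executor-cores 4 \
  lib/spark-examples*.jar \
  10
```

Keeping --total-executor-cores below the cluster total is what leaves room for a second application to start alongside this one.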

Alternatively, increase the capacity of the worker nodes by raising the maximum cores and memory on each node. The following worker options control this:

 

-c CORES, --cores CORES : Total CPU cores to allow Spark applications to use on the machine (default: all available); only on worker.

 

-m MEM, --memory MEM : Total amount of memory to allow Spark applications to use on the machine, in a format like 1000M or 2G (default: your machine's total RAM minus 1 GB); only on worker.
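
These limits can also be set persistently instead of on the worker's command line. As a sketch, assuming a standard conf/spark-env.sh on each worker node, the SPARK_WORKER_CORES and SPARK_WORKER_MEMORY variables correspond to the --cores and --memory options; the values 8 and 16g are illustrative placeholders only.

```shell
# conf/spark-env.sh on each worker node (values are illustrative placeholders)
export SPARK_WORKER_CORES=8       # counterpart of --cores 8
export SPARK_WORKER_MEMORY=16g    # counterpart of --memory 16g
```

The workers need to be restarted for the new limits to take effect, after which the master UI should show the updated totals.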
