
How to deal with memory configurations in Spark 1.5.2

Question asked by Karthee on Jul 14, 2017
Latest reply on Jul 26, 2017 by maprcommunity

Hi There,

 

I am confused about the memory configuration for Spark 1.5.2 in Spark-on-YARN mode.

 

My environment settings are as follows:

 

3-node MapR cluster - each node: 256 GB memory, 16 CPUs
Hadoop 2.7.0
Spark 1.5.2 - Spark-on-YARN

 

Input data information:

A 480 GB Parquet-format table from Hive. I'm querying it through the Hive context with spark-sql on Spark-on-YARN, but it is a lot slower than Hive itself, and I am not sure I have the right memory configuration for Spark.

These are my configs:

--> spark-defaults.conf

spark.executor.memory                              64g
spark.logConf                                      true
spark.eventLog.dir                                 maprfs:///apps/spark
spark.eventLog.enabled                             true
spark.serializer                                   org.apache.spark.serializer.KryoSerializer
spark.driver.memory                                16g
spark.executor.instances                           70
spark.kryoserializer.buffer.max                    1024m
spark.yarn.executor.memoryOverhead                 6144m
spark.sql.inMemoryColumnarStorage.compressed       true
spark.sql.inMemoryColumnarStorage.batchSize        100000
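
For reference, here is the back-of-the-envelope math I did on what those settings actually ask YARN for - just my own sanity check, assuming the usual rule that each executor container has to hold spark.executor.memory plus spark.yarn.executor.memoryOverhead:

64 GB (spark.executor.memory) + 6 GB (memoryOverhead 6144m)   =  ~70 GB per executor container
70 executor instances x ~70 GB                                =  ~4900 GB requested in total
3 nodes x 256 GB physical memory                              =  768 GB in the whole cluster

So, if I read this right, only about 3 such containers fit per node (depending on yarn.nodemanager.resource.memory-mb), i.e. roughly 9-10 executors cluster-wide instead of the 70 requested.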

--> spark-env.sh

export SPARK_DAEMON_MEMORY=1g
export SPARK_WORKER_MEMORY=2g

And how do I avoid "GC overhead limit exceeded" and "Java heap space" exceptions in the spark-sql CLI?
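
For what it is worth, the only GC-related knobs I know of are the extraJavaOptions properties - the sketch below is just my guess, and the specific G1/logging flags are assumptions I have not verified on MapR:

spark.executor.extraJavaOptions                    -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
spark.driver.extraJavaOptions                      -XX:+UseG1GC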

So I am using Apache Zeppelin with the Spark interpreter instead, but the queries still take much longer in Spark than in Hive!

I am also not sure how to use "CACHE TABLE" in Zeppelin with the Spark interpreter.
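
What I was expecting to work is something along these lines in a %sql paragraph (my_parquet_table and the filter column are just placeholders for my actual table):

%sql
CACHE TABLE my_parquet_table

and, in a separate paragraph, to cache only a subset under a new name:

%sql
CACHE TABLE my_hot_subset AS SELECT * FROM my_parquet_table WHERE some_column = 'some_value'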

This is what the Spark web UI shows under Environment:

spark.master                                       local[*]

This is supposed to be YARN, right? If it's wrong, how do I change spark.master?
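
From what I have read, an interactive shell like the Zeppelin Spark interpreter on Spark 1.5 has to run as yarn-client rather than yarn-cluster, and the master is picked up from zeppelin-env.sh (or the interpreter's "master" property). This is only my understanding, and the path below is a placeholder for a MapR install:

--> conf/zeppelin-env.sh
export MASTER=yarn-client
export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop

--> spark-defaults.conf (for the spark-sql CLI / spark-submit)
spark.master                                       yarn-client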

 

Your assistance would be really appreciated! 
