I am confused about the memory configuration for Spark 1.5.2 in Spark-on-YARN mode.
My environment settings are as follows:
3-node MapR cluster - each node: 256 GB memory, 16 CPUs
Spark 1.5.2 - Spark-on-YARN
Input data:
A 480 GB Parquet-format table from Hive. I'm querying it through the Hive context with spark-sql on Spark-on-YARN, but it is a lot slower than Hive itself, and I'm not sure I have the right memory configuration for Spark.
These are my configs. How do I avoid "GC overhead limit exceeded" and "Java heap space" exceptions in the spark-sql CLI?
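For example, is something along these lines even a sensible starting point for this hardware? (All the numbers below are placeholders I picked for illustration, not my actual settings.)

```
# Placeholder values for 256 GB / 16-core nodes -- adjust for the workload.
# Spark 1.5.x uses the legacy memory model, so the storage/shuffle fractions
# control how the executor heap is split.
spark-sql --master yarn-client \
  --num-executors 9 \
  --executor-cores 5 \
  --executor-memory 40g \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  --conf spark.storage.memoryFraction=0.4 \
  --conf spark.shuffle.memoryFraction=0.4 \
  --conf spark.sql.shuffle.partitions=400
```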
So I am using Apache Zeppelin with the Spark interpreter, but querying through Spark still takes much longer than Hive.
I am also not sure how to use CACHE TABLE in Zeppelin with the Spark interpreter.
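From the Spark SQL docs, my understanding is that it would look something like the paragraphs below (`parquet_table` and the filter column `ds` are placeholders). Is this correct, and does caching even make sense for a 480 GB table?

```
%sql
-- cache the whole table (CACHE TABLE is eager as of Spark 1.2)
CACHE TABLE parquet_table
```

```
%sql
-- or cache only a subset as a new in-memory table
CACHE TABLE recent_rows AS
SELECT * FROM parquet_table WHERE ds >= '2015-12-01'
```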
These are the environment variables shown in the Spark web UI.
Shouldn't spark.master be yarn-cluster? If that value is wrong, how do I change spark.master?
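My understanding is that spark.master can be set either in conf/spark-defaults.conf or per invocation with --master (yarn-client below is just an example value, not a recommendation):

```
# conf/spark-defaults.conf -- example value only
spark.master    yarn-client
```

```
# or per invocation:
spark-sql --master yarn-client
```

For Zeppelin, I believe the equivalent is the "master" property on the Spark interpreter settings page. Is that the right way to change it?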
Your assistance would be really appreciated!