Spark Troubleshooting Guide: Debugging Spark Applications: How to pass a log4j configuration to the executor and driver

Document created by hdevanath (Employee) on Jun 19, 2017. Last modified by hdevanath (Employee) on Jun 19, 2017.

Log4j is a logging library for Java. It lets the programmer send log statements to a variety of output targets. Enabling logging makes it easier to identify a problem, and with log4j logging can be enabled at runtime without modifying the application binary.


The log4j package is designed so that log statements can remain in shipped code without incurring a high performance cost, so the speed of logging (or rather, of not logging) is critical. Log output can also become very voluminous. Log4j addresses this with hierarchical loggers, which make it possible to control selectively, at arbitrary granularity, which log statements are output. Log4j is designed with three goals in mind: reliability, speed and flexibility. These requirements pull against each other, and log4j aims to strike a balance between them.
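As a small illustration of hierarchical loggers, here is a sketch of a log4j 1.x properties fragment. It is not taken from the original document: the appender name console and the choice of org.apache.spark.scheduler as the subpackage to re-enable are assumptions made for the example. A logger that has no level of its own inherits the level of its nearest configured ancestor.

```properties
# Root logger: INFO and above, sent to an appender named "console" (assumed name).
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=[%d] %p %m (%c)%n

# Child logger: everything under org.apache.spark is limited to WARN and above.
log4j.logger.org.apache.spark=WARN

# Grandchild logger: one subpackage is switched back to DEBUG (illustrative choice).
log4j.logger.org.apache.spark.scheduler=DEBUG
```

With this fragment, a class in org.apache.spark.storage logs at WARN (inherited from org.apache.spark), while a class in org.apache.spark.scheduler logs at DEBUG.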

Scenario 1) Log4j’s RollingFileAppender
Spark uses log4j as its logging facility. The default configuration writes all logs to standard error, which is fine for batch jobs. For streaming jobs, however, a rolling-file appender is a better choice: it cuts log files by size and keeps only the most recent few.

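log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n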

The intent of this configuration is that log4j rolls the log file at 50 MB and keeps only the 5 most recent files. The files are saved in the /var/log/spark directory, with the filename taken from a system property. We also set the logging level of our package com.vmeg.code according to the vm.logging.level property. Finally, we set org.apache.spark to level WARN so that verbose logs from Spark are ignored.
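The listing above does not show the size, retention, file and logger settings that this paragraph describes. A fuller sketch, under stated assumptions, might look like the following. The property name vm.logging.name is hypothetical: the source does not say which system property supplies the filename. The other values (50MB, 5, /var/log/spark, com.vmeg.code, vm.logging.level, WARN for org.apache.spark) come from the text.

```properties
log4j.rootLogger=INFO, rolling

log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n
# Roll the file at 50 MB and keep the 5 most recent rolled files.
log4j.appender.rolling.MaxFileSize=50MB
log4j.appender.rolling.MaxBackupIndex=5
# ASSUMPTION: vm.logging.name is a hypothetical system property name.
log4j.appender.rolling.File=/var/log/spark/${vm.logging.name}.log

# Ignore verbose logs from Spark itself.
log4j.logger.org.apache.spark=WARN
# Level of the application package is read from -Dvm.logging.level=...
log4j.logger.com.vmeg.code=${vm.logging.level}
```

Log4j 1.x resolves ${...} references against JVM system properties, so every property referenced here has to be passed with -D when the driver and executor JVMs are started.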

Scenario 2) Standalone Mode
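In standalone mode, the Spark driver runs on the machine where you submit the job, and each Spark worker node runs an executor for the job. You therefore need to set up log4j for both the driver and the executors. Because each executor reads the configuration from its local file system, the file: path passed to the executors must exist on every worker node.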

spark-submit --master spark:// \
  --driver-java-options "-Dlog4j.configuration=file:/path/to/ -Dvm.logging.level=DEBUG" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/path/to/ -Dvm.logging.level=DEBUG"

Scenario 3) Spark on YARN
In yarn-cluster mode the driver also runs as a container in YARN, so both the driver and the executors use the same configuration file. The --files option ships that file to the YARN containers.

spark-submit --master yarn-cluster \
  --files /path/to/ \
  --conf "" \
  --conf ""
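The --conf values in the command above are empty in the source. As a sketch only, modeled on the options used in Scenario 2, a complete invocation might look like the following. The file name log4j-spark.properties, the class com.vmeg.code.Main and the jar path are hypothetical placeholders, and the form --master yarn --deploy-mode cluster is used as the equivalent of yarn-cluster, which newer Spark versions deprecate.

```shell
# ASSUMPTIONS: log4j-spark.properties, com.vmeg.code.Main and /path/to/app.jar
# are hypothetical names used only for illustration.
# The file shipped with --files is referenced by its bare name, on the
# assumption that it lands in each container's working directory.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /path/to/log4j-spark.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties -Dvm.logging.level=DEBUG" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark.properties -Dvm.logging.level=DEBUG" \
  --class com.vmeg.code.Main \
  /path/to/app.jar
```

In cluster mode the driver JVM is started by YARN, so spark.driver.extraJavaOptions can be set with --conf. In client mode the driver JVM is already running when the configuration is read, which is why Scenario 2 uses --driver-java-options instead.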


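# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n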

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.  

# Settings to quiet third party logs that are too verbose
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support  

# Any custom class debug

# Netty classes, RollingAppender
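
Several of the comments above have no settings beneath them. The first group below shows the entries that accompany the same comments in the conf/log4j.properties.template file shipped with Spark, on the assumption that this listing was derived from that template. The second group is purely hypothetical: it sketches how a custom package and the Netty classes could be sent at DEBUG to an appender named RollingAppender. The logger names com.vmeg.code and io.netty and the log file path are assumptions, not values recovered from the source.

```properties
# Entries as found in Spark's conf/log4j.properties.template (assumed origin).
log4j.logger.org.apache.spark.repl.Main=WARN
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

# HYPOTHETICAL: custom class debug, routed to an appender named RollingAppender.
log4j.logger.com.vmeg.code=DEBUG, RollingAppender
# HYPOTHETICAL: Netty classes, routed to the same appender.
log4j.logger.io.netty=DEBUG, RollingAppender

# HYPOTHETICAL appender definition; the file path is a placeholder.
log4j.appender.RollingAppender=org.apache.log4j.RollingFileAppender
log4j.appender.RollingAppender.File=/var/log/spark/custom-debug.log
log4j.appender.RollingAppender.MaxFileSize=50MB
log4j.appender.RollingAppender.MaxBackupIndex=5
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
```

Because log4j appenders are additive by default, messages from these loggers would also reach the console appender attached to the root logger.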