Spark Troubleshooting guide: Spark SQL: How do I generate Physical, Logical and Optimized Logical Plan?

Document created by hdevanath Employee on Jun 19, 2017
Version 1Show Document
  • View in full screen mode

This article explains how to generate the logical, physical and optimized logical plan of  a Spark SQL query.

 

 

You can generate the Physical, Logical and Optimized Logical plan for a Spark SQL query. Use explain()to understand how the SQL executes and for unfiltered information use explain (extended=true).

scala> spark.sql("select * from emp e left outer join dept d on e.deptId=d.dept_id").explain 
== Physical Plan ==
*BroadcastHashJoin [deptId#79], [dept_id#95], LeftOuter, BuildRight
:- LocalTableScan [empId#77, ssn#78, deptId#79, salary#80, age#81]
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))   
   +- LocalTableScan [dept_id#95, dept_name#96, dept_code#97]

To output the detailed plan, use the below command as an example:

 

scala> spark.sql("select * from emp e left outer join dept d on e.deptId=d.dept_id").explain(true)
== Parsed Logical Plan ==
'Project [*]
+- 'Join LeftOuter, ('e.deptId = 'd.dept_id)
:- 'UnresolvedRelation `emp`, e
+- 'UnresolvedRelation `dept`, d
== Analyzed Logical Plan ==
empId: int, ssn: int, deptId: int, salary: int, age: int, dept_id: int, dept_name: string, dept_code: string
Project [empId#77, ssn#78, deptId#79, salary#80, age#81, dept_id#95, dept_name#96, dept_code#97]
+- Join LeftOuter, (deptId#79 = dept_id#95)
:- SubqueryAlias e
: +- SubqueryAlias emp
: +- Project [_1#71 AS empId#77, _2#72 AS ssn#78, _3#73 AS deptId#79, _4#74 AS salary#80, _5#75 AS age#81]
: +- LocalRelation [_1#71, _2#72, _3#73, _4#74, _5#75]
+- SubqueryAlias d
+- SubqueryAlias dept
+- Project [_1#91 AS dept_id#95, _2#92 AS dept_name#96, _3#93 AS dept_code#97]
+- LocalRelation [_1#91, _2#92, _3#93]
== Optimized Logical Plan ==
Join LeftOuter, (deptId#79 = dept_id#95)
:- LocalRelation [empId#77, ssn#78, deptId#79, salary#80, age#81]
+- LocalRelation [dept_id#95, dept_name#96, dept_code#97]
== Physical Plan ==
*BroadcastHashJoin [deptId#79], [dept_id#95], LeftOuter, BuildRight
:- LocalTableScan [empId#77, ssn#78, deptId#79, salary#80, age#81]
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
+- LocalTableScan [dept_id#95, dept_name#96, dept_code#97]
scala>

Attachments

    Outcomes