Spark Troubleshooting Guide: Spark SQL: How do I print the schema of a DataFrame?

Document created by hdevanath Employee on Jun 19, 2017

A quick first step when debugging is to check the schema of the DataFrame.

 

The schema of a Hive table can be fetched from the Spark shell as shown below:

 

emp.printSchema()

root
 |-- empId: integer (nullable = false)
 |-- ssn: integer (nullable = false)
 |-- deptId: integer (nullable = false)
 |-- salary: integer (nullable = false)
 |-- age: integer (nullable = false)
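
Beyond printing the tree, the same schema can be read programmatically, which helps when a debugging script needs to check column names or types. The following is a minimal sketch, assuming Spark 2.x or later. Only the column names come from the output above; the sample rows, the application name, and the local master setting are hypothetical and used purely for illustration.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session named `spark` already exists and getOrCreate()
// simply returns it; the local master here is an assumption for standalone runs.
val spark = SparkSession.builder()
  .appName("print-schema-example") // hypothetical application name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows; the column names match the schema shown above.
val emp = Seq(
  (1, 111223333, 10, 50000, 35),
  (2, 444556666, 20, 62000, 41)
).toDF("empId", "ssn", "deptId", "salary", "age")

// Human-readable tree, as in the output above.
emp.printSchema()

// The schema as a StructType, one StructField per column.
emp.schema.fields.foreach { f =>
  println(s"${f.name}: ${f.dataType.simpleString} (nullable = ${f.nullable})")
}

// Column names paired with their data type names.
emp.dtypes.foreach { case (name, dataType) => println(s"$name -> $dataType") }
```

Because the sample columns are built from Scala Int values, Spark infers them as non-nullable integers, which is consistent with the nullable = false flags in the printed tree.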

 

The schema of a Hive table can be fetched from the Spark shell as shown below:

 

 

 

As you can see, it prints only 20 records, so it is suitable only for smaller tables. For bigger tables, it is recommended to use the command below to fetch the complete schema:
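
One way to get past a 20-row limit is sketched here, under the assumption that the schema is being listed with a DESCRIBE query and displayed with show(), which prints 20 rows unless told otherwise. The view name wide_emp, its generated columns c0 to c29, and the local session settings are hypothetical and exist only to make the example self-contained.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() returns the existing session.
val spark = SparkSession.builder()
  .appName("describe-wide-table") // hypothetical application name
  .master("local[*]")
  .getOrCreate()

// Hypothetical wide table: 30 integer columns named c0 .. c29.
val wide = spark.range(1)
  .selectExpr((0 until 30).map(i => s"CAST(id AS INT) AS c$i"): _*)
wide.createOrReplaceTempView("wide_emp")

// One row per column of the table.
val described = spark.sql("DESCRIBE wide_emp")

// Default show(): only the first 20 rows are printed.
described.show()

// Explicit row count and truncate = false: every column is listed in full.
described.show(described.count().toInt, false)

// printSchema() is not subject to the row limit at all.
spark.table("wide_emp").printSchema()
```

Passing the row count explicitly keeps the output tabular, while printSchema() avoids the limit altogether because it does not go through show().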

 
