prakhar

Create Hive Table Over Nested Json

Blog Post created by prakhar Employee on May 4, 2017

There are situations where you might struggle to create hive table over nested json. Nested Json are complicated and to query over them using hive is challenging.

 

You can refer below steps which will help you in understanding how to create hive table over nested json also querying 

 

Always make sure of below points:

 

1.The definition needs to match exactly with the JSON element names.

2.When using json Serde, you need to create each Json element as a column.

3.If there are nested jsons, it becomes more complicated. Nested Jsons can be created with struct datatypes.

 

Follow below steps :

1. Create a Sample Record present in the file in maprfs /path/to/any/maprfs/dir/sample2.json

{"DocId":"ABC","User1":{"Id":1234,"Username":"sam1234","Name":"Sam","ShippingAddress":{"Address1":"123 Main St.","Address2":null,"City":"Durham","State":"NC"},"Orders":[{"ItemId":6789,"OrderDate":"11/11/2012"},{"ItemId":4352,"OrderDate":"12/12/2012"}]}}

 

Or you must be having your own Json. Identify all the elements and create column mappings accordingly

 

2.
Add the jar file in hive cli. You may have different version, I have 1611 jar file

add jar /opt/mapr/hive/hive-1.2/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.0-mapr-1611.jar;

 

This Jar contains all the implementations of handling complex json files.

 

3.
Create the Table Definition as below.

CREATE EXTERNAL TABLE sample_json3(
    DocId string,
    user1 struct<Id: int, username: string, name:string,shippingaddress:struct<address1:string,address2:string,city:string,state:string>, orders:array<struct<ItemId:int,orderdate:string>>>
)
ROW FORMAT SERDE
 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION
  'maprfs:/tmp/jsondir'

 

I am done with element to column mapping here

 

4. Now you can query your json data as below.

Output of the query

hive> select DocId,user1.username,user1.name,user1.orders from sample_json3;
OK
ABC     sam1234 Sam     [{"itemid":6789,"orderdate":"11/11/2012"},{"itemid":4352,"orderdate":"12/12/2012"}]
Time taken: 0.098 seconds, Fetched: 1 row(s)
hive>

Outcomes