I am validating one of our use cases for Parquet with Spark SQL.
I generated a number of Parquet files with Spark Streaming by consuming messages from Kafka. The Parquet files are partitioned by date and written with gzip compression.
I created an external Hive table:
CREATE EXTERNAL TABLE logs (
  id string,
  ......
  code1 string, code2 string, code3 string,
  date1 string, date2 string
)
PARTITIONED BY (creationdate int)
STORED AS PARQUET
LOCATION '/apps/spark/logs';
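As a first sanity check (a sketch, assuming the table name and location above), it may be worth confirming that the metastore actually knows about any partitions, since Hive only reads data under directories whose names match the partition column, e.g. `/apps/spark/logs/creationdate=20171005/`:

```sql
-- Lists the partitions registered in the metastore for this table;
-- an empty result here would explain the zero-row queries.
SHOW PARTITIONS logs;

-- Shows the resolved location and storage format of the table.
DESCRIBE FORMATTED logs;
```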
When I query the external table, it always returns zero results.
Even after registering the partition with Hive, it still sees no data:
ALTER TABLE auditLogs ADD PARTITION(creationDate=20171005)
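One thing that might help (a sketch, assuming the directory naming Spark's `partitionBy` produces, and using the `logs` table name from the CREATE statement, since the ALTER above names a different table) is to point the partition at its directory explicitly, or to let Hive discover partitions itself. Note also that the ALTER uses `creationDate` while the table was defined with `creationdate`; directory names on the filesystem must match the partition column's spelling.

```sql
-- Register the partition with an explicit location:
ALTER TABLE logs ADD IF NOT EXISTS PARTITION (creationdate=20171005)
  LOCATION '/apps/spark/logs/creationdate=20171005';

-- Or have Hive scan the table location and register all
-- partition directories it finds:
MSCK REPAIR TABLE logs;
```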
My guess is that Spark is writing the Parquet files with a SerDe or format that Hive cannot recognise. Can you recommend a fix?
I have Hive 1.2.1 and Spark 1.6.1, on MapR 5.2 with MEP 1.1.
Any help is appreciated. Thank you