I am getting an error while reading nested records (a complex data type, read via explode) that contain a TIMESTAMP column, from a Hive managed table stored as Parquet.
create table all_datatype_prq (
col_address array<struct<city:string,pin:bigint,dob:date,login:timestamp>> )
row format delimited
fields terminated by ","
collection items terminated by "|"
map keys terminated by "~"
stored as parquet;
Hive query & error:
hive (test_db)> select address
              > from all_datatype_prq
              > LATERAL VIEW explode(col_address) expl as address;
Failed with exception java.io.IOException:parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file maprfs:///PATH/all_datatype_prq/000000_0
Hive version: 1.2
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
Parquet JAR: parquet-hadoop-bundle-1.6.0.jar
1. If the same Hive table is stored as TXT or ORC, there is no issue reading the Address array, so the problem is specific to storing the data in Parquet.
2. If we remove the 'login' attribute with TIMESTAMP from the Address array, there is no issue reading the data.
3. If we change the datatype of the 'login' column from TIMESTAMP to STRING, there is no issue reading the Address array, even when it is stored as Parquet.
4. The issue occurs with MAP, ARRAY & STRUCT types if they contain a TIMESTAMP column.
5. If we select any column outside of the Address array, there is no issue reading it.
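Based on observation 3, a workaround I am considering (an untested sketch; the table and alias names below are my own) is to declare 'login' as STRING in the Parquet table and cast it to TIMESTAMP at query time:

```sql
-- hypothetical table: same layout, but login stored as STRING
create table all_datatype_prq_str (
col_address array<struct<city:string,pin:bigint,dob:date,login:string>> )
stored as parquet;

-- cast login back to TIMESTAMP when reading the exploded array
select address.city,
       cast(address.login as timestamp) as login_ts
from all_datatype_prq_str
LATERAL VIEW explode(col_address) expl as address;
```

I would still prefer to store the column as a native TIMESTAMP if possible, so this is only a fallback.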
Any help on this topic would be appreciated, as I would like to understand how to read a TIMESTAMP column inside an array from a Hive managed table stored as Parquet.