
How to overwrite a Parquet-formatted Hive table (MapR Sandbox + Spark 1.5.2 + Hive 1.2)

Question asked by kisa500 on Apr 27, 2016
Latest reply on Jun 3, 2016 by maprcommunity

Background:

I've created a Hive external table with its data stored in Parquet format.

Using the MapR Sandbox; Spark 1.5.2; Hive 1.2.
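
For context, the table was created along these lines; the table name, columns, and location below are illustrative placeholders, not my actual schema:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
sqlContext = HiveContext(sc)

# Placeholder DDL: the table name, columns, and location stand in for
# the real ones. The table is external and stored as Parquet.
sqlContext.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS table_name (
        id INT,
        some_col STRING
    )
    STORED AS PARQUET
    LOCATION '/my_path'
""")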

 

I attempt to read the data (if any) into a DataFrame, perform some transformations, and then overwrite the original data with the new set.

 

I have tried:

1. df.write.insertInto('table_name', overwrite=True)

     This seems to work correctly with the ORC format, but with Parquet it throws the following error (see the first sketch below this list for a staging workaround I've been trying):

pyspark.sql.utils.AnalysisException: Cannot insert overwrite into table that is also being read from.

2. df.write.mode('overwrite').parquet('my_path'), as well as df.write.parquet('my_path', mode='overwrite') and df.write.save('my_path', format='parquet', mode='overwrite')

     This runs successfully the first time (when no data exists and therefore no overwrite is necessary). Once an overwrite is necessary, this too fails with the errors below (the second sketch below this list is a directory-swap workaround I have been considering):

ERROR Client fs/client/fileclient/cc/client.cc:1802 Thread: 620 Open failed for file /my_path/path/part-r-00084-9, LookupFid error No such file or directory(2)

2016-04-26 16:47:17,0942 ERROR JniCommon fs/client/fileclient/cc/jni_MapRClient.cc:2488 Thread: 620 getBlockInfo failed, Could not open file /my_path//part-r-00084-9

16/04/26 16:47:17 WARN DAGScheduler: Creating new stage failed due to exception - job: 16
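
For reference, here is the staging workaround I have been trying for case 1: materialize the transformed data to a temporary path first, re-read it, and only then insert-overwrite, so Spark is no longer reading from the table it writes to. The transformation, column name, and staging path are placeholders:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
sqlContext = HiveContext(sc)

# Read the current table contents and apply the transformations
# (the filter below is just a stand-in for the real logic).
df = sqlContext.table('table_name')
transformed = df.filter(df.some_col.isNotNull())

# Materialize to a staging path so the insertInto below no longer
# reads from the table it is about to overwrite.
staging_path = '/tmp/table_name_staging'
transformed.write.mode('overwrite').parquet(staging_path)

# Re-read from staging and overwrite the original table.
staged = sqlContext.read.parquet(staging_path)
staged.write.insertInto('table_name', overwrite=True)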
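
And for case 2, a sketch of the directory-swap idea: write to a fresh sibling directory, then delete and rename via the Hadoop FileSystem API. This goes through PySpark's private _jvm gateway, so it is fragile and untested on my side; all paths are placeholders:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
sqlContext = HiveContext(sc)

df = sqlContext.read.parquet('/my_path')  # existing data
transformed = df  # stand-in for the real transformations

# Write to a sibling staging directory instead of overwriting in
# place, so Spark never deletes the files it is still reading.
staging = '/my_path_staging'
transformed.write.mode('overwrite').parquet(staging)

# Swap the directories using the Hadoop FileSystem API, reached
# through the JVM gateway (a private PySpark interface).
Path = sc._jvm.org.apache.hadoop.fs.Path
fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())
fs.delete(Path('/my_path'), True)           # recursively remove old data
fs.rename(Path(staging), Path('/my_path'))  # promote the staging output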

 

Any idea how to resolve these issues?

Thanks in advance!
