"Transactional" conversion of CSV to Parquet?

Question asked by mattk on Oct 24, 2016
Latest reply on Oct 25, 2016 by aengelbrecht

I have a cluster that receives log files in CSV format every minute, and those files are immediately available to Drill users. For performance, I convert them to Parquet in batches using CTAS commands.

I would like to script a process that makes the Parquet files available as soon as they are created, perhaps through a UNION view, but that never serves the same data from both the original CSV file and the converted Parquet file at the same time.

Is there a common practice for making data available once it has been converted, something similar to a transactional batch of "convert, then (re)move the source CSV files"?
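To make the question concrete, here is a rough sketch of the kind of script I have in mind. It is not a known Drill practice, just one possible ordering: move the CSV files out of the queried directory first, convert them from a staging area, then publish the Parquet output with a rename. The directory layout, the `convert_batch` name, and the `convert` callback (which would wrap the CTAS call) are all hypothetical, and it assumes every directory sits on one POSIX-mounted filesystem so that renames are atomic. The trade-off is that a batch is briefly invisible during conversion instead of being served twice.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical callback: converts the staged CSV files into `out`
# (for example by issuing a Drill CTAS and writing the result there).
Converter = Callable[[List[Path], Path], None]


def convert_batch(csv_dir: Path, staging_dir: Path, parquet_dir: Path,
                  archive_dir: Path, batch_name: str,
                  convert: Converter) -> Optional[Path]:
    """Convert one batch so it is never visible as both CSV and Parquet."""
    batch_staging = staging_dir / batch_name
    batch_staging.mkdir(parents=True)

    # 1. Move the CSVs out of the queried directory (rename per file).
    staged: List[Path] = []
    for src in sorted(csv_dir.glob("*.csv")):
        dst = batch_staging / src.name
        os.replace(src, dst)
        staged.append(dst)
    if not staged:
        batch_staging.rmdir()
        return None

    # 2. Convert into a temporary path that no view reads from.
    tmp_out = staging_dir / f"{batch_name}.parquet.tmp"
    try:
        convert(staged, tmp_out)
    except Exception:
        # Roll back: drop partial output and restore the CSVs.
        if tmp_out.is_dir():
            shutil.rmtree(tmp_out)
        elif tmp_out.exists():
            tmp_out.unlink()
        for f in staged:
            os.replace(f, csv_dir / f.name)
        batch_staging.rmdir()
        raise

    # 3. Publish the Parquet output with a single rename.
    parquet_dir.mkdir(parents=True, exist_ok=True)
    final = parquet_dir / f"{batch_name}.parquet"
    os.replace(tmp_out, final)

    # 4. Keep the source CSVs in an archive that is not queried.
    archive_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(batch_staging), str(archive_dir / batch_name))
    return final
```

A UNION view over `csv_dir` and `parquet_dir` would then see each row in at most one of the two locations, at the cost of a short gap while step 2 runs. Is something along these lines reasonable, or is there a better-established pattern?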
