I've been trying to create a Parquet table from a pipe-delimited, gzipped file using Apache Drill's CTAS (CREATE TABLE AS) method, and I noticed that rows were dropped once the query completed. I've ruled out the extraction of the data from Redshift as the cause. I also tried processing the uncompressed version of the file, and it still drops rows.
I found one of the missing rows in the source file, moved it to the beginning of the file, and reran the CTAS. That row then made it into the Parquet output.
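To quantify the drop, you can count the rows in the source file and compare that number with a `SELECT COUNT(*)` against the new Parquet table. Below is a minimal Python sketch of that check. The function name `count_rows` and its `expected_fields` parameter are hypothetical. The optional field-count check uses a naive split on `|`, so it does not understand quoted or escaped delimiters. The total also includes a header line if the file has one.

```python
import gzip


def count_rows(path, delimiter="|", expected_fields=None):
    """Count non-empty lines in a plain or gzipped delimited file.

    If expected_fields is given, also collect the 1-based line numbers
    whose naive field count (a plain split on the delimiter) differs
    from it. Those lines are candidates to inspect by hand.
    """
    # Pick the opener based on the extension; both accept text mode.
    opener = gzip.open if path.endswith(".gz") else open
    total = 0
    mismatched = []
    # newline="" keeps line endings untranslated; they are stripped below.
    with opener(path, "rt", encoding="utf-8", newline="") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            total += 1
            if expected_fields is not None and len(line.split(delimiter)) != expected_fields:
                mismatched.append(lineno)
    return total, mismatched
```

For example, `count_rows("export.txt.gz", expected_fields=12)` would return the row total and the line numbers whose field count is not 12. The file name and field count are placeholders.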
I'm not sure what is causing the drop and was wondering if anyone else has come across this problem.