
How do you fix existing data files by modifying them in place?

Question asked by communityadmin on Jun 22, 2011
Latest reply on Jun 23, 2011 by sathya
We are using Hive to store a year of historical data.  Recently, we noticed that some
records ended up in the wrong time partitions because of random delays in Flume's data collection.

I can fix these errors by reprocessing all of the data with a MapReduce program that rewrites it into files with the correct names, but it would be nicer to modify the existing files directly, adding or deleting a few records wherever the original partitioning is wrong.  My data volumes are small enough that I could even do this with conventional programs, but I can't afford to move large files into and out of HDFS to make this work.
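
To make the problem concrete, here is a minimal sketch of the check involved: for each record, work out which partition its timestamp belongs to and list the records that are filed elsewhere. The post does not describe the actual layout, so the hourly partition naming (`dt=YYYY-MM-DD/hr=HH`), the tab-separated record format with an epoch timestamp in the first field, and the function names `partition_for` and `find_misfiled` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_for(epoch_seconds):
    """Return the (hypothetical) hourly partition name for a UTC event time."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return ts.strftime("dt=%Y-%m-%d/hr=%H")


def find_misfiled(records_by_partition):
    """Group misfiled records by the partition they belong in.

    records_by_partition maps a partition name to its lines; each line is
    assumed to be tab-separated with an epoch timestamp in the first field.
    Returns {correct_partition: [(current_partition, line), ...]}.
    """
    moves = defaultdict(list)
    for current, lines in records_by_partition.items():
        for line in lines:
            epoch = int(line.split("\t", 1)[0])
            correct = partition_for(epoch)
            if correct != current:
                moves[correct].append((current, line))
    return dict(moves)
```

The result lists only the handful of records that need to move, which is the small set of additions and deletions I would like to apply without rewriting every file.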
