
Combining a large number of small files for MapReduce input

Question asked by tom_07 on Nov 29, 2013
Latest reply on Nov 29, 2013 by Ted Dunning
I am new to Hadoop and MapR. We are developing a network monitoring tool in Java. We collect various pieces of information about the monitored devices periodically, say every 5 seconds, and write each piece of information to HDFS through the Java client as a new file (since we are not using the HDFS append facility). As a result, each file is typically less than 2 KB in size.
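
For context, this is roughly our write path (a simplified sketch; names like sourceDir, deviceId, and record are illustrative, not our real code):

<pre><code>import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Called every 5 seconds per monitored device.
void writeRecord(FileSystem fileSystem, String deviceId, String record) throws IOException {
    // A unique file name per sample, since we cannot append to an existing file.
    Path file = new Path("sourceDir/" + deviceId + "-" + System.currentTimeMillis());
    try (FSDataOutputStream out = fileSystem.create(file, false)) {
        // Each sample ends up as its own file of less than 2 KB.
        out.write(record.getBytes(StandardCharsets.UTF_8));
    }
}</code></pre>
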
I know that each map task takes at most one file, so the job will spawn as many map tasks as there are files and will be inefficient. To get rid of this, we use the merging facility of FileUtil before submitting the job:

<pre><code>// Concatenate every file under sourceDir into the single file mapInputfile
// (false = keep the source files, null = no separator string between them)
FileUtil.copyMerge(fileSystem, new Path("sourceDir"), fileSystem,
    new Path("mapInputfile"), false, conf, null);</code></pre>
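
After the merge, we point the job at the single merged file, roughly like this (a sketch assuming the Hadoop 2 org.apache.hadoop.mapreduce API; the mapper/reducer setup and the output path name are omitted or illustrative):

<pre><code>import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Job job = Job.getInstance(conf, "monitoring-aggregation"); // job name is illustrative
FileInputFormat.addInputPath(job, new Path("mapInputfile")); // the merged file from copyMerge
FileOutputFormat.setOutputPath(job, new Path("jobOutput")); // hypothetical output directory
job.waitForCompletion(true);</code></pre>
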
    
Is this good practice? Or is there another mechanism used for such requirements? Please help...
