
Hadoop: process HTML files using MapReduce

Question asked by naveen on Oct 10, 2013
Latest reply on Oct 11, 2013 by yufeldman
I want to process HTML files using MapReduce in Hadoop, so each input would be an HTML document. The file structure in HDFS is:

   /data/htmls/1/(HTML files)
   /data/htmls/2/(HTML files)
   ...
   /data/htmls/n/(HTML files)

I have a Java function that takes an HTML file as input and does some processing, and I can call it from the mapper class. How can I read each HTML file into the map function?
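A common way to get one whole HTML file into each map() call is to make the input non-splittable and emit exactly one record per file: key = file path, value = the full file contents. In Hadoop's org.apache.hadoop.mapreduce API this usually means a custom subclass of FileInputFormat whose isSplitable returns false and whose createRecordReader returns a reader that loads the entire file as a single value (examples often name such a class WholeFileInputFormat; it is not built into Hadoop). Because that needs a Hadoop setup to run, the sketch below is only a plain-Java, local simulation of the same record model, assuming the /data/htmls/<n>/ layout above. WholeFileMapSketch, runMap and extractTitle are hypothetical names, and extractTitle merely stands in for your own processing function, which is assumed here to accept the HTML as a String rather than a local File.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local, Hadoop-free simulation of the "one whole file = one map record" model.
// All names are hypothetical; this is not Hadoop's API.
class WholeFileMapSketch {

    // Lists the regular files in each subdirectory of root, i.e. root/<n>/<file>,
    // mirroring the /data/htmls/<n>/ layout (one level deep, no recursion).
    static List<Path> listInputFiles(Path root) {
        List<Path> files = new ArrayList<>();
        try {
            List<Path> dirs;
            try (Stream<Path> s = Files.list(root)) {
                dirs = s.filter(Files::isDirectory).sorted().collect(Collectors.toList());
            }
            for (Path dir : dirs) {
                try (Stream<Path> s = Files.list(dir)) {
                    s.filter(Files::isRegularFile).sorted().forEach(files::add);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    // Reads one file completely into a single value, as a whole-file record reader would.
    static String readWholeFile(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The "map" step: called once per file with the whole HTML document.
    // Key = path relative to root, value = whatever processHtml returns.
    static Map<String, String> runMap(Path root, Function<String, String> processHtml) {
        Map<String, String> out = new TreeMap<>();
        for (Path file : listInputFiles(root)) {
            String key = root.relativize(file).toString();
            out.put(key, processHtml.apply(readWholeFile(file)));
        }
        return out;
    }

    // Hypothetical stand-in for your processing function: returns the <title> text.
    static String extractTitle(String html) {
        int start = html.indexOf("<title>");
        int end = html.indexOf("</title>");
        if (start < 0 || end < start) {
            return "";
        }
        return html.substring(start + "<title>".length(), end);
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny local tree shaped like /data/htmls/1/ and /data/htmls/2/.
        Path root = Files.createTempDirectory("htmls");
        Files.createDirectories(root.resolve("1"));
        Files.createDirectories(root.resolve("2"));
        Files.write(root.resolve("1").resolve("a.html"),
                "<html><title>A</title></html>".getBytes(StandardCharsets.UTF_8));
        Files.write(root.resolve("2").resolve("b.html"),
                "<html><title>B</title></html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(runMap(root, WholeFileMapSketch::extractTitle));
    }
}
```

On Linux, running main prints {1/a.html=A, 2/b.html=B}. In a real job, an input path glob such as /data/htmls/* should let FileInputFormat pick up the files in every numbered directory. Reading each file whole is reasonable for ordinary HTML pages; if there are very many small files, packing them into SequenceFiles or using CombineFileInputFormat may reduce per-task overhead.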
