Can you explain the importance of Record Reader in Hadoop?
It is the class that actually loads the data from the source. The mapper only understands key/value pairs, so the record reader reads from the input split (line by line, in the default case) and converts each record into a key/value pair for the mapper. It is invoked repeatedly until the entire split is consumed, and each invocation of the record reader leads to another call of the map function defined by the programmer. Which record reader is used is determined by the input format.
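That loop can be sketched in plain Java. This is a toy model of how the framework drives a line-oriented record reader, not Hadoop's actual API; the class and method names here are illustrative (though `nextKeyValue`/`getCurrentKey`/`getCurrentValue` mirror the real `RecordReader` contract, where the key is the byte offset of the line and the value is its text, as `TextInputFormat` does with `LongWritable`/`Text`):

```java
import java.util.ArrayList;
import java.util.List;

public class RunLoopSketch {

    // Minimal stand-in for a line-oriented RecordReader.
    static class LineReader {
        private final String data;
        private int pos = 0;
        private long key;
        private String value;

        LineReader(String data) { this.data = data; }

        boolean nextKeyValue() {
            if (pos >= data.length()) return false;
            key = pos;                                // key = byte offset of the line
            int nl = data.indexOf('\n', pos);
            if (nl < 0) { value = data.substring(pos); pos = data.length(); }
            else        { value = data.substring(pos, nl); pos = nl + 1; }
            return true;
        }

        long getCurrentKey()     { return key; }
        String getCurrentValue() { return value; }
    }

    // The framework's run loop: one map() call per record until the split is consumed.
    public static List<String> run(String split) {
        List<String> out = new ArrayList<>();
        LineReader reader = new LineReader(split);
        while (reader.nextKeyValue()) {
            // here the real framework would call the user's map(key, value)
            out.add(reader.getCurrentKey() + ":" + reader.getCurrentValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("apple\nbanana\ncherry"));
        // → [0:apple, 6:banana, 13:cherry]
    }
}
```

The point of the sketch is the shape of the loop: the record reader owns the cursor into the split, and the framework just keeps asking for the next pair.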
Pretty good, but you need to point out that it's not necessarily a single line. You can have multiple lines in a single record; a good example of this is if you're using XML or JSON to represent the record. I'm going from memory, but you'd have to provide your own input format (and with it a custom record reader), possibly by extending a multi-line input format, so that you can capture a single record for processing.
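To make the multi-line case concrete, here is a hedged sketch of the grouping such a custom record reader would do: every line between a start tag and an end tag becomes one record. The `<record>`/`</record>` tags and the class name are made up for illustration; in real Hadoop you'd put this logic inside a `RecordReader` subclass returned by your custom input format:

```java
import java.util.ArrayList;
import java.util.List;

public class MultiLineSketch {

    // Group lines into records delimited by <record> ... </record>.
    public static List<String> readRecords(String input) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : input.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.equals("<record>")) {
                current = new StringBuilder();            // start of a new record
            } else if (trimmed.equals("</record>")) {
                records.add(current.toString().trim());   // record complete: emit it
                current = null;
            } else if (current != null) {
                current.append(line).append('\n');        // accumulate inner lines
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<record>\na\nb\n</record>\n<record>\nc\n</record>";
        System.out.println(readRecords(xml).size());  // 2 records from 7 input lines
    }
}
```

Each emitted record would then become a single value handed to one map() call, even though it spanned several physical lines.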
It's also important to note that a single Mapper task will typically read from at most two splits. The record reader skips bytes until it finds the start of a record, and if a record extends past the end of the current split, it reads into the next split to finish reading the record.
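That boundary rule can be simulated for the line-record case: a reader whose split does not start at offset 0 discards the partial first line (it belongs to the previous split's reader), and the reader that owns a line crossing the split end reads past it to finish the line. This is a plain-Java sketch of the behavior, not `LineRecordReader` itself:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitBoundarySketch {

    // Read the lines "owned" by the split [start, end) of data.
    public static List<String> readSplit(String data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {                              // skip the partial first line
            int nl = data.indexOf('\n', start);
            if (nl < 0) return lines;                  // no record starts in this split
            pos = nl + 1;
        }
        while (pos < end && pos < data.length()) {     // only lines that *start* before end
            int nl = data.indexOf('\n', pos);
            int stop = (nl < 0) ? data.length() : nl;
            lines.add(data.substring(pos, stop));      // may read past end to finish the line
            pos = stop + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        String data = "aaa\nbbbbb\ncc";
        // Split boundary at byte 6 falls mid-way through "bbbbb": the first
        // reader finishes that line, the second skips ahead to the next one.
        System.out.println(readSplit(data, 0, 6));     // [aaa, bbbbb]
        System.out.println(readSplit(data, 6, 12));    // [cc]
    }
}
```

Run both splits and every line comes out exactly once, which is the whole point: no record is lost or duplicated just because a block boundary fell in the middle of it.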