Why is sorting needed in MapReduce?
Before the reducer starts executing, the intermediate output is sorted by key, not by value; the values for a given key can arrive in any order. Sorting lets the reducer know where one key's group ends and the next begins: it starts a new call to reduce() whenever the next key in its sorted input differs from the previous one. Because the reducer only has to compare each key with the one before it, it can process its input as a single stream, which reduces time and improves performance. No sorting takes place if the job is configured with zero reducers.

Sorting is done on both the mapper node and the reducer node. The mapper uses the QuickSort algorithm, while the reducer uses merge sort. The mapper receives input key-value pairs from the RecordReader and generates intermediate output according to the custom business logic. Before this intermediate output is written, it is partitioned and sorted by key.

The intermediate output from the mappers is then shuffled to the reducers, where it is processed according to the custom business logic. Before processing, the reducer merges the sorted map outputs by key using merge sort.
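To make the mechanism concrete, here is a minimal in-memory Python simulation of a word count. It is a sketch, not Hadoop code: the names `map_side`, `reduce_side` and `run_job` and the CRC32-based partitioner are hypothetical, Python's built-in sort stands in for the map-side QuickSort, and `heapq.merge` stands in for the reduce-side merge of sorted map outputs.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter


def map_side(records, num_reducers):
    """Partition one mapper's (key, value) output, then sort each partition by key."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in records:
        # Hypothetical deterministic partitioner: CRC32 of the key modulo reducer count.
        partitions[zlib.crc32(key.encode()) % num_reducers].append((key, value))
    for part in partitions:
        # Sort by key only; Python's sort (Timsort) stands in for Hadoop's QuickSort.
        part.sort(key=itemgetter(0))
    return partitions


def reduce_side(sorted_segments, reduce_fn):
    """Merge already-sorted segments, then call reduce_fn once per distinct key."""
    # Merge step: combine the key-sorted segments into one key-sorted stream.
    merged = heapq.merge(*sorted_segments, key=itemgetter(0))
    results = []
    # groupby starts a new group whenever the key differs from the previous one,
    # which yields exactly one group per key only because the stream is sorted.
    for key, pairs in groupby(merged, key=itemgetter(0)):
        results.append(reduce_fn(key, (value for _, value in pairs)))
    return results


def run_job(mapper_inputs, num_reducers):
    """Word count: each inner list is the input split of one mapper."""
    map_outputs = [map_side([(word, 1) for word in words], num_reducers)
                   for words in mapper_inputs]
    output = []
    for r in range(num_reducers):
        # Shuffle: reducer r fetches partition r from every mapper.
        segments = [parts[r] for parts in map_outputs]
        output.extend(reduce_side(segments, lambda key, values: (key, sum(values))))
    return output


print(sorted(run_job([["b", "a", "c", "a"], ["a", "c", "b"]], num_reducers=2)))
# [('a', 3), ('b', 2), ('c', 2)]
```

If the segments were not sorted, `groupby` would see separate runs of the same key (for example `a`, `c`, `a`) and call the reduce function more than once for `a`, which is the problem the sort step prevents.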
So I have to ask... what's the purpose of these Q&As? You have someone asking a very basic MR1 question and then a textbook answer that is cut-n-pasted?
(Oh, and there are some things that aren't 100% accurate. Free clue... in map/reduce there is only one worker node. The mapper runs as a task and then the reducer on the same set of nodes.)