The mapper generates key-value pairs, but how exactly are they sent to the partitioner, and how exactly does the partitioner decide which reducer gets a given key-value pair?

Ex: I am MapR

Mapper output,

I 1

am 1

MapR 1

Now how does it decide on the hash value? Is there a range of hash values assigned to each partition or reducer, or is it completely random?

Please help me understand this.


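There is one partition for each reducer. For example, if you request 3 reducers in your m/r job (-Dmapreduce.job.reduces=3), there are 3 partitions: 0, 1, and 2. Each time the mapper emits a pair, the framework calls the partitioner's getPartition(key, value, numPartitions) method to get the partition number for that pair. The default partitioner (HashPartitioner) hashes the key and takes that hash modulo the number of reduce tasks. Nothing about this is random: the same key always produces the same hash, so every pair with that key lands in the same partition. There is also no range of hash values per reducer; each reducer simply gets the keys whose hash leaves a particular remainder.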

Informally, it's calculated this way:

return hash(key) % numReduceTasks;
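To make this concrete for the three keys in the question, here is a small Java sketch. It is an illustration, not Hadoop's code: it uses java.lang.String.hashCode() as a stand-in, whereas Hadoop's Text keys compute their own hash, so the partition numbers in a real job will likely differ. It already applies the sign-bit mask discussed below, which changes nothing for these keys because their hash codes are positive.

```java
import java.util.List;

public class PartitionDemo {
    // Default-style hash partitioning; String.hashCode() stands in for Hadoop's Text hash.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("I", "am", "MapR");
        // Try the default of 1 reducer and the 3-reducer example from the answer.
        for (int reducers : new int[] {1, 3}) {
            for (String key : keys) {
                System.out.printf("reducers=%d key=%s hash=%d partition=%d%n",
                        reducers, key, key.hashCode(), partition(key, reducers));
            }
        }
    }
}
```

With 3 reducers this assigns "I" to partition 1 and both "am" and "MapR" to partition 2; with 1 reducer every key goes to partition 0. Running it again gives the same numbers, which is the point: the assignment is deterministic.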

Some things to note:

Partition numbers must be non-negative (0, 1, 2, ...), but a key's hash code may be negative, and in Java the % operator returns a negative result when the left operand is negative. For that reason, the partitioner clears the sign bit of the hash code before taking the modulus. Concretely, Hadoop's default HashPartitioner computes it this way:

return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
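The sketch below shows why the mask matters. It uses an Integer key of -7 purely because Integer.hashCode() returns the value itself, which gives a known negative hash; the method names are hypothetical and chosen for illustration.

```java
public class NegativeHashDemo {
    // Naive version: returns a negative "partition" whenever hashCode() < 0.
    static int naivePartition(Object key, int numReduceTasks) {
        return key.hashCode() % numReduceTasks;
    }

    // Masked version: clearing the sign bit guarantees a result in [0, numReduceTasks).
    static int maskedPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        Integer key = -7; // Integer.hashCode() is the value itself, so the hash is -7
        System.out.println("naive:  " + naivePartition(key, 3));  // -1, not a valid partition
        System.out.println("masked: " + maskedPartition(key, 3)); // 1
    }
}
```

Masking is used rather than Math.abs because Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE (negative), so Math.abs(hash) % n can still go negative, while the mask cannot.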

By default, an m/r job has 1 reducer. That means exactly 1 partition is created, and all key-value pairs land in it (partition 0).

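If the keys' hash codes are spread evenly, their remainders will be too, so each reducer receives a similar number of distinct keys. Whether each reducer also receives a similar number of key-value pairs depends on how many pairs share each key: a few very frequent keys can make one reducer much busier than the others.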
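A rough way to see the difference between balancing keys and balancing records is to count records per partition for some made-up data. In this sketch the key names, the 9000 distinct keys and the hot key "the" with 50000 records are all hypothetical, and String.hashCode() again stands in for Hadoop's Text hash.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class BalanceDemo {
    // Default-style hash partitioning; String.hashCode() stands in for Hadoop's Text hash.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Sums, per partition, the number of records (not distinct keys) it would receive.
    static int[] recordsPerPartition(Map<String, Integer> recordsPerKey, int numReduceTasks) {
        int[] totals = new int[numReduceTasks];
        for (Map.Entry<String, Integer> e : recordsPerKey.entrySet()) {
            totals[partition(e.getKey(), numReduceTasks)] += e.getValue();
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical data: 9000 distinct keys with one record each.
        Map<String, Integer> uniform = new HashMap<>();
        for (int i = 0; i < 9000; i++) {
            uniform.put("word" + i, 1);
        }
        // Same keys plus one hypothetical "hot" key that appears 50000 times.
        Map<String, Integer> skewed = new HashMap<>(uniform);
        skewed.put("the", 50000);

        System.out.println("uniform: " + Arrays.toString(recordsPerPartition(uniform, 3)));
        System.out.println("skewed:  " + Arrays.toString(recordsPerPartition(skewed, 3)));
    }
}
```

In the skewed case, whichever partition "the" hashes to receives all 50000 of its records, however evenly the other keys are spread.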

If you're not satisfied with the behavior of the default partitioner (or otherwise need some specific behavior not available in any of the off-the-shelf partitioners), you can always write your own.
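Since the question asked about ranges: the hash partitioner does not use them, but a custom partitioner can. The sketch below is a hypothetical FirstLetterPartitioner that sends keys to partitions by contiguous ranges of their first letter. So that it compiles without Hadoop jars, it declares a local abstract class that mirrors the getPartition signature of Hadoop's org.apache.hadoop.mapreduce.Partitioner; in a real job you would extend Hadoop's class with your actual key/value types (e.g. Text and IntWritable) and register it with job.setPartitionerClass(...).

```java
// Local stand-in for org.apache.hadoop.mapreduce.Partitioner so the sketch compiles without Hadoop.
abstract class Partitioner<KEY, VALUE> {
    public abstract int getPartition(KEY key, VALUE value, int numPartitions);
}

// Hypothetical range partitioner: splits keys into contiguous blocks by first letter (a-z).
class FirstLetterPartitioner extends Partitioner<String, Integer> {
    @Override
    public int getPartition(String key, Integer value, int numPartitions) {
        if (key.isEmpty()) {
            return 0;
        }
        char c = Character.toLowerCase(key.charAt(0));
        if (c < 'a' || c > 'z') {
            return 0; // digits, punctuation, etc. all go to the first partition
        }
        // Map 'a'..'z' (0..25) onto 0..numPartitions-1 in contiguous, ordered blocks.
        return (c - 'a') * numPartitions / 26;
    }
}

public class CustomPartitionerDemo {
    public static void main(String[] args) {
        Partitioner<String, Integer> p = new FirstLetterPartitioner();
        for (String key : new String[] {"I", "am", "MapR"}) {
            System.out.println(key + " -> " + p.getPartition(key, 1, 3));
        }
    }
}
```

With 3 partitions this prints "I -> 0", "am -> 0" and "MapR -> 1". The trade-off is balance: real words are not spread evenly over first letters, so some reducers would get far more keys than others. For sorted, range-based output, Hadoop also ships TotalOrderPartitioner, which takes its split points from a partition file (commonly built by sampling the input) instead of hard-coding them.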