What is the key difference between MapReduce Version 1 and MapReduce Version 2?
MapReduce Version 1 (MRv1) uses services called JobTracker and TaskTracker to execute jobs, and allocates resources as fixed "slots." A slot is basically a fixed share of a node's CPU and memory. There are several limitations with this implementation. First, a single JobTracker coordinates all of the MapReduce jobs (allocating resources, scheduling, and monitoring), so it doesn't scale well: on a large or busy cluster the JobTracker can get overwhelmed. Second, slots have to be pre-defined as Map slots (which can only be used by Map tasks) and Reduce slots (which can only be used by Reduce tasks). A Mapper can't use a Reduce slot (even if it's sitting idle), and vice versa. Finally, this model only supports MapReduce jobs.
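To make the slot limitation concrete, here is a minimal toy sketch in Python. It is not Hadoop code: the `SlotNode` class and the slot counts are hypothetical, chosen only to illustrate the point. In a real MRv1 cluster the per-node slot counts come from TaskTracker settings such as `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`.

```python
from dataclasses import dataclass


@dataclass
class SlotNode:
    """Toy MRv1-style worker with a fixed split of map and reduce slots (hypothetical)."""
    map_slots: int
    reduce_slots: int

    def schedule(self, pending_maps: int, pending_reduces: int) -> dict:
        # Each task type can only draw from its own slot pool.
        running_maps = min(pending_maps, self.map_slots)
        running_reduces = min(pending_reduces, self.reduce_slots)
        idle = (self.map_slots - running_maps) + (self.reduce_slots - running_reduces)
        return {
            "running_maps": running_maps,
            "running_reduces": running_reduces,
            "waiting_maps": pending_maps - running_maps,
            "waiting_reduces": pending_reduces - running_reduces,
            "idle_slots": idle,
        }


# Assumed example numbers: 4 map slots, 2 reduce slots, 6 map tasks queued.
slot_node = SlotNode(map_slots=4, reduce_slots=2)
slot_result = slot_node.schedule(pending_maps=6, pending_reduces=0)
print(slot_result)
```

With these assumed numbers, two Map tasks wait in the queue while two Reduce slots sit idle, which is exactly the waste described above.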
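YARN (Yet Another Resource Negotiator) was introduced with Hadoop 2, and MapReduce Version 2 (MRv2) runs on top of it. JobTracker and TaskTracker are gone: instead, MRv2 uses YARN's ResourceManager, NodeManager, and ApplicationMaster. The JobTracker's responsibilities are split up: the ResourceManager handles cluster-wide resource allocation, while each application gets its own ApplicationMaster to schedule and monitor its tasks, which removes the single-coordinator bottleneck. Resources are allocated dynamically as "resource containers," which share a cluster's resources more efficiently because a container isn't tied to Map or Reduce. And YARN can run other applications on the cluster, not just MapReduce jobs.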
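The container model can be sketched the same way. Again, this is a toy illustration under stated assumptions, not YARN's actual scheduler: the `ContainerNode` class, the task names, and the resource sizes are all hypothetical. In a real cluster, a node's capacity and the per-task requests come from settings such as `yarn.nodemanager.resource.memory-mb` and `mapreduce.map.memory.mb`.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerNode:
    """Toy YARN-style worker: one shared pool of memory and vcores (hypothetical)."""
    memory_mb: int
    vcores: int
    containers: list = field(default_factory=list)

    def allocate(self, task: str, memory_mb: int, vcores: int) -> bool:
        # Any task type may get a container as long as the request fits
        # in what is left of the shared pool.
        if memory_mb > self.memory_mb or vcores > self.vcores:
            return False
        self.memory_mb -= memory_mb
        self.vcores -= vcores
        self.containers.append((task, memory_mb, vcores))
        return True


# Assumed example numbers: a node with 8 GB and 8 vcores.
yarn_node = ContainerNode(memory_mb=8192, vcores=8)

# Six map tasks all fit, because nothing is reserved for reducers.
granted = [yarn_node.allocate(f"map-{i}", memory_mb=1024, vcores=1) for i in range(6)]

# A non-MapReduce application can use the remaining capacity.
other = yarn_node.allocate("non-mapreduce-app", memory_mb=2048, vcores=2)

# Once the pool is exhausted, further requests are refused.
overflow = yarn_node.allocate("map-6", memory_mb=1024, vcores=1)

print(granted, other, overflow, yarn_node.memory_mb, yarn_node.vcores)
```

Here all six Map tasks run at once, the leftover capacity goes to a different kind of application, and a request is refused only when the node is actually full.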