It is fast because the data is not moved; instead, the computational algorithms are moved to the place where the data resides.
Moving algorithms rather than data is definitely super helpful. Well-written Hadoop (MapReduce) jobs work hard to minimize data movement.

MapReduce is far from the fastest technology, though; it reads and writes all intermediate results to disk. Engines like Spark and its competitors have surpassed it: they lazily build and optimize the whole computation and keep data in memory where possible, avoiding intermediate writes to disk.

Spark has even moved on to implement SQL, which gets optimized in a manner functionally similar to an RDBMS query.
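To make that contrast concrete, here is a minimal Spark sketch in Java. None of this comes from the answers above: the file path /data/events.csv and the level/service columns are invented for illustration, and it assumes a local Spark installation. The point is that transformations stay lazy until an action runs, cache() keeps an intermediate result in memory instead of writing it back to disk, and the same query can also go through Spark SQL's optimizer.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkLazyExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("lazy-vs-disk")
                .master("local[*]")
                .getOrCreate();

        // Reading and filtering are lazy transformations: nothing executes yet,
        // Spark only builds a logical plan.
        Dataset<Row> events = spark.read()
                .option("header", "true")
                .csv("/data/events.csv");          // hypothetical input file
        Dataset<Row> errors = events.filter("level = 'ERROR'");

        // cache() keeps the filtered data in memory, so later actions reuse it
        // instead of re-reading the input (unlike MapReduce's disk round trips).
        errors.cache();

        // Actions trigger execution; both reuse the in-memory result.
        System.out.println("error count: " + errors.count());
        errors.groupBy("service").count().show();

        // Spark SQL: the same query is optimized by Catalyst, much like an
        // RDBMS query planner would optimize it.
        events.createOrReplaceTempView("events");
        spark.sql("SELECT service, COUNT(*) AS n FROM events "
                + "WHERE level = 'ERROR' GROUP BY service").show();

        spark.stop();
    }
}
```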
In the Hadoop architecture we have a NameNode and DataNodes. When data is stored in HDFS, it is split into blocks that are written to the DataNodes, and the NameNode keeps a record of which DataNodes hold each block. When we need to process that data, the job scheduler uses this block-location metadata to assign tasks to the machines where the data is already stored.
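As an illustration of that block-location metadata, the sketch below uses Hadoop's FileSystem API to ask the NameNode which DataNodes hold each block of a file; a locality-aware scheduler relies on exactly this information. The path /data/input.log is a placeholder, and it assumes an HDFS cluster reachable through the default configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/input.log");   // hypothetical HDFS file
        FileStatus status = fs.getFileStatus(file);

        // The NameNode answers this metadata query: for each block of the
        // file, which DataNodes hold a replica. Task scheduling uses the
        // same information to run computation on those hosts.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```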
Hadoop was the best solution for storing and processing big data because:

1. It stores huge files as they are (raw), without requiring any schema up front.
2. High scalability - nodes can be added at any time, enhancing performance dramatically.
3. It's economical, so it suits budgets from a startup to a tech giant; commodity hardware can be used efficiently with Hadoop.
4. Reliable - there is no danger of losing data even if nodes in the cluster fail; recovery and backup of data are automatic.
5. Open source - no headache of licensing. Download and enjoy the power of Hadoop.

The above and many more interesting and useful characteristics of Hadoop combined make it so popular in the industry.
Hadoop is fast because of data locality - the computation is moved to the data rather than the data to the computation, which is cheaper and makes processing much faster. The same algorithm is shipped to every node in the cluster, and each node processes the chunks of data stored locally on it. So processing is not done on one big piece of data but on many smaller distributed pieces in parallel, which greatly enhances performance.
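The classic word-count job is a convenient sketch of how this works (it is not taken from the answers above, and the input/output paths are placeholders): the same Mapper and Reducer classes are distributed to every node, and each map task runs against the HDFS block that happens to live on its node.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // The same map code is shipped to every node; each map task processes
    // the block of input data stored locally on that node.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // The reducer combines the partial counts produced on different nodes.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input path placeholder
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path placeholder
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```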
For more details, please visit:
Hadoop Tutorial - A Guide on Big Data Hadoop for Beginners - DataFlair