
Sqoop Parallelism

Question asked by dzndrx on Sep 15, 2017
Latest reply on Sep 15, 2017 by cathy

Hi Community,

 

Can anyone help me understand this? I understand that when I run a Sqoop job, the default number of mappers is 4 and they run in parallel. So Sqoop creates 4 SQL queries based on the primary key, resulting in four chunks of data. What I don't understand is whether these mappers run on different nodes. If each mapper runs on a separate node, the download should be much faster, since every node's bandwidth can be used. Or do all 4 mappers stay on the same node, so that the parallelism is only at the CPU level (hyperthreading and so on) and not in bandwidth, in which case it only improves the download speed a little, and only if a single mapper doesn't already saturate the bandwidth? I'm really confused.
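
To show what I mean by "four chunks", here is a rough Python sketch of how I picture the split. This is only my assumption of the idea (take the min and max of the primary key and cut the range evenly per mapper), not Sqoop's actual code. The table name `orders`, the column `id`, and the function names are made up for illustration.

```python
def split_ranges(min_id, max_id, num_mappers=4):
    """Cut the inclusive key range [min_id, max_id] into up to
    num_mappers contiguous, non-overlapping ranges of near-equal size."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    lo = min_id
    for i in range(num_mappers):
        # The first `extra` ranges get one additional key each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # fewer keys than mappers: skip empty ranges
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def chunk_queries(table, key, min_id, max_id, num_mappers=4):
    """Build one SELECT per range; each mapper would run one of these.
    Table and key names are hypothetical placeholders."""
    return [
        f"SELECT * FROM {table} WHERE {key} >= {lo} AND {key} <= {hi}"
        for lo, hi in split_ranges(min_id, max_id, num_mappers)
    ]


if __name__ == "__main__":
    for q in chunk_queries("orders", "id", 1, 100):
        print(q)
```

So with ids from 1 to 100 and 4 mappers, I would expect four queries, each covering 25 ids. My question is about where those four queries actually get executed.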

 

Any input is appreciated.
