Avoiding skew and determining optimal number of mappers in SQOOP import.

Question asked by ngvinay on Jun 20, 2015

If the source table has a primary key, a SQOOP import splits on it by default and the records are usually distributed evenly across mappers. But what if no primary key is defined on the table and we have to use the --split-by parameter to split records among multiple mappers?

There is a high chance of skewed data, depending on the column we select for --split-by.
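To make the concern concrete, here is a rough sketch of the behaviour I am worried about. As far as I understand, Sqoop takes the MIN and MAX of the split column and cuts that range into equal-width intervals, one per mapper. The Python below only approximates that idea for an integer column. The function name `rows_per_mapper` and the sample data are made up for illustration, and the real split boundaries may differ slightly.

```python
def rows_per_mapper(values, num_mappers):
    """Approximate how many rows each mapper gets when the [min, max]
    range of an integer split column is cut into equal-width intervals.
    Hypothetical helper for illustration, not Sqoop's actual code."""
    lo, hi = min(values), max(values)
    counts = [0] * num_mappers
    for v in values:
        if hi == lo:
            idx = 0  # a single distinct value: everything lands in one split
        else:
            # Integer arithmetic; the max value is clamped into the last split.
            idx = min((v - lo) * num_mappers // (hi - lo), num_mappers - 1)
        counts[idx] += 1
    return counts


# Densely populated column, like an auto-increment key: even splits.
uniform = list(range(1, 1001))

# Same kind of column with a few outliers far from the rest: skewed splits.
skewed = list(range(1, 991)) + [100000 + i for i in range(10)]

print(rows_per_mapper(uniform, 4))  # [250, 250, 250, 250]
print(rows_per_mapper(skewed, 4))   # [990, 0, 0, 10]
```

In the second case one mapper does almost all of the work and two do nothing, so adding more mappers would not help.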

Could you please help me understand how to avoid skew in such scenarios, and how to determine the optimal number of mappers for a given SQOOP import?