In my three-node cluster, I have tuned all the relevant parameters for performance, but it is not helping much in my case.
All our Hive tables are stored in Parquet format. When my team loads data from an external table into an internal (managed) table, the load is very slow. Please find the script below:
ksh -c 'hadoop fs -rm -R hdfs:///user/hive/warehouse/bistore_sit_cycle2.db/wt_consumer/d_partition_number=0;
hive -e "set hive.exec.dynamic.partition.mode=nonstrict;
insert into bistore_sit_cycle2.wt_consumer
select * from bistore_sit_cycle2.ext_wt_consumer;"'
It takes more than 2 hours to load. The Hive job is created with 718 mappers and runs with 2 containers on each node, and only 5 mappers run concurrently for this job.
The load is approximately 85M records and 35 GB.
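For reference, here is a quick back-of-the-envelope calculation from the numbers above. It assumes each mapper runs in one YARN container and that one container is taken by the job's ApplicationMaster, which would explain 5 rather than 6 concurrent mappers on 3 nodes x 2 containers. The per-wave time is only a rough lower bound that ignores reducers and startup overhead.

```shell
#!/bin/sh
# Back-of-the-envelope numbers for the load described above.
# Assumption: each mapper runs in one YARN container, and one container
# is taken by the job's ApplicationMaster.
nodes=3
containers_per_node=2
total_mappers=718
data_mb=$((35 * 1024))        # ~35 GB of input
runtime_sec=$((2 * 3600))     # "more than 2 hours", so this is a lower bound

# 3 nodes x 2 containers = 6 slots, minus 1 for the ApplicationMaster
concurrent=$((nodes * containers_per_node - 1))

# Number of "waves" of mappers, rounded up
waves=$(( (total_mappers + concurrent - 1) / concurrent ))

# Average input per mapper in MB (integer division)
mb_per_mapper=$((data_mb / total_mappers))

# Average time per wave in seconds (rough lower bound)
sec_per_wave=$((runtime_sec / waves))

echo "concurrent mappers: $concurrent"
echo "waves:              $waves"
echo "MB per mapper:      $mb_per_mapper"
echo "seconds per wave:   $sec_per_wave"
```

With these inputs it prints 5 concurrent mappers, 144 waves, about 49 MB per mapper and roughly 50 s per wave. That suggests the input is being cut into many small splits relative to the very few container slots available.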
How can I run jobs like this with fewer mappers, and how can I increase the number of mappers running concurrently?
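As far as I understand, the two knobs involved are split size (fewer, larger splits mean fewer mappers) and container size (smaller containers mean more of them fit on each node). The sketch below only models how those interact. The memory values are hypothetical, not taken from this cluster. The mapping to `mapreduce.input.fileinputformat.split.maxsize` (with `CombineHiveInputFormat`), `yarn.nodemanager.resource.memory-mb` and `mapreduce.map.memory.mb` is the usual one for Hive on MapReduce/YARN, but it should be checked against the Hive/Hadoop version in use. Vcore limits and the ApplicationMaster's own memory can also cap concurrency.

```shell
#!/bin/sh
# Rough estimate of how split size and container size change the job shape.
# All memory numbers passed to estimate() are HYPOTHETICAL, not measured values.
# They roughly correspond to:
#   split_mb    -> mapreduce.input.fileinputformat.split.maxsize (CombineHiveInputFormat)
#   node_mem_mb -> yarn.nodemanager.resource.memory-mb
#   map_mem_mb  -> mapreduce.map.memory.mb

data_mb=$((35 * 1024))   # ~35 GB of input, from the question
nodes=3

estimate() {
    split_mb=$1; node_mem_mb=$2; map_mem_mb=$3
    # One mapper per split, rounded up
    mappers=$(( (data_mb + split_mb - 1) / split_mb ))
    # Containers that fit on all nodes, minus one assumed for the ApplicationMaster
    slots=$(( nodes * (node_mem_mb / map_mem_mb) - 1 ))
    # Waves of mappers needed, rounded up
    waves=$(( (mappers + slots - 1) / slots ))
    echo "split=${split_mb}MB mappers=$mappers concurrent=$slots waves=$waves"
}

estimate 50 8192 4096     # roughly the current shape: ~50 MB splits, 2 containers/node
estimate 256 8192 2048    # hypothetical: larger splits, smaller containers
```

This prints `split=50MB mappers=717 concurrent=5 waves=144` for the current-like shape and `split=256MB mappers=140 concurrent=11 waves=13` for the hypothetical one. Whether smaller containers are safe depends on how much memory the mappers really need, and Parquet writers can be memory-hungry.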