
How to limit the number of mappers in a Hive job

Question asked by Karthee on May 23, 2017
Latest reply on Jul 7, 2017 by MichaelSegel

Hi All,

In my three-node cluster, I have tuned all the required parameters for performance, but this has not helped much in my case.

All our Hive tables are created in Parquet format. The problem appears when my team loads data from an external table into an internal table.

Please find the script below:

 

ksh -c 'hadoop fs -rm -R hdfs:///user/hive/warehouse/bistore_sit_cycle2.db/wt_consumer/d_partition_number=0;
        hive -e  "set hive.exec.dynamic.partition.mode=nonstrict;
        insert into bistore_sit_cycle2.wt_consumer
        partition(d_partition_number)
        select * from bistore_sit_cycle2.ext_wt_consumer;
        set hive.exec.dynamic.partition.mode=strict;"'

 

It takes more than 2 hours to load. The Hive job was created with 718 mappers, and with 2 containers running on each node, only 5 mappers run concurrently for this job.

The load was approximately 85M records and 35 GB.
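
To put those numbers in perspective, here is a rough back-of-the-envelope check. It assumes the 35 GB is spread evenly over the 718 mappers and that one of the six containers (3 nodes x 2 containers) is held by the job's ApplicationMaster, which would explain why only 5 mappers run at once. The 256 MB target split size is a hypothetical value chosen for illustration, not something taken from the cluster.

```shell
#!/bin/sh
# Figures from the post: ~35 GB of input, 718 mappers, 3 nodes, 2 containers per node.
input_mb=$((35 * 1024))
mappers=718
nodes=3
containers_per_node=2

# Average input per mapper, assuming an even spread (integer division).
avg_mb=$((input_mb / mappers))

# Assumption: one container is taken by the ApplicationMaster, the rest run mappers.
concurrent=$((nodes * containers_per_node - 1))

# Number of sequential "waves" needed to finish all mappers (rounded up).
waves=$(( (mappers + concurrent - 1) / concurrent ))

# Hypothetical target split size in MB; tune for the real cluster.
target_split_mb=256

# Mapper count if every split were target_split_mb (rounded up).
est_mappers=$(( (input_mb + target_split_mb - 1) / target_split_mb ))

echo "average MB per mapper: $avg_mb"
echo "concurrent mappers: $concurrent"
echo "waves of mappers: $waves"
echo "estimated mappers at ${target_split_mb} MB splits: $est_mappers"
```

Under these assumptions each mapper reads only about 49 MB and the job needs 144 sequential waves of 5 mappers, while 256 MB splits would bring the count down to roughly 140 mappers.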

 

How can I run jobs like this with fewer mappers, and how can I increase the number of mappers running concurrently?
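
For reference, one common way to reduce the mapper count is to raise the input split size for the session. The sketch below only builds and prints the Hive statements so they can be reviewed first. It assumes the job runs on the MapReduce engine and that combining splits works for the external table's files. The 256 MB value is illustrative, and the variable names are hypothetical.

```shell
#!/bin/sh
# Illustrative split size: 256 MB expressed in bytes. Tune for the real cluster.
split_bytes=$((256 * 1024 * 1024))

# Session-level settings: combine small splits and raise the split size bounds
# so that fewer, larger mappers are launched.
hive_settings="set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapreduce.input.fileinputformat.split.maxsize=${split_bytes};
set mapreduce.input.fileinputformat.split.minsize=${split_bytes};
set hive.exec.dynamic.partition.mode=nonstrict;"

# The load statement from the original script, unchanged.
hive_query="insert into bistore_sit_cycle2.wt_consumer
partition(d_partition_number)
select * from bistore_sit_cycle2.ext_wt_consumer;"

# Print the statements for review instead of running them.
printf '%s\n%s\n' "$hive_settings" "$hive_query"

# To actually run the load, uncomment the next line on a node with the Hive CLI:
# hive -e "$hive_settings $hive_query"
```

How many mappers run at the same time is decided by YARN rather than by Hive: it depends on how much memory each NodeManager offers (yarn.nodemanager.resource.memory-mb) relative to the memory requested per map task (mapreduce.map.memory.mb), so those two values are the place to look when only 2 containers fit on each node.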

 
