
Hive Out of Memory

Question asked by kkumar27 on Oct 9, 2015
Latest reply on Nov 2, 2016 by dodoman
Hi,
I am running a Hive insert on top of Parquet files (created using Spark).
The insert uses a partitioned by clause; a sketch of the statement's general shape is below.
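The table, column, and partition names here are illustrative only, not my actual schema:

    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;

    -- target_tbl is a Parquet-backed table partitioned by dt (example names)
    INSERT OVERWRITE TABLE target_tbl PARTITION (dt)
    SELECT col_a, col_b, dt
    FROM source_parquet_tbl;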
Towards the end of the job, while the console is printing messages like "Loading partition {=xyz, =123, =abc}", it fails with a Java heap space exception:

     java.lang.OutOfMemoryError: Java heap space
            at java.util.HashMap.createEntry(HashMap.java:901)
            at java.util.HashMap.addEntry(HashMap.java:888)
            at java.util.HashMap.put(HashMap.java:509)
            at org.apache.hadoop.hive.metastore.api.Partition.<init>(Partition.java:229)
            at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopy(HiveMetaStoreClient.java:1356)
            at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionWithAuthInfo(HiveMetaStoreClient.java:1003)
            at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
            at com.sun.proxy.$Proxy9.getPartitionWithAuthInfo(Unknown Source)
            at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1611)
            at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1565)
            at org.apache.hadoop.hive.ql.exec.StatsTask.getPartitionsList(StatsTask.java:403)
            at org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(StatsTask.java:150)
            at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:117)
            at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
            at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
            at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1508)
            at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1275)
            at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093)
            at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
            at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
            at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
            at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
            at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
            at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
            at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:456)
            at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:466)
            at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
            at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
            at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)

I have set the following properties while running the job, and I have tried both higher and lower values, but the job fails with the same error every time.

Properties toggled:

    set mapred.map.tasks=100;
    set mapred.reduce.tasks=100;
    set mapreduce.map.java.opts=-Xmx4096m;
    set mapreduce.reduce.java.opts=-Xmx4096m;
    set hive.exec.max.dynamic.partitions.pernode=100000;
    set hive.exec.max.dynamic.partitions=100000;
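I set these inside the Hive session before the insert runs; equivalently, they could be passed when launching the script, e.g. (the file name is just an example):

    hive --hiveconf hive.exec.max.dynamic.partitions=100000 \
         --hiveconf hive.exec.max.dynamic.partitions.pernode=100000 \
         -f insert_job.hql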

From the stack trace, the error appears to occur on the client side, in StatsTask while it deep-copies partition metadata from the metastore, rather than in the map/reduce tasks. Please suggest what is going wrong here.
Hive version is 0.13.

