
Make Oozie skip Hadoop native libs

Question asked by solste7en on Nov 19, 2013
Latest reply on Nov 20, 2013 by solste7en
We are running a MapR cluster, build version 3.0.1.21771.GA, which ships with Hadoop 0.20.2, Hive 0.11 and Oozie 3.3.2. Recently we tried to use Oozie to schedule Hive actions, but we are running into library version conflicts.

When Oozie launches the mapper, we noticed that, in addition to the shareLib and application libpath we specified in our workflow properties file, Oozie also loads (by default) everything under HADOOP_HOME/lib. It loads these libs before anything else:

    Oozie Launcher starts

    Heart beat
    Starting the execution of prepare actions
    Completed the execution of prepare actions successfully
    
    Files in current dir:/tmp/mapr-hadoop/mapred/local/taskTracker/xxx/jobcache/job_201310302353_65837/attempt_201310302353_65837_m_000000_0/work/.
    ======================
    File: .action.xml.crc
    Dir: tmp
    File: dim_fact.hql
    File: oozie-setup.hql
    File: action.xml
    
    Oozie Java/Map-Reduce/Pig action launcher-job configuration
    =================================================================
    Workflow job id   : 0000003-131119010154793-oozie-mapr-W
    Workflow action id: 0000003-131119010154793-oozie-mapr-W@track-dimension
    
    Classpath         :
    ------------------------
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../conf
      /usr/lib/jvm/java-7-oracle/lib/tools.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/..
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../hadoop*core*.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/amazon-s3.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/asm-3.2.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/aspectjrt-1.6.5.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/aspectjtools-1.6.5.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/aws-java-sdk-1.3.26.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/baseutils-0.1.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-cli-1.2.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-codec-1.5.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-collections-3.2.1.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/commons-configuration-1.8.jar
       ......

After picking up everything under the Hadoop lib directory, it adds the jars from the libpath defined in job.properties (placed in the distributed cache), and finally the share/lib for Hive (if it's set):

      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/slf4j-log4j12-1.4.3.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/xmlenc-0.52.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/zookeeper-3.3.6.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jsp-2.1/jsp-2.1.jar
      /opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jsp-2.1/jsp-api-2.1.jar
      /tmp/mapr-hadoop/mapred/local/taskTracker/distcache/942723254429794544_-1225012475_1909335115/maprfs/user/daisy/oozie-mapr/0000003-131119010154793-oozie-mapr-W/track-dimension--hive/hive-launcher.jar
      /tmp/mapr-hadoop/mapred/local/taskTracker/distcache/5178212031763849306_-1948248456_1492703266/maprfs/user/mapr/hive_data_store/lib/hive-jdbc-0.11-mapr.jar
      /tmp/mapr-hadoop/mapred/local/taskTracker/distcache/7275422565699763807_-1965089507_1492703254/maprfs/user/mapr/hive_data_store/lib/hive-hbase-handler-0.11-mapr.jar
   ..........

As we know, Hadoop 0.20.2 comes with pretty old libraries. For instance, jackson-core-asl and jackson-mapper-asl are both version 1.5.2 under HADOOP_HOME/lib, while Oozie 3.3.2 and Hive 0.11 come with version 1.8.8. Pulling in both versions creates conflicts and breaks GenericUDAFEvaluator for some of the UDAFs we are using. Some other conflicting libs cause problems as well.
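To make this kind of conflict easier to spot, here is a minimal Python sketch that scans a list of classpath entries (such as the "Classpath" section of the launcher log) and reports artifacts that appear with more than one version. The function name, the regular expression and the two Jackson paths in the demo are hypothetical, chosen only for illustration; the sketch also assumes jars follow the usual `<artifact>-<version>.jar` naming and that the entry listed first is the one the JVM resolves classes from.

```python
import re

# Assumes jars are named "<artifact>-<version>[suffix].jar",
# e.g. "jackson-core-asl-1.5.2.jar" or "hive-jdbc-0.11-mapr.jar".
JAR_PATTERN = re.compile(
    r"^(?P<name>.+?)-(?P<version>\d+(?:\.\d+)*)(?P<suffix>[-.\w]*)\.jar$"
)


def find_version_conflicts(classpath_entries):
    """Return {artifact: [(version, path), ...]} for artifacts that
    appear on the classpath with more than one distinct version.
    Entries keep their classpath order, so the first one listed is
    the one that would normally win."""
    seen = {}
    for entry in classpath_entries:
        path = entry.strip()
        filename = path.rsplit("/", 1)[-1]
        match = JAR_PATTERN.match(filename)
        if match is None:
            continue  # unversioned jars, directories, wildcards
        seen.setdefault(match.group("name"), []).append(
            (match.group("version"), path)
        )
    return {
        name: hits
        for name, hits in seen.items()
        if len({version for version, _ in hits}) > 1
    }


if __name__ == "__main__":
    # Hypothetical entries: the real log elides the Jackson paths.
    sample = [
        "/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/jackson-core-asl-1.5.2.jar",
        "/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/asm-3.2.jar",
        "/hypothetical/distcache/lib/jackson-core-asl-1.8.8.jar",
    ]
    for name, hits in find_version_conflicts(sample).items():
        print(name)
        for version, path in hits:
            print("   ", version, path)
```

Feeding it the full classpath from the launcher log should list every library that ships in both HADOOP_HOME/lib and the Oozie/Hive libs with different versions.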

So we want to get rid of the old Hadoop jars during Oozie execution. The first thing we tried was setting one of Oozie's environment flags, hadoop.native.lib, to false. We tried it at multiple levels (oozie-site.xml, job.properties and the node level), but Oozie still pulls in those jars before this configuration takes effect.

We spent some time looking at the Oozie source code, and it seems this loading happens by default at mapper launch time, while the hadoop.native.lib property is only read at the action node step:
https://github.com/yahoo/oozie/blob/master/core/src/main/java/org/apache/oozie/action/hadoop/LauncherMapper.java#L358

We have been told that the Hadoop version cannot be upgraded separately from the MapR package, so we will probably be running Hadoop 0.20.2 for a while. We are therefore wondering whether anyone else has run into a similar issue, and whether there is any other workaround for skipping the Hadoop lib jars in Oozie.

By the way, the same Hive job has been tested in the Hive CLI without any issue. Also, if I understand it correctly, the concept of the ShareLib seems to lose its point if its jars conflict with the Hadoop native libs (assuming the same library exists there as well...).

Thanks!
