
Errors with impersonation and Hive

Question asked by jirwin on Jan 23, 2014
Latest reply on Mar 26, 2014 by jcirrguy
I am getting file permission errors when I execute a Hive query from a custom client app while the app is impersonating another user.

The specific error is on the task node, which is unable to read the `job.jar` file from the staging directory. This is the exception that appears in the task log: `java.io.FileNotFoundException: Requested file /var/mapr/cluster/mapred/jobTracker/staging/testuser/.staging/job_201401231403_0001/job.jar does not exist.`

Here is a summary of how I have configured the test environment.  I'm running a modified version of the MapR 3 demo VM.

 - User "mapr" runs the Hive metastore, which is using MySQL.  The Hive CLI can execute queries, and my app can execute queries, as long as my app is not attempting user impersonation.  My app's queries fail on the task node when impersonation is used.
 - User "jirwin" runs my app.
 - User "testuser" is the user being impersonated.  It exists as a user on the Linux system.
 - `core-site.xml` is configured to allow impersonation by "jirwin":
   - `<property><name>hadoop.proxyuser.jirwin.hosts</name><value>*</value></property>`
   - `<property><name>hadoop.proxyuser.jirwin.groups</name><value>*</value></property>`
 - `/opt/mapr/conf/proxy/jirwin` exists and is readable
 - `MAPR_IMPERSONATION_ENABLED=1` is exported
 - The `mapreduce.jobtracker.staging.root.dir` value is `/var/mapr/cluster/mapred/jobTracker/staging`
 - The `dfs.umaskmode` value is `022`

As best I can tell, here is what is happening:

My app uses the `org.apache.hadoop.hive.ql.Driver` class to execute the query, within a `org.apache.hadoop.security.UserGroupInformation.doAs` call.

The Hive client submits the job by staging it in `/var/mapr/cluster/mapred/jobTracker/staging/testuser`.  That staging directory is being created with the following ownership and permissions:

 `drwx------   - jirwin jirwin /var/mapr/cluster/mapred/jobTracker/staging/testuser`

By contrast, the parent `/var/mapr/cluster/mapred/jobTracker/staging` has these permissions:

 `drwxrwxrwx   - mapr mapr     /var/mapr/cluster/mapred/jobTracker/staging`

Because the `staging/testuser` directory is accessible only to "jirwin", the task, which runs on the nodes as "testuser" (so impersonation is working to that extent), cannot read the job configuration and jar file.
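To make that reasoning concrete, here is a minimal sketch of a POSIX-style access check applied to the two directory listings above. It is a simplified model for illustration only, not how MapR-FS actually evaluates permissions. The class `StagingAccess` and its methods are hypothetical names, and I am assuming that "testuser" belongs only to a group named "testuser", which I have not confirmed.

```java
import java.util.Collections;
import java.util.Set;

// Simplified model of a POSIX-style directory access check.
// Hypothetical helper for illustration; not MapR's actual implementation.
final class StagingAccess {

    /** One directory entry, as printed by `hadoop fs -ls`. */
    static final class Entry {
        final String mode;   // e.g. "drwx------"
        final String owner;
        final String group;

        Entry(String mode, String owner, String group) {
            if (mode.length() != 10) {
                throw new IllegalArgumentException("expected a 10-character mode string: " + mode);
            }
            this.mode = mode;
            this.owner = owner;
            this.group = group;
        }
    }

    /** Returns true if the user may list and traverse the directory (needs r and x). */
    static boolean canOpen(Entry e, String user, Set<String> userGroups) {
        int offset;
        if (user.equals(e.owner)) {
            offset = 1;                      // owner triplet: characters 1..3
        } else if (userGroups.contains(e.group)) {
            offset = 4;                      // group triplet: characters 4..6
        } else {
            offset = 7;                      // other triplet: characters 7..9
        }
        return e.mode.charAt(offset) == 'r' && e.mode.charAt(offset + 2) == 'x';
    }

    public static void main(String[] args) {
        Entry parent = new Entry("drwxrwxrwx", "mapr", "mapr");
        Entry staging = new Entry("drwx------", "jirwin", "jirwin");

        // Assumption: testuser's only group is "testuser".
        Set<String> testuserGroups = Collections.singleton("testuser");
        Set<String> jirwinGroups = Collections.singleton("jirwin");

        System.out.println("testuser -> staging:          " + canOpen(parent, "testuser", testuserGroups));
        System.out.println("testuser -> staging/testuser: " + canOpen(staging, "testuser", testuserGroups));
        System.out.println("jirwin   -> staging/testuser: " + canOpen(staging, "jirwin", jirwinGroups));
    }
}
```

Under this model "testuser" can open the parent staging directory but not `staging/testuser`, while "jirwin" can open both, which matches the failure I see on the task node.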

What configuration changes have I overlooked that would allow tasks to run with impersonation?
