
MapR + StreamSets: audit logs show inconsistent/mixed-up UID and GID values (seen for StreamSets operations with user impersonation)

Question asked by reedv on Jan 25, 2018
Latest reply on Apr 20, 2018 by reedv

I am running a StreamSets pipeline that is configured to impersonate a MapR user, "myuser", and that uses cluster batch mode to move data to a MapR volume with auditing enabled. The pipeline validates and runs successfully. However, when I use Drill Explorer to check the MapR expandaudit output for the volumes the pipeline just wrote to, I see records where the user and the uid do not match.

In these records the user is "mapr", not "myuser", yet the uid is 5001, which is the uid of "myuser". Furthermore, when I run a direct query such as:

SELECT * FROM `dfs`.`root`.`./expandaudit_dir/some_audited_vol/38230597` where uid='5001'

not a single record has user="myuser"; every one shows the mapr user. This seems very strange to me. Can anyone explain why this happens and how to fix it?


Note: this happens whether I start the pipeline directly from the StreamSets web UI or from a script (run as user mapr) that uses the StreamSets CLI tool to start it.
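To check the mismatch systematically rather than eyeballing results in Drill Explorer, a small script can compare each audit record's user name with the name the local password database gives for its uid. This is a minimal sketch, assuming the expanded audit output is newline-delimited JSON whose records carry "uid" and "user" fields (the field names are inferred from the values quoted above, not confirmed against MapR's documentation). The helpers find_mismatches and local_user_name are hypothetical and are not part of MapR or StreamSets.

```python
import json
from typing import Callable, Iterable, Iterator, Optional


def local_user_name(uid: int) -> Optional[str]:
    """Resolve a numeric uid to a login name on this host (None if unknown)."""
    import pwd  # Unix-only module; imported lazily so callers can inject a resolver

    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


def find_mismatches(
    lines: Iterable[str],
    resolve: Callable[[int], Optional[str]] = local_user_name,
) -> Iterator[dict]:
    """Yield audit records whose "user" field disagrees with the name for "uid".

    ASSUMPTION: each non-empty line is one JSON record with "uid" and "user"
    fields; uid may be stored as a number or a string (e.g. '5001').
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if "uid" not in record or "user" not in record:
            continue  # record lacks the fields we want to compare
        uid = int(record["uid"])
        expected = resolve(uid)
        # Only report uids we can resolve; unknown uids are not evidence of a mismatch.
        if expected is not None and expected != record["user"]:
            yield {"uid": uid, "user": record["user"], "expected": expected}
```

Run on a cluster node, where uid 5001 resolves to "myuser", a record with uid 5001 and user "mapr" would be reported, matching the behaviour described in the question. Passing a custom resolve function (for example a dict's get method) lets the check run on a machine whose password database does not contain the cluster's users.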
