Looking into MapR Streams, I see that StreamSets is often used to manage streaming tasks. One problem I ran into is that the StreamSets service runs as a single sdc-user that is created when the software is installed (SDC, I believe, stands for StreamSets Data Collector). So to stream data from an origin to a destination, the sdc-user needs the appropriate access rights on both locations (read on the origin, write on the destination). So far, the only way I have managed this is by making those locations readable and writable by all users.
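One way to check what the sdc-user can actually reach through the NFS mount is to evaluate the classic POSIX owner/group/other mode bits for that user. This is a minimal sketch, not a MapR or StreamSets tool. It assumes the service account is literally named `sdc` and exists in the local passwd/group databases on the StreamSets node. It ignores MapR ACEs, POSIX ACLs, superuser privileges, and NFS root squashing. It also checks only the single path given. Creating a file in a destination directory needs write and execute (`"wx"`) on that directory, and every parent directory needs execute for traversal.

```python
import grp
import os
import pwd
import stat

# Mode bits per permission letter, ordered as (owner, group, other).
_BITS = {
    "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
    "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
    "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
}


def has_posix_access(mode: int, owner_uid: int, owner_gid: int,
                     uid: int, gids: set[int], want: str) -> bool:
    """Return True if every letter in `want` (e.g. "r", "wx") is granted.

    Follows POSIX class selection: if the user owns the file, only the
    owner bits apply; else if any of the user's groups matches the file's
    group, only the group bits apply; otherwise the "other" bits apply.
    ACLs/ACEs and superuser privileges are deliberately not modeled.
    """
    if uid == owner_uid:
        cls = 0
    elif owner_gid in gids:
        cls = 1
    else:
        cls = 2
    return all(mode & _BITS[c][cls] for c in want)


def user_identity(username: str) -> tuple[int, set[int]]:
    """Look up a local user's uid plus primary and supplementary gids."""
    pw = pwd.getpwnam(username)  # raises KeyError if the user is not local
    gids = {pw.pw_gid}
    gids.update(g.gr_gid for g in grp.getgrall() if username in g.gr_mem)
    return pw.pw_uid, gids


def check_path(path: str, username: str, want: str) -> bool:
    """Check `want` for `username` against the mode bits of `path` only."""
    st = os.stat(path)
    uid, gids = user_identity(username)
    return has_posix_access(st.st_mode, st.st_uid, st.st_gid, uid, gids, want)
```

For example, `check_path(origin_dir, "sdc", "rx")` and `check_path(dest_dir, "sdc", "wx")`, where `origin_dir` and `dest_dir` are hypothetical placeholders for the NFS paths given to the pipeline. If these only pass when the "other" bits are set, the user is getting access solely through the world-readable/writable permissions described above.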
My question is: given the need to stream data between volumes on the cluster, how should these permissions be granted with security in mind? For example, should I add the sdc-user to the mapr group so that it can access any and all files that the mapr user can? Or should I set ACEs as I create each stream, so that the sdc-user can access only the locations that are part of some stream? The downside of the latter is that anyone who creates a stream (developers, etc.) would then have to ask the MapR cluster admin to set the appropriate permissions on the stream's endpoint locations every time. Any advice would be appreciated. Thanks.
Note: I don't have the sdc-user replicated across all nodes. I think that would be needed in order to set MapR ACEs for that user; otherwise MCS reports that the entity does not exist. I am also specifying MapR locations in the StreamSets pipeline builder using the NFS paths of MapR-FS on the node where StreamSets is installed.