Create an Extension to MapRFS: Filtered Views

Idea created by MichaelSegel on Oct 8, 2017

    A similar but simpler idea would be to allow the creation of a filtered view: a view that has access to the underlying file, but whose output is already filtered based on a schema that could be stored in HiveServer2 or some other mechanism.
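    To make "filtered on read" concrete, here is a minimal Python sketch. It assumes the view's schema is nothing more than a tuple of permitted column names, and it uses CSV text as a stand-in for the real file formats. The names `VIEW_COLUMNS` and `read_filtered` and the sample data are hypothetical; a real implementation would fetch the schema from HiveServer2 or a similar store.

```python
from __future__ import annotations

import csv
import io
from typing import Iterator

# Hypothetical view definition: the only columns a consumer may see.
# In the proposal this schema would live in HiveServer2 or a similar store;
# here it is hard-coded purely for illustration.
VIEW_COLUMNS = ("customer_id", "region", "order_total")


def read_filtered(raw: str, allowed: tuple[str, ...]) -> Iterator[dict[str, str]]:
    """Yield each CSV record with only the allowed columns (filter on read)."""
    reader = csv.DictReader(io.StringIO(raw))
    # Reject view definitions that reference columns the file does not have.
    missing = [c for c in allowed if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"view references unknown columns: {missing}")
    for row in reader:
        # Hidden columns (e.g. name, ssn) are never copied into the output.
        yield {col: row[col] for col in allowed}


# Hypothetical underlying data containing PII.
underlying = (
    "customer_id,name,ssn,region,order_total\n"
    "1,Alice,123-45-6789,EU,42.50\n"
    "2,Bob,987-65-4321,US,17.00\n"
)

for record in read_filtered(underlying, VIEW_COLUMNS):
    print(record)
```

    The filtering happens while the data is read, so the consumer never receives the hidden columns at all, rather than receiving them and being trusted to discard them.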


    Here's an example USE CASE to help describe the need and the functionality.


    You have a Hive table stored in Parquet that contains some PII (personally identifiable information).

    You want to expose some of the non-PII data to an application, but you don't want to grant the application permission to read the underlying table.


    The solution would be to create a view that contains the information you want to expose while hiding the data you don't. Note that this is not a materialized view, so tools such as Hive and Spark SQL jobs need to be able to scan the underlying table. With ACEs (Access Control Expressions), because the team's job doesn't have permission to read the underlying file, the view will fail.
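    The failure mode can be modeled in a few lines. This sketch is purely illustrative: `FileNode` and `query_logical_view` are hypothetical names, and an ACE is simplified to a set of user names allowed to read the file. The point it shows is that a logical (non-materialized) view executes against the base file under the caller's identity, so a caller without read access on that file cannot use the view.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileNode:
    """Hypothetical file guarded by a read ACE, simplified to a set of users."""
    data: list[dict[str, str]]
    readers: set[str] = field(default_factory=set)

    def read(self, user: str) -> list[dict[str, str]]:
        if user not in self.readers:
            raise PermissionError(f"{user} may not read this file")
        return self.data


def query_logical_view(base: FileNode, columns: tuple[str, ...], user: str) -> list[dict[str, str]]:
    # A logical view is just a saved query: running it scans the base file
    # *as the caller*, so the caller needs read access to that file.
    return [{c: row[c] for c in columns} for row in base.read(user)]


table = FileNode(
    data=[{"customer_id": "1", "ssn": "123-45-6789", "region": "EU"}],
    readers={"etl_admin"},
)

try:
    query_logical_view(table, ("customer_id", "region"), user="app_team")
except PermissionError as err:
    print("view failed:", err)  # prints "view failed: app_team may not read this file"
```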


    The proposed solution: extend the FS API to allow registering the view and filtering the data on read. The view would have access to the underlying table, but its output would not expose the hidden data, and the view would carry its own set of permissions, separate from those of the underlying table, which can be set with ACEs.
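    One way to picture the proposed behavior is a toy, in-memory model of the FS layer. Everything here is an assumption for illustration: `FilteredView`, `FilteredViewFS`, `register_view`, and the paths are hypothetical and not part of any MapR API, and ACEs are again reduced to sets of user names. The view carries its own reader list; the FS layer reads the base file on the view's behalf and drops hidden columns before returning rows.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Row = dict[str, str]


@dataclass
class FilteredView:
    """Hypothetical registered view with its own reader ACL."""
    base_path: str
    columns: tuple[str, ...]
    readers: set[str] = field(default_factory=set)


class FilteredViewFS:
    """Hypothetical in-memory model of an FS layer that filters on read."""

    def __init__(self) -> None:
        self._files: dict[str, tuple[list[Row], set[str]]] = {}
        self._views: dict[str, FilteredView] = {}

    def put_file(self, path: str, rows: list[Row], readers: set[str]) -> None:
        self._files[path] = (rows, set(readers))

    def register_view(self, path: str, view: FilteredView) -> None:
        # Registration only records the view; the base file must already exist.
        if view.base_path not in self._files:
            raise FileNotFoundError(view.base_path)
        self._views[path] = view

    def read(self, path: str, user: str) -> list[Row]:
        view = self._views.get(path)
        if view is not None:
            # Only the view's own ACL is checked against the caller...
            if user not in view.readers:
                raise PermissionError(f"{user} may not read view {path}")
            # ...the base file is read with the FS layer's authority, and the
            # hidden columns are dropped before any row is returned.
            rows, _ = self._files[view.base_path]
            return [{c: row[c] for c in view.columns} for row in rows]
        rows, readers = self._files[path]
        if user not in readers:
            raise PermissionError(f"{user} may not read {path}")
        return rows


fs = FilteredViewFS()
fs.put_file(
    "/data/customers",
    [{"customer_id": "1", "ssn": "123-45-6789", "region": "EU"}],
    readers={"etl_admin"},
)
fs.register_view(
    "/views/customers_public",
    FilteredView("/data/customers", ("customer_id", "region"), readers={"app_team"}),
)
print(fs.read("/views/customers_public", "app_team"))  # [{'customer_id': '1', 'region': 'EU'}]
```

    The design choice this sketch illustrates is that authorization for consumers moves from the base file to the view: `app_team` can read `/views/customers_public`, but would still get a `PermissionError` reading `/data/customers` directly.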

    This would be a specialized subset of named pipes. It could first be tied to HiveServer2 for schema management and, to start with, support a few well-known file types (Parquet, CSV, ORC, etc.).