As a feature request, it would be really nice to have MapR be "Hive"-aware, i.e. we could pass a "warehouse" location to the MapR system.
What is the point of this? It would be nice to be able to manage Hive directory locations with volumes.
What I mean by this is having a volume be a database, or even a table (perhaps a partition?). We can kinda do that now, but it's very manual and clumsy. For example, we can get metadata errors if we try to rename a table that lives on one volume to a different table name.
I'd like a rename from one volume to another to actually move the data, but instead it fails (I am guessing because Hive issues an HDFS move command, and you can't move between volumes in MapR).
Thinking this through, couldn't we have a way to take the HDFS move commands that are issued and, instead of failing, execute them as a copy and delete? And in the case where the delete target is a volume: copy, delete, and then return success (even though the directory wasn't actually deleted). I am thinking we could specify a directory like /user/hive/warehouse and then set two flags on it:
- mv is copy and delete: this would take a move command, copy the data, and then issue a delete of the source.
- Deletes of volumes return success (even though the volumes aren't actually deleted).
This way it wouldn't be a Hive-specific thing, but a general control over how directories behave.
This could allow us to mount volumes under the warehouse directory, have different replication settings for different tables, etc.
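To make the two flags a bit more concrete, here is a rough sketch of the behavior I have in mind, written against the local filesystem in Python rather than any real MapR or HDFS API. Everything in it is an assumption for illustration: the function name `move_with_fallback`, the `ignore_delete_failure` parameter, and the idea that a cross-volume rename shows up as an `EXDEV` error (that is what a local OS reports for a cross-device rename; I don't know what MapR actually returns).

```python
import errno
import os
import shutil


def move_with_fallback(src, dst, rename=os.rename, ignore_delete_failure=True):
    """Move src to dst, falling back to copy-and-delete across a boundary.

    Hypothetical sketch only: `rename` is injectable so a cross-volume
    failure can be simulated, and `ignore_delete_failure` stands in for
    the proposed "deletes of volumes return success" flag.
    """
    try:
        rename(src, dst)
        return "renamed"
    except OSError as exc:
        # Only fall back when the rename failed for crossing a boundary.
        if exc.errno != errno.EXDEV:
            raise

    # Flag 1: treat the move as a copy followed by a delete of the source.
    if os.path.isdir(src):
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)

    try:
        if os.path.isdir(src):
            shutil.rmtree(src)
        else:
            os.remove(src)
    except OSError:
        # Flag 2: the source (e.g. a volume mount point) could not be
        # removed; report success anyway if the flag is set.
        if not ignore_delete_failure:
            raise
        return "copied, source left in place"
    return "copied and deleted"
```

With something like this behind /user/hive/warehouse, a rename across volumes would come back as a success to Hive instead of an error, which is the whole point of the request.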