
MapR, Hive, Volumes, and Storage Locations

Question asked by mandoskippy on Nov 9, 2013
As a feature request, it would be really nice for MapR to be "Hive aware", i.e. we could pass a "warehouse" location to the MapR system.

What is the point of this? It would be nice to be able to manage Hive directory locations with volumes. By that I mean having a volume be a database, or even a table (perhaps a partition?). We can kind of do that now, but it's very manual and clumsy. For example, we get metadata errors if we try to rename a table that lives on one volume to a different table name. I'd like a rename from one volume to another to actually move the data, but instead it fails (I am guessing because Hive issues an HDFS move command, and you can't move between volumes in MapR).

Thinking this through, couldn't we have a way to take the HDFS move commands that are issued and, instead of failing, execute them as a copy and delete? And if the source being deleted is a volume: copy, delete, and then return success (even though the directory wasn't deleted)?

I am thinking we could specify a directory like /user/hive/warehouse and then set two flags on it:

1. mv is copy and delete: this would take a move command, copy the data to the destination, and then issue a delete of the source.
2. Deletes of volumes return success (even though they aren't actually deleted).

This way it wouldn't be a Hive-specific thing, but a control over the behavior of directories. This could allow us to mount volumes under the warehouse directory, have different replication settings for different tables, etc.
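To make the proposed behavior concrete, here is a minimal Python sketch of the two flags, run against an ordinary local filesystem. It is only an illustration under stated assumptions, not anything MapR or Hive provides: the function name `move_as_copy_and_delete` and the `volume_roots` parameter are hypothetical, plain directories stand in for volumes, and "deleting" a volume is interpreted here as emptying it while leaving the directory in place.

```python
import os
import shutil


def move_as_copy_and_delete(src, dst, volume_roots=()):
    """Emulate a move as a copy followed by a delete of the source.

    volume_roots is a hypothetical list of directories standing in for
    volume mount points. Returns True if the source directory was removed,
    False if it was a volume root that was emptied and left in place.
    Either way the call completes without raising, i.e. reports success.
    """
    src = os.path.abspath(src)
    roots = {os.path.abspath(p) for p in volume_roots}

    # Flag 1: copy first, so a failed copy leaves the source untouched.
    # dst must not exist yet, as with a rename to a new table name.
    shutil.copytree(src, dst)

    if src in roots:
        # Flag 2: the volume root itself is kept; only its contents go.
        for name in os.listdir(src):
            path = os.path.join(src, name)
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
        return False

    # Ordinary directory: remove the whole source tree.
    shutil.rmtree(src)
    return True
```

Calling `move_as_copy_and_delete("/tmp/wh/vol_a", "/tmp/wh/table_b", volume_roots=["/tmp/wh/vol_a"])` (example paths only) would copy the table data and leave an empty `vol_a` behind, which mirrors the "return success even though the directory wasn't deleted" case described above.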
