MapR, Hive, Volumes, and Storage Locations

Question asked by mandoskippy on Nov 9, 2013
Latest reply on Apr 7, 2017 by mandoskippy

As a feature request, it would be really nice for MapR to be Hive-aware, i.e. to let us pass a Hive "warehouse" location to the MapR system.

What is the point of this? It would be nice to be able to manage Hive directory locations with volumes.

What I mean by this is having a volume be a database, or even a table (perhaps a partition?). We can kind of do that now, but it's very manual and clumsy. For example, we can get metadata errors if we try to rename a table that lives on one volume to a different table name.

I'd like a rename from one volume to another to actually move the data, but instead it fails. I am guessing this is because Hive issues an HDFS move command, and you can't move between volumes in MapR.

Thinking this through, couldn't we have a way to take the HDFS move commands that are issued and, instead of failing, execute them as a copy and delete? In the case where the delete target is a volume, we would copy, delete, and then return success (even though the directory wasn't deleted). I am thinking we could specify a directory like /user/hive/warehouse and then set two flags:

  1. mv is copy and delete: This would take a move command, copy the data, and then issue a delete of the source.
  2. Deletes of volumes return success (even though they aren't actually deleted). This way it wouldn't be a Hive-specific thing, but a control over the behavior of directories.
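
To make the two flags concrete, here is a rough Python sketch of the behavior I have in mind, run against an ordinary local directory tree rather than MapR-FS. Everything in it is hypothetical: `move_in_warehouse`, `volume_of`, and the `volume_roots` list are names made up for illustration, not MapR or Hive APIs. It also assumes that "deleting a volume" means emptying it while leaving its mount point in place, and that the things being moved are directories (as Hive database and table locations are).

```python
import os
import shutil
from pathlib import Path


def volume_of(path, roots):
    """Return the deepest (hypothetical) volume root containing path, or None."""
    matches = [r for r in roots if path == r or r in path.parents]
    return max(matches, key=lambda r: len(r.parts), default=None)


def move_in_warehouse(src, dst, volume_roots):
    """Move a directory, falling back to copy and delete across volumes."""
    src, dst = Path(src).resolve(), Path(dst).resolve()
    roots = [Path(r).resolve() for r in volume_roots]

    # Same volume, and the source is not itself a volume: plain rename.
    if src not in roots and volume_of(src, roots) == volume_of(dst, roots):
        os.rename(src, dst)
        return "renamed"

    # Flag 1: a move that crosses a volume boundary becomes a copy...
    shutil.copytree(src, dst, dirs_exist_ok=True)  # needs Python 3.8+

    if src in roots:
        # Flag 2: the source is a volume, so empty it, keep the mount
        # point, and still report success to the caller.
        for child in src.iterdir():
            if child.is_dir():
                shutil.rmtree(child)
            else:
                child.unlink()
        return "copied, volume kept"

    # ...followed by a delete of the source.
    shutil.rmtree(src)
    return "copied, source deleted"
```

With something like this sitting behind the move command, a Hive rename across volumes would see an ordinary success instead of a failure, and the volume mount points themselves would survive.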

 

This could allow us to mount volumes under the warehouse directory, have different replication settings for different tables, etc.
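
As a sketch of what that layout might look like, the Python below only builds the `maprcli volume create` argument lists for a per-table plan; it does not run anything. The table names, the `hive.<table>` volume naming scheme, and the replication factors are made up for illustration, and the `-name`, `-path`, and `-replication` options are written as I understand them, so check them against the maprcli documentation for your release before relying on this.

```python
WAREHOUSE = "/user/hive/warehouse"

# Hypothetical plan: table name -> desired replication factor.
REPLICATION_PLAN = {
    "clickstream_raw": 2,   # large, re-creatable data
    "billing": 3,           # data we cannot afford to lose
}


def volume_create_command(table, replication, warehouse=WAREHOUSE):
    """Build (but do not execute) a maprcli command for one table volume."""
    return [
        "maprcli", "volume", "create",
        "-name", f"hive.{table}",          # assumed naming scheme
        "-path", f"{warehouse}/{table}",   # mount point under the warehouse
        "-replication", str(replication),
    ]


if __name__ == "__main__":
    for table, replication in REPLICATION_PLAN.items():
        print(" ".join(volume_create_command(table, replication)))
```

Each table directory under the warehouse would then be its own volume, which is exactly the situation where the cross-volume rename problem above shows up.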
