
Spark SQL on MapR-DB

Question asked by danielsobrado on Jan 18, 2017
Latest reply on Jan 19, 2017 by MichaelSegel

Hi,


I'm wondering what the best approach is to use Spark SQL on MapR-DB without Hive. I'm on MapR 5.1 with Spark 1.6.1.


What I've done is to import the following into my project (rough sbt coordinates follow the list):


  • hbase-spark-2.0.0
  • shc-core-1.1.0-1.6
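For completeness, this is roughly how I declare them in sbt; the group IDs are my assumption from the jar names, so verify them against whatever repository you resolve from:

// build.sbt -- versions taken from the jar names above; the group IDs
// are my guess, check them against your repository
libraryDependencies ++= Seq(
  "org.apache.hbase" % "hbase-spark" % "2.0.0",
  "com.hortonworks" % "shc-core" % "1.1.0-1.6"
)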


With these, I'm defining a catalog mapping like:


def cat = s"""{
  |"table":{"namespace":"default", "name":"/scheme/Table"},
  |"rowkey":"currency:asat",
  |"columns":{
  |  "col0":{"cf":"rowkey", "col":"currency", "type":"string", "length":"3"},
  |  "col1":{"cf":"rowkey", "col":"asat", "type":"string"},
  |  "currency":{"cf":"sr", "col":"currency", "type":"string"},
  |  "asat":{"cf":"sr", "col":"asat", "type":"string"},
  |  "value1":{"cf":"sr", "col":"value1", "type":"float"},
  |  "value2":{"cf":"sr", "col":"value2", "type":"float"}
  |}
  |}""".stripMargin

And I load it with:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Load the MapR-DB table as a DataFrame via the SHC data source
def withCatalog(cat: String): DataFrame = {
  sqlContext
    .read
    .options(Map(HBaseTableCatalog.tableCatalog -> cat))
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .option("zkUrl", "host:5181:/hbase-unsecure")
    .load()
}
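
Once it's loaded, I register a temp table and run plain Spark SQL against it, no Hive needed. A minimal sketch (the table name "rates" and the filter value are just for illustration):

// Query the mapped MapR-DB table with plain Spark SQL (no Hive metastore);
// registerTempTable is the Spark 1.6 API
val df = withCatalog(cat)
df.registerTempTable("rates")
sqlContext.sql("SELECT currency, asat, value1 FROM rates WHERE currency = 'USD'").show()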

Any views? Any other approach?

One limitation I see with this approach is that, in a composite row key, only the last key part can have a variable length; the leading parts must be fixed width.
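
The only workaround I can think of is to fix the width of the leading key parts myself when writing, e.g. by padding. A rough sketch (fixedWidth is my own helper, not part of SHC):

// Pad/truncate a leading row-key part to a fixed width so that only the
// trailing part of the composite key varies in length
def fixedWidth(s: String, width: Int): String = s.padTo(width, ' ').take(width)

// e.g. fixedWidth("USD", 3) == "USD", fixedWidth("GB", 3) == "GB "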


Thanks,


Daniel
