Slow MapRDB JSON lookup by _id for an input list of ids

Question asked by sampatisri on May 4, 2018
Latest reply on May 7, 2018 by cathy

Hi Team,

I am new to MapRDB JSON. I have written Scala code that passes a list of ids to look up in a DB JSON table using the loadFromMapRDB OJAI API, but the lookup is really slow. 300 ids took 4 minutes and 629 ids took 9 minutes. For 500k ids, the Spark progress did not show up at all. This performance is not acceptable for us, as we onboard millions of records every day. Please advise me on the bottleneck in the code below, at jsonlist.

 

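import com.mapr.db.spark._ // assumed import: MapR-DB OJAI connector for Spark (loadFromMapRDB, field)

// Collect the ids to look up from the rowkey DataFrame onto the driver.
val ingestedids = rowkey.select("key").rdd.map(r => r(0)).collect.toList

// Load the table and keep only the documents whose _id is in the list.
val maprdbjson = sc.loadFromMapRDB("/datalake/uhclake/tst/t_hdfs/uhc/Enriched/standard_access/pjs/cdb/data/Individual2_snapshot").where(field("_id") in ingestedids)

// Bring the matching documents back to the driver; this is the slow step.
val jsonlist = maprdbjson.collect.toList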

 

ingestedids is the list of ids that I want to look up in the DB JSON table, by _id only. The jsonlist step is where I am seeing the slowness. The DB JSON table holds around 20 million documents, and I am trying to retrieve the whole document for each id passed in the list.
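
For scale, the timings reported above work out to somewhere between 0.8 and 0.9 seconds per id. The sketch below just does that arithmetic and extrapolates to 500k ids. It assumes the cost keeps growing linearly with the number of ids, which is only a guess based on two data points; the object and function names are made up for illustration and are not part of any MapR API.

```scala
// Back-of-the-envelope throughput from the timings reported in the question.
object LookupThroughput {
  // (number of ids, elapsed minutes) as reported above
  val runs: Seq[(Int, Double)] = Seq((300, 4.0), (629, 9.0))

  // Seconds spent per id in a single run.
  def secondsPerId(ids: Int, minutes: Double): Double =
    minutes * 60.0 / ids

  // Projected hours for n ids at a given per-id cost.
  // Assumption: cost scales linearly with the number of ids.
  def projectedHours(n: Long, secPerId: Double): Double =
    n * secPerId / 3600.0

  def main(args: Array[String]): Unit = {
    runs.foreach { case (ids, minutes) =>
      println(f"$ids%d ids: ${secondsPerId(ids, minutes)}%.2f s per id")
    }
    // Use the slower of the two observed rates for the projection.
    val worst = runs.map { case (ids, minutes) => secondsPerId(ids, minutes) }.max
    println(f"500000 ids: about ${projectedHours(500000L, worst)}%.0f hours")
  }
}
```

If the linear assumption holds, a 500k-id lookup would run for days rather than minutes, which would be consistent with the Spark progress never showing up.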

 
