I am new to MapR-DB JSON. I have written Scala code that passes a list of ids to look up in a DB JSON table using the loadFromMapRDB OJAI API, but the lookup is really slow. For 300 ids it took 4 min, and for 629 ids it took 9 min. For 500k ids, the Spark progress didn't show up. This performance is not acceptable for us, as we onboard millions of records every day. Please advise me on the bottleneck in the code below, at jsonlist.
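import com.mapr.db.spark._  // brings loadFromMapRDB and field into scope
// Collect the ids to look up onto the driver as a plain list
val ingestedids = rowkey.select("key").rdd.map(r => r(0)).collect.toList
// Load the JSON table, keeping only documents whose _id is in the list
val maprdbjson = sc.loadFromMapRDB("/datalake/uhclake/tst/t_hdfs/uhc/Enriched/standard_access/pjs/cdb/data/Individual2_snapshot").where(field("_id") in ingestedids)
val jsonlist = maprdbjson.collect.toList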
ingestedids is the list of ids that I want to look up in the DB JSON table, by _id only. The jsonlist step is where I am seeing the slowness. The DB JSON table holds around 20 million documents, and I am trying to retrieve the whole documents for the ids passed as a list.
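To narrow down where the time goes, the lookup could be split into fixed-size batches with each batch timed separately. The following is only a sketch under stated assumptions: BatchedLookup, timedBatches, and the lookup parameter are hypothetical names, not part of the MapR-DB or OJAI API. The lookup function stands in for the real query (for example, the loadFromMapRDB call above) so that the sketch runs without a cluster.

```scala
// Hypothetical helper, not part of any MapR-DB library.
object BatchedLookup {
  /** Runs `lookup` once per batch of at most `batchSize` ids.
    * Returns all results in order, plus the elapsed milliseconds of each batch.
    * `lookup` is an assumption: it stands in for the real table query.
    */
  def timedBatches[A, B](ids: Seq[A], batchSize: Int)(lookup: Seq[A] => Seq[B]): (Seq[B], Seq[Long]) = {
    require(batchSize > 0, "batchSize must be positive")
    val results = Seq.newBuilder[B]
    val timings = Seq.newBuilder[Long]
    for (batch <- ids.grouped(batchSize)) {
      val start = System.nanoTime()
      results ++= lookup(batch)                           // one query per batch
      timings += (System.nanoTime() - start) / 1000000L   // nanoseconds to milliseconds
    }
    (results.result(), timings.result())
  }
}

// Usage with a stand-in lookup that simply upper-cases each id.
val (docs, millis) = BatchedLookup.timedBatches(Seq("a", "b", "c"), 2)(batch => batch.map(_.toUpperCase))
```

If the per-batch times stay roughly constant regardless of batch size, that would suggest each query pays a fixed cost, such as a table scan, rather than a cost proportional to the number of ids.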
Community Manager Aditya Kishore