
DML operation on MapR DB JSON table using maprdb python package

Question asked by temp_expt on Jul 6, 2018

I have a very large JSON table in MapR-DB. How can I update a column, or delete a few rows, based on certain conditions?

I tried the following, applying a condition, but it throws an error. Any insight into what I'm doing wrong?

from maprdb import connect, Document, Mutation, Condition

connection = connect()
table = connection.get("/path/to/large/json/table")

# Condition on the _id field; the commented-out lines are the other
# syntaxes I tried, all with the same result.
c = Condition({'_id': 'a123'})
# c = Condition({"_id": {"$eq": "a123"}})
# c = Condition()._and()._is("_id", "$eq", "a123")

df = table.find_by_condition(c)


The error I'm getting is below:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/site-packages/maprdb-0.0.3-py3.4.egg/maprdb/", line 198, in __getattr__
    raise e
  File "/usr/lib/python3.4/site-packages/maprdb-0.0.3-py3.4.egg/maprdb/", line 193, in __getattr__
    return cls._get_Op().valueOf(item)
  File "/usr/lib/python3.4/site-packages/maprdb-0.0.3-py3.4.egg/maprdb/", line 181, in _get_Op
    cls._Op = jpype.JClass("com.mapr.db.Condition$Op")
  File "/usr/lib/python3.4/site-packages/JPype1-0.6.1-py3.4-linux-x86_64.egg/jpype/", line 55, in JClass
    raise _RUNTIMEEXCEPTION.PYEXC("Class %s not found" % name)
jpype._jexception.RuntimeExceptionPyRaisable: java.lang.RuntimeException: Class com.mapr.db.Condition$Op not found


My plan was to get the filtered data into a DataFrame, update the DataFrame, and then push it back to the base JSON table. Is that the right approach for a very large table?
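In plain Python terms, the read-modify-write pattern I have in mind would look roughly like this. Plain dicts stand in for the JSON documents, and `update_matching` is a hypothetical helper just to show the shape, not part of the maprdb package:

```python
# Sketch of the read-modify-write plan. The dicts stand in for the documents
# that table.find_by_condition(c) would return; update_matching is a
# hypothetical helper, not a maprdb API.

def update_matching(docs, predicate, changes):
    """Return new documents with `changes` applied where `predicate` holds."""
    updated = []
    for doc in docs:
        if predicate(doc):
            doc = {**doc, **changes}  # apply the column updates
        updated.append(doc)
    return updated

# Stand-in for the filtered rows:
docs = [
    {"_id": "a123", "code": 100, "status": "old"},
    {"_id": "b456", "code": 200, "status": "old"},
]

updated = update_matching(docs, lambda d: d["code"] == 100, {"status": "new"})
# `updated` would then be written back to the base table document by
# document, keyed on _id; the exact write call depends on the maprdb API.
```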



Alternatively, if I use PySpark, I can get the filtered data like this:

df = spark.loadFromMapRDB(tableName).where(column("code") == 100)

Then I would update df and push it back to the table using saveToMapRDB. But how can I delete rows from the JSON table using Spark, based on column conditions?
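If the Spark connector has no direct row-delete, I assume the delete could be expressed as the inverse filter: keep only the rows that do not match, then overwrite the table with the survivors. A minimal sketch of that idea, with dicts standing in for DataFrame rows (`delete_where` is a hypothetical helper of mine):

```python
# Sketch of "delete by condition" as an inverse filter: keep only the rows
# that do NOT match, then overwrite the table with the survivors. In Spark
# terms this would be something like df.where(~(col("code") == 100))
# followed by an overwriting saveToMapRDB.

def delete_where(rows, predicate):
    """Return the rows that survive the delete (predicate marks deletions)."""
    return [row for row in rows if not predicate(row)]

rows = [
    {"_id": "a123", "code": 100},
    {"_id": "b456", "code": 200},
]

survivors = delete_where(rows, lambda r: r["code"] == 100)
# Only the _id "b456" row survives; writing the survivors back with an
# overwrite would effectively delete the code == 100 rows.
```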

Is this the right way to do it?