AnsweredAssumed Answered

Issue Querying Decimal Fields in Apache Drill + Parquet Files + MapR Cluster

Question asked by PETER.EDIKE on Apr 27, 2018
Latest reply on May 1, 2018 by aengelbrecht

Hello Everyone,

 

I am trying to run the following query in Apache Drill.

 

select pan, count(*) as number_of_transactions , terminal_id,SUM((cast(SETTLE_AMOUNT_IMPACT as double) *-1)/100) AS settle_amount_impact   from dfs.`/iswdata/storage/products/superswitch/parquet/transactions`  where pan like '506126%' and terminal_id like '1%' and sink_node_name like ('SWTDB%')  and source_node_name not like ('SWTDBLsrc') and tran_completed=1 and tran_reversed = 0 and tran_postilion_originated = 1 AND tran_type = '01' --and pan like '506126%0011' group by pan,terminal_id

The schema for the data I am querying is as follows

 

post_tran_id LONG 2  post_tran_cust_id :LONG  settle_entity_id :INTEGER  batch_nr : INTEGER    prev_post_tran_id : LONG  next_post_tran_id : LONG   sink_node_name : STRING   tran_postilion_originated : DECIMAL   tran_completed : DECIMAL  tran_amount_req : DECIMAL   tran_amount_rsp : DECIMAL   settle_amount_impact : DECIMAL   tran_cash_req : DECIMAL   tran_cash_rsp : DECIMAL  tran_currency_code : STRING  tran_tran_fee_req : DECIMAL  tran_tran_fee_rsp : DECIMAL  tran_tran_fee_currency_code : STRING  tran_proc_fee_req : DECIMAL  tran_proc_fee_rsp : DECIMAL  tran_proc_fee_currency_code : STRING  settle_amount_req : DECIMAL  settle_amount_rsp : DECIMAL  settle_cash_req : DECIMAL  settle_cash_rsp : DECIMAL  settle_tran_fee_req : DECIMAL  settle_tran_fee_rsp : DECIMAL  settle_proc_fee_req : DECIMAL  settle_proc_fee_rsp : DECIMAL  settle_currency_code : STRING/

However When I run the query against the dataset, I get the following exception

SYSTEM ERROR: ClassCastException: org.apache.drill.exec.vector.NullableDecimal28SparseVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector

With the following stacktrace

 

java.sql.SQLException: [MapR][DrillJDBCDriver](500165) Query execution error. Details: SYSTEM ERROR: ClassCastException: org.apache.drill.exec.vector.NullableDecimal28SparseVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector
Fragment 2:18
[Error Id: dba9df08-fb1d-4bd2-93e6-d08fb6f79ff1 on BGDTEST3.INTERSWITCH.COM:31010].
at com.mapr.drill.drill.dataengine.DRQryResultListener.checkAndThrowException(Unknown Source)
at com.mapr.drill.drill.dataengine.DRQryResultListener.getNextBatch(Unknown Source)
at com.mapr.drill.drill.dataengine.DRJDBCResultSet.doLoadRecordBatchData(Unknown Source)
at com.mapr.drill.drill.dataengine.DRJDBCResultSet.doMoveToNextRow(Unknown Source)
at com.mapr.drill.drill.dataengine.DRJDBCQueryExecutor.execute(Unknown Source)
at com.mapr.drill.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.mapr.drill.jdbc.common.SStatement.execute(Unknown Source)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:581)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:692)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:498)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Caused by: com.mapr.drill.support.exceptions.GeneralException: [MapR][DrillJDBCDriver](500165) Query execution error. Details: SYSTEM ERROR: ClassCastException: org.apache.drill.exec.vector.NullableDecimal28SparseVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector
ERROR   
Took 1 sec. Last updated by mapr at April 30 2018, 8:25:53 AM.
%spark
import spark.implicits._
val pathJson = "maprfs:///iswdata/storage/products/superswitch/parquet/transactions/"
val df = spark.read.parquet(pathJson)
df.printSchema()
val cols= Seq[String]("pan", "settlement")
df.groupBy("pan").agg(sum("settle_amount_impact").alias("settlement"), count("*").alias("number_of_transactions")).select(col("pan"), col("number_of_transactions")).show()

 

More so, the same error occurs when I include a decimal field in the select clause. Please, is there something I am missing or doing wrong, Any pointer will be deeply appreciated

Kind Regards

Outcomes