AnsweredAssumed Answered

Dependency Issues Implementing Spark Structured Streaming on MapR Cluster

Question asked by PETER.EDIKE on Jun 13, 2018
Latest reply on Jun 13, 2018 by vmeghraj

Hello Everyone,

 

I am trying to implement Spark Structured Streaming on MapR Cluster using the following code 

 

 

import org.apache.spark.sql.SQLContext
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import spark.implicits._
import org.apache.spark.sql.kafka010.KafkaSourceProvider

 

 

val topic1 = "/iswdata/streams/quickteller:fundstransfer"
val topic2 = "/iswdata/streams/quickteller:billpayment"
val topic3 = "/iswdata/streams/quickteller:transactions"
val topics = topic1 + "," + topic2 + "," + topic3

//We Build the Spark Session That We Will Use For Listening To These Topics
val spark = SparkSession.builder.appName("QuicktellerTransactionTypeAggregator").getOrCreate()

val lines = spark.readStream.format("kafka").option("subscribe", topics).load().selectExpr("CAST(value AS STRING)").as[String]

 

However, On Running the Code, I get the Exception 

java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.kafka010.KafkaSourceProvider$
at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:323)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:60)
at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:198)
at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:88)
at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:88)
at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:150)
... 68 elided

 

I checked the spark/spark-2.2.1 folder and I can the following in the jars folder.

 

spark-sql-kafka-0-10_2.11-2.2.1-mapr-1803.jar
spark-sql_2.11-2.2.1-mapr-1803.jar
spark-streaming-kafka-0-10-assembly_2.11-2.2.1-mapr-1803.jar
spark-core_2.11-2.2.1-mapr-1803.jar

 

Please I did like to know If I am missing Something and what I can do to get past this point

Outcomes