
java.lang.IllegalStateException: unread block data when running Spark with YARN

Question asked by Velumani on May 28, 2016
Latest reply on May 30, 2016 by Velumani

Hi,

    I am getting a java.lang.IllegalStateException: unread block data exception when I run a Spark job with the YARN client. The same job succeeds when I run it in local mode.

The code is trying to store data into an HBase table.
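For context, the driver stack trace below ends in PairRDDFunctions.saveAsHadoopDataset called from HBaseDBUtils.saveToHbase, so the failing path looks roughly like the following. This is only a sketch of that call pattern, not the actual code: the method signature, column family ("cf"), qualifier ("col"), and record type are assumptions.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.rdd.RDD

object HBaseSaveSketch {
  // Writes (rowKey, value) pairs to an HBase table via saveAsHadoopDataset,
  // the call that appears in the driver stack trace. Column family and
  // qualifier below are placeholders.
  def saveToHbase(rdd: RDD[(String, String)], table: String): Unit = {
    val jobConf = new JobConf(HBaseConfiguration.create())
    jobConf.setOutputFormat(classOf[TableOutputFormat])
    jobConf.set(TableOutputFormat.OUTPUT_TABLE, table)

    rdd.map { case (rowKey, value) =>
      val put = new Put(Bytes.toBytes(rowKey))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
      (new ImmutableBytesWritable, put)
    }.saveAsHadoopDataset(jobConf)
  }
}
```

Such a job serializes task closures and Hadoop/HBase configuration objects to the executors, which is why a local run can succeed while a YARN run fails during deserialization on the executor side.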

 

Below is the exception trace:

 

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 5, AWS-DEV-MAPR-07): java.lang.IllegalStateException: unread block data
        at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2449)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1385)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1922)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1213)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1156)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1156)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1156)
        at org.oneplatform.recon.utils.HBaseDBUtils.saveToHbase(HBaseDBUtils.scala:11)
        at org.oneplatform.recon.HBaseSummarizer$$anonfun$runSummary$1.apply$mcVI$sp(HBaseSummarizer.scala:113)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at org.oneplatform.recon.HBaseSummarizer.runSummary(HBaseSummarizer.scala:109)
        at org.oneplatform.recon.HBaseSummarizer.execute(HBaseSummarizer.scala:68)
        at org.oneplatform.recon.SummarizeApp$.main(SummarizeApp.scala:9)
        at org.oneplatform.recon.SummarizeApp.main(SummarizeApp.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:742)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalStateException: unread block data
        at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2449)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1385)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
