
Spark in Docker on MapR fsclient Issues

Question asked by mandoskippy on Apr 6, 2016
Latest reply on Jul 25, 2016 by mufeed


I understand my testing isn't a "normal" course of action for Spark/MapR, but the errors I am getting seem to point at the filesystem itself, which is baffling me. From inside a Docker container, I can run hadoop fs -ls / and see my filesystem. To cheat, in this YARN example, I am running the container with host networking, and I've mounted the /opt/mapr volume read-only from the host.
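For context, here is roughly how such a container would be launched. This is a hedged reconstruction, not the exact command from the post; the image name is a placeholder, and the flags reflect the setup described (host networking, read-only bind mount of the MapR client install):

```shell
# Sketch of the container launch described above (image name is hypothetical).
# --network host     : container shares the host's network stack, so the MapR
#                      client sees the same hostname/ports as the host.
# -v ...:/opt/mapr:ro: bind-mounts the host's MapR client install read-only.
docker run -it --rm \
  --network host \
  -v /opt/mapr:/opt/mapr:ro \
  my-spark-client-image \
  hadoop fs -ls /
```

Note that the read-only mount and host networking are exactly the two non-standard parts of this setup, so either could plausibly interact badly with the native fsclient.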


However, when I try to copy a file to MapR-FS (or when Spark tries to), I get a very odd error (see below). I am not sure what to make of this. I get the same error even if I try to copy the same files with hadoop fs -copyFromLocal. I am baffled as to what this could be... perhaps an issue with the Hadoop client using host networking?
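The Spark-free reproduction mentioned above would look something like this (paths are assumptions based on the log below, not copied from the post):

```shell
# Reproduces the write failure without Spark: a plain copyFromLocal into
# MapR-FS from inside the container fails the same way. Source path is
# the assembly jar Spark was trying to stage.
hadoop fs -copyFromLocal \
  /mapr/brewpot/mesos/prod/spark/spark-1.6.1-bin-without-hadoop/lib/spark-assembly-1.6.1-hadoop2.2.0.jar \
  /user/root/
```

That this fails identically suggests the problem is in the fsclient write path itself, not anything Spark-specific.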

16/04/06 18:19:28 INFO yarn.Client: Uploading resource file:/mapr/brewpot/mesos/prod/spark/spark-1.6.1-bin-without-hadoop/lib/spark-assembly-1.6.1-hadoop2.2.0.jar -> maprfs:/user/root/.sparkStaging/application_1459959143507_0014/spark-assembly-1.6.1-hadoop2.2.0.jar
2016-04-06 18:19:28,6330 ERROR Client fs/client/fileclient/cc/writebuf.cc:353 Thread: 1471 FlushWrite failed: File spark-assembly-1.6.1-hadoop2.2.0.jar, error: Invalid argument(22), pfid 2055.174740.2851966, off 65536, fid 7372.32.1181258
2016-04-06 18:19:28,6331 ERROR Client fs/client/fileclient/cc/writequeue.cc :165 Thread: 1471 WriteBuf null/err. 22
2016-04-06 18:19:28,6332 ERROR Client fs/client/fileclient/cc/writebuf.cc:353 Thread: 1472 FlushWrite failed: File spark-assembly-1.6.1-hadoop2.2.0.jar, error: Invalid argument(22), pfid 2055.174740.2851966, off 196608, fid 7372.32.1181258
2016-04-06 18:19:28,6333 ERROR Client fs/client/fileclient/cc/writequeue.cc :165 Thread: 1472 WriteBuf null/err. 22
16/04/06 18:19:28 ERROR fs.Inode: Write failed for file: /user/root/.sparkStaging/application_1459959143507_0014/spark-assembly-1.6.1-hadoop2.2.0.jar, error: Invalid argument
16/04/06 18:19:28 ERROR fs.Inode: Marking failure for: /user/root/.sparkStaging/application_1459959143507_0014/spark-assembly-1.6.1-hadoop2.2.0.jar, error: Invalid argument
16/04/06 18:19:28 ERROR fs.Inode: Throwing exception for: /user/root/.sparkStaging/application_1459959143507_0014/spark-assembly-1.6.1-hadoop2.2.0.jar, error: Invalid argument
16/04/06 18:19:28 ERROR fs.Inode: Throwing exception for: /user/root/.sparkStaging/application_1459959143507_0014/spark-assembly-1.6.1-hadoop2.2.0.jar, error: Invalid argument
16/04/06 18:19:28 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1459959143507_0014
16/04/06 18:19:28 ERROR spark.SparkContext: Error initializing SparkContext.
java.io.IOException: 2055.174740.2851966 /user/root/.sparkStaging/application_1459959143507_0014/spark-assembly-1.6.1-hadoop2.2.0.jar (Invalid argument)
  at com.mapr.fs.Inode.throwIfFailed(Inode.java:387)
  at com.mapr.fs.Inode.flushPages(Inode.java:503)
  at com.mapr.fs.Inode.releaseDirty(Inode.java:581)               
  at com.mapr.fs.MapRFsOutStream.dropCurrentPage(MapRFsOutStream.java:73)
  at com.mapr.fs.MapRFsOutStream.write(MapRFsOutStream.java:85)
  at com.mapr.fs.MapRFsDataOutputStream.write(MapRFsDataOutputStream.java:39)
  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
  at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:376)
  at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:346)
  at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:297)
  at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:317)
  at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:407)
  at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$5.apply(Client.scala:446)
  at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$5.apply(Client.scala:444)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:444)
  at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:727)
  at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
  at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
  at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
  at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
  at py4j.Gateway.invoke(Gateway.java:214)
  at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
  at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
  at py4j.GatewayConnection.run(GatewayConnection.java:209)
  at java.lang.Thread.run(Thread.java:745)
