
Spark Scala FileNotFound

Question asked by terryhealy on Jul 28, 2017
Latest reply on Jul 31, 2017 by maprcommunity

I'm running on MapR 5.2.0, YARN 2.7.0, Spark 2.1.0, Scala 2.11.6.

My small Scala log parsing test application runs with master = "local", but I have been unable to run it on the cluster. I have successfully run other Spark jobs under YARN that did not attempt to use the MapR file system. I'm submitting the job with this:

spark-submit --class gov.bnl.bro.rita.BroSparkRita \
   --master yarn \
   target/BrosparkRita-0.0.1-SNAPSHOT.jar \
   /user/spot/rita/1m.log \
   /user/spot/rita/out2

The app fails to find the input file (the first argument above):

Exception in thread "main" java.io.FileNotFoundException: /user/spot/rita/1m.log (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at scala.io.Source$.fromFile(Source.scala:91)
at scala.io.Source$.fromFile(Source.scala:76)
at scala.io.Source$.fromFile(Source.scala:54)
at gov.bnl.bro.rita.BroSparkRita$.main(BroSparkRita.scala:47)
at gov.bnl.bro.rita.BroSparkRita.main(BroSparkRita.scala)
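For what it's worth, the trace shows the open going through scala.io.Source.fromFile and java.io.FileInputStream, which resolve the path against the driver's local filesystem rather than MapR-FS. A minimal sketch of the two ways of reading (the object and app names here are made up for illustration, and may not match the actual code at BroSparkRita.scala:47):

```scala
import org.apache.spark.sql.SparkSession

// Illustration only; names are hypothetical, not the actual BroSparkRita code.
object ReadSketch {
  def main(args: Array[String]): Unit = {
    val path = "/user/spot/rita/1m.log"

    // scala.io.Source.fromFile(path) opens a java.io.FileInputStream,
    // so it looks for the path on the driver's LOCAL filesystem and
    // throws FileNotFoundException if the file only exists in MapR-FS.
    // val lines = scala.io.Source.fromFile(path).getLines()

    // SparkContext.textFile resolves the path against the cluster's
    // default filesystem (maprfs:// on MapR) and reads it on the executors.
    val spark = SparkSession.builder().appName("ReadSketch").getOrCreate()
    val count = spark.sparkContext.textFile(path).count()
    println(s"$count lines")
    spark.stop()
  }
}
```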

The same user can see the file using hadoop fs -ls:

hfs -ls /user/spot/rita
Found 1 items
-rwxr-xr-x 3 spot spot 129753228 2017-07-28 16:13 /user/spot/rita/1m.log


I have tried adding the "maprfs://" prefix (the instructions I have say it isn't necessary, but I tried both ways).

I've searched everywhere to no avail.

Any suggestions?

Thanks,

Terry
