I'm running on MapR 5.2.0, YARN 2.7.0, Spark 2.1.0, Scala 2.11.6.
My small Scala log-parsing test application runs with master = "local", but I have been unable to run it on the cluster. I have successfully run other Spark jobs under YARN that did not attempt to use the MapR file system. I'm submitting the job with this:
spark-submit --class gov.bnl.bro.rita.BroSparkRita \
--master yarn \
The app fails to find the input file (the first argument above):
Exception in thread "main" java.io.FileNotFoundException: /user/spot/rita/1m.log (No such file or directory)
at java.io.FileInputStream.open(Native Method)
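For context on why I think this is odd: the stack trace shows `java.io.FileInputStream`, which resolves paths against the local filesystem of whatever JVM it runs in, not against MapR-FS. A minimal sketch of the difference (the `FileInputStream` path is what the trace shows; `sc` is an assumed `SparkContext`, and whether my code goes through `java.io` somewhere is exactly what I'm trying to pin down):

```scala
import java.io.{File, FileInputStream}
import org.apache.spark.SparkContext

// Local-JVM view: this looks at the executor/driver node's own disk, so
// a path that only exists in MapR-FS throws FileNotFoundException here.
val local = new File("/user/spot/rita/1m.log")
println(local.exists())  // false unless the node has that local path

// Cluster view: textFile goes through the Hadoop FileSystem layer, which
// on a MapR cluster resolves the path in MapR-FS.
def readViaSpark(sc: SparkContext) =
  sc.textFile("/user/spot/rita/1m.log")
```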
Yet the same user can see the file from the shell with hadoop fs -ls (hfs below is my alias for hadoop fs):
hfs -ls /user/spot/rita
Found 1 items
-rwxr-xr-x 3 spot spot 129753228 2017-07-28 16:13 /user/spot/rita/1m.log
I have also tried adding the "maprfs://" prefix to the path (the instructions I'm following say this should not be necessary, but I tried it both ways).
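Concretely, these are the two URI forms I tried, sketched here against an assumed `SparkContext` named `sc` (the unprefixed form should resolve through `fs.defaultFS` on a MapR cluster):

```scala
// With an explicit scheme: addresses MapR-FS directly.
val withScheme = sc.textFile("maprfs:///user/spot/rita/1m.log")

// Without a scheme: resolved via the cluster's fs.defaultFS setting,
// which on MapR normally points at maprfs:///.
val withoutScheme = sc.textFile("/user/spot/rita/1m.log")
```

Both forms fail identically for me.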
I've searched high and low to no avail.