
Reading maprfs from Spark

Question asked by biksn on Jun 18, 2015
Latest reply on Jun 18, 2015 by biksn
I'm running Apache Spark 1.4.0 on Mesos. I have been trying to read a file from a MapR cluster but haven't had much success with it. I tried two builds of Apache Spark (with and without Hadoop).

I can get to the spark-shell in the with-Hadoop build, but still can't access maprfs [2]. The without-Hadoop build bails out with an org.apache.hadoop.fs.FSDataInputStream ClassNotFoundException [1].

I also tried using hdfs://, but that didn't work either [3].

I must be doing something wrong. Can anyone please help me figure out how to read maprfs from Spark?
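One thing I haven't tried yet, based on the "No FileSystem for scheme: maprfs" error in [2]: registering the MapR filesystem class explicitly in the Hadoop configuration. This is only a sketch; it assumes com.mapr.fs.MapRFileSystem (shipped in maprfs-4.1.0-mapr.jar, which is already on the driver classpath above) is the right implementation class:

```scala
// In spark-shell, sc is the pre-built SparkContext.
// Map the "maprfs" URI scheme to the MapR filesystem implementation so
// Hadoop's FileSystem.getFileSystemClass can resolve it.
sc.hadoopConfiguration.set("fs.maprfs.impl", "com.mapr.fs.MapRFileSystem")

val textFile = sc.textFile("maprfs:///user/packages/CHANGES.txt")
textFile.count()
```

An equivalent alternative would be setting fs.maprfs.impl in the core-site.xml under /opt/mapr/hadoop/hadoop-0.20.2/conf, which is already on the driver classpath.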

Thank you,
Bikrant

[1]
#Spark without Hadoop:

    ~/spark/spark-1.4.0-bin-without-hadoop $ ./bin/spark-shell --master mesos://mesos-master.local:5050 --driver-library-path=/opt/mapr/lib  --driver-class-path /opt/mapr/hadoop/hadoop-0.20.2/conf:/opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar:/opt/mapr/lib/maprfs-4.1.0-mapr.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/commons-logging-1.1.3.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/maprfs-4.1.0-mapr.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/zookeeper-3.4.5-mapr-1406.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/guava-13.0.1.jar:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/hdfs/hadoop-hdfs-2.5.1-mapr-1503.jar
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
            at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
            at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
            at scala.Option.getOrElse(Option.scala:120)
            at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:111)
            at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:97)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:106)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
            at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
            at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
            at java.security.AccessController.doPrivileged(Native Method)
            at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
            ... 7 more
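For what it's worth, the Spark docs say the "Hadoop free" build ships with no Hadoop classes at all (which would explain the missing FSDataInputStream) and expects SPARK_DIST_CLASSPATH to be exported before launch. Something like the following, assuming the MapR Hadoop 2.5.1 install has the usual bin/hadoop script with a classpath subcommand:

```shell
# The "Hadoop free" Spark build contains no Hadoop classes, so supply
# them via SPARK_DIST_CLASSPATH before starting spark-shell.
export SPARK_DIST_CLASSPATH=$(/opt/mapr/hadoop/hadoop-2.5.1/bin/hadoop classpath)
```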

[2]
#Spark with Hadoop 2.6

    ~/spark/spark-1.4.0-bin-hadoop2.6$ ./bin/spark-shell --master mesos://mesos-master.local:5050 --driver-library-path=/opt/mapr/lib  --driver-class-path /opt/mapr/hadoop/hadoop-0.20.2/conf:/opt/mapr/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar:/opt/mapr/lib/maprfs-4.1.0-mapr.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/commons-logging-1.1.3.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/maprfs-4.1.0-mapr.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/zookeeper-3.4.5-mapr-1406.jar:/opt/mapr/hadoop/hadoop-0.20.2/lib/guava-13.0.1.jar:/opt/mapr/hadoop/hadoop-2.5.1/share/hadoop/hdfs/hadoop-hdfs-2.5.1-mapr-1503.jar
    WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
    15/06/17 23:14:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    .
    .
    .
    scala>  val textFile = sc.textFile("maprfs:///user/packages/CHANGES.txt")
    scala> textFile.count()
    java.io.IOException: No FileSystem for scheme: maprfs
            at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
            at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
            at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
            at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
            at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
            at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
            at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)


[3]

    scala>  val textFile = sc.textFile("hdfs:///user/packages/CHANGES.txt")
    scala> textFile.count()
    java.io.IOException: Incomplete HDFS URI, no host: hdfs:/user/packages/CHANGES.txt
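The [3] error at least seems self-explanatory: an hdfs:// URI needs an authority (host, optionally port), so the host-less form hdfs:/// can't resolve. Something like the following would avoid that particular error, where namenode-host and 8020 are placeholders, not values from my cluster:

```scala
// hdfs:// URIs require a host[:port] authority; "namenode-host:8020" is
// a hypothetical NameNode address for illustration only.
val textFile = sc.textFile("hdfs://namenode-host:8020/user/packages/CHANGES.txt")
```

Whether hdfs:// is even the right scheme against a MapR cluster is a separate question; I mainly wanted to rule out the URI syntax as the cause.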


