
How to access Data via the MapR-httpfs server from Spark?

Question asked by martinhartig on Jan 15, 2018
Latest reply on Feb 6, 2018 by maprcommunity


It seems that the httpfs server in the MapR distribution has limitations, especially when a Spark client reads data through it via the webhdfs:// scheme.
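For context, my understanding (an assumption on my part) is that Hadoop binds the webhdfs:// scheme to its WebHdfsFileSystem client, which then talks to the httpfs server over REST. A quick sketch for spark-shell to confirm which implementation is actually picked up:

import java.net.URI
import org.apache.hadoop.fs.FileSystem

// Which FileSystem implementation handles the webhdfs:// scheme here?
// I would expect org.apache.hadoop.hdfs.web.WebHdfsFileSystem, but that is an assumption.
val fs = FileSystem.get(new URI("webhdfs://maprdemo:14000"), spark.sparkContext.hadoopConfiguration)
println(fs.getClass.getName)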

 

Example: Using the MapR sandbox (5.2.1), the following steps reproduce the problem for me:

me@host:~$ ssh mapr@localhost -p 2222
Password:
Welcome to your Mapr Demo virtual machine.
[mapr@maprdemo ~]$ hadoop fs -put /opt/mapr/spark/spark-2.1.0/conf/spark-defaults.conf webhdfs://maprdemo:14000/tmp/spark-defaults.conf
[mapr@maprdemo ~]$ hadoop fs -ls webhdfs://maprdemo:14000/tmp/
Found 1 items
-rw-r--r-- 1 mapr mapr 909 2018-01-15 06:00 webhdfs://maprdemo:14000/tmp/spark-defaults.conf
[mapr@maprdemo ~]$ /opt/mapr/spark/spark-2.1.0/bin/spark-shell
Spark context Web UI available at http://10.0.2.15:4040
Spark context available as 'sc' (master = local[*], app id = local-1516024917758).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0-mapr-1703
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.textFile("webhdfs://maprdemo:14000/tmp/spark-defaults.conf").collect
java.io.FileNotFoundException: Requested file /tmp/spark-defaults.conf/spark-defaults.conf does not exist.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

 

The difference between the two paths is interesting:

  • Spark was asked to read /tmp/spark-defaults.conf
  • The error refers to /tmp/spark-defaults.conf/spark-defaults.conf, i.e. the file name is appended a second time (see the check sketched below)
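One hedged guess about the cause: the doubled name is what you would get if a LISTSTATUS call on a file returned the file's own name as pathSuffix, so that the client appends it to the path a second time (possibly HDFS-12139, which was fixed in Hadoop 2.9.0 and would match my observation below that a stock 2.9.0 httpfs works). A minimal diagnostic sketch for spark-shell:

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new URI("webhdfs://maprdemo:14000"), spark.sparkContext.hadoopConfiguration)

// If this reports a regular file, the path itself resolves fine and the
// doubling must happen in the listing step that spark.read performs.
val st = fs.getFileStatus(new Path("/tmp/spark-defaults.conf"))
println(s"isFile=${st.isFile} isDirectory=${st.isDirectory} length=${st.getLen}")

// listStatus on the file itself: a correct server returns one entry whose
// path is still /tmp/spark-defaults.conf, not .../spark-defaults.conf/spark-defaults.conf.
fs.listStatus(new Path("/tmp/spark-defaults.conf")).foreach(s => println(s.getPath))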

 

What I have observed so far:

  • Using an (external) httpfs server from Hadoop 2.9.0 does not exhibit this problem.
  • Accessing MapR-httpfs from an external Spark client shows the same problem.
  • The problem persists with MapR 6.0.0 and MEP 3.0.
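In the meantime, the only workaround I can think of is to bypass spark.read's path handling and pull the bytes through the Hadoop FileSystem API directly. A sketch (untested, and only suitable for files small enough to read on the driver):

import java.net.URI
import java.nio.charset.StandardCharsets
import org.apache.commons.io.IOUtils
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new URI("webhdfs://maprdemo:14000"), spark.sparkContext.hadoopConfiguration)

// Read the whole file on the driver, then turn the lines into a Dataset,
// so spark.read never gets to (mis)resolve the webhdfs path.
val in = fs.open(new Path("/tmp/spark-defaults.conf"))
val text = try IOUtils.toString(in, StandardCharsets.UTF_8) finally in.close()

import spark.implicits._
val lines = text.split("\n").toSeq.toDS()
lines.show(5, truncate = false)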

 

Any ideas? 
