
How to read multiple text files from a MapR-FS symlink

Question asked by nirav on Feb 26, 2015
Latest reply on Mar 20, 2015 by jerdavis2
I am trying to read multiple part files through an HDFS symlink from Spark. I am able to use a wildcard (*) to read multiple files from a path if the path is physical,
e.g.
`sparkContext.textFile("/some/path/file_123321_00/part-r-000*")`

But I have created a symlink to this folder on HDFS called 'symlink', and when I use
`sparkContext.textFile("/some/path/symlink/part-r-000*")` it fails to detect any paths.
I tried `hadoop fs -ls` on both paths. The first one works, but the one with the symlink doesn't work as expected.

We are using MapR-FS, which allows us to create such symlinks, but I am not sure what the best way is to read through one from Spark. The reason we create symlinks is to point them at the latest data set. I see there is `hadoop mfs -lsr`, which can navigate symlinks, but how can I take a similar approach from Spark?
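
For context, this is roughly the workaround I have been considering: resolve the symlink to its physical target first, then hand the resolved glob to `textFile`. It is only a sketch, assuming the standard Hadoop `FileContext` symlink API actually exposes the MapR-FS link target (if it doesn't, the fallback would be parsing the `hadoop mfs -lsr` output outside Spark); the paths are the hypothetical ones from above.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileContext, Path}

// Sketch: resolve the symlink via the Hadoop FileContext API
// (assumes MapR-FS exposes the link target through this API).
val fc = FileContext.getFileContext(new Configuration())

// getFileLinkStatus does not follow the link, so getSymlink() gives the target path
val linkStatus = fc.getFileLinkStatus(new Path("/some/path/symlink"))
val target: Path =
  if (linkStatus.isSymlink) linkStatus.getSymlink else linkStatus.getPath

// Read the part files from the resolved physical directory
val rdd = sparkContext.textFile(s"$target/part-r-000*")
```

Is something along these lines the recommended approach, or is there a better way to make the wildcard work directly against the symlink?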
