How to Read Text Files from an MFS Symlink in Spark

Document created by mufeed Employee on Feb 7, 2016

Author: Mufeed Usman

 

Original Publication Date: April 23, 2015

 

Scenario:

Multiple part files reside in a directory on HDFS (MFS) that is reached through a symlink. A wildcard (*) can read multiple files from a path when the path is physical. For example:

sparkContext.textFile("/some/path/file_123321_00/part-r-000*")

But when a symlink to this folder is used instead, the wildcard fails to match any paths:

sparkContext.textFile("/some/path/symlink/part-r-000*")

Goal:

Read these part files from Spark when they are reached through a symlink on MapR-FS.

Solution:

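The following snippet obtains the target directory path of the symlink. Casting the file status to MapRFileStatus instead of using FileStatus makes the getSymlink() API available, as shown below. Once the target is known, the wildcard can be applied to the target path instead of the symlink path.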

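// conf is the Hadoop Configuration; path is the Path of the symlink
FileSystem fs = FileSystem.get(conf);
// Cast to MapRFileStatus so that getSymlink() is available
MapRFileStatus fst = (MapRFileStatus) fs.getFileStatus(path);
// Physical directory that the symlink points to
Path target = fst.getSymlink();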
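
The snippet above stops once the target is resolved. As a minimal sketch of the remaining step, the wildcard can be joined to the resolved target before the path is handed to textFile(). The class SymlinkGlob and the method globUnderTarget are hypothetical names used only for illustration and are not part of the Spark, Hadoop, or MapR APIs. The sketch also assumes that getSymlink() returned an absolute target directory.

```java
// Hypothetical helper for illustration; not part of Spark, Hadoop, or MapR.
public class SymlinkGlob {

    /**
     * Joins a resolved symlink target directory with a file-name wildcard.
     * targetDir is assumed to be the string form of the Path returned by
     * getSymlink(), and is assumed to be absolute.
     */
    public static String globUnderTarget(String targetDir, String filePattern) {
        String dir = targetDir;
        // Drop trailing slashes so the result has exactly one separator.
        while (dir.length() > 1 && dir.endsWith("/")) {
            dir = dir.substring(0, dir.length() - 1);
        }
        return dir.equals("/") ? "/" + filePattern : dir + "/" + filePattern;
    }

    public static void main(String[] args) {
        // In a real job this value would come from target.toString().
        String glob = globUnderTarget("/some/path/file_123321_00", "part-r-000*");
        System.out.println(glob); // prints /some/path/file_123321_00/part-r-000*
        // The glob would then be used as: sparkContext.textFile(glob)
    }
}
```

The resulting string is a physical path, so it should behave like the first example in the Scenario section rather than the symlink example.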
