
Spark - Save as Parquet - Failed to Delete _temp...

Question asked by john.humphreys on Jul 28, 2017
Latest reply on Jul 31, 2017 by john.humphreys

I frequently see error messages like the following while saving Parquet files on Spark (2.1.0) / MapR (5.2.0 / MEP 3).

 

17/07/28 12:57:25 ERROR MapRFileSystem: Failed to delete path maprfs:/work/dev/streaming-test/whole-day/ncram-division/204/13/DIGEST-20170723-112010-7e8dcb71-886a-4abf-b532-3b5a657db9d5.parquet/_temporary-68961222-1c74-4642-b575-25ef1aba53fc, error: No such file or directory (2)

 

This happens in multiple applications, and I'm pretty sure I also saw it in Spark 1.6.1 when we were still using that version.  Here is a very simple sample script that I'm running right now in the Spark shell and that produces it.  The same errors appear when running with spark-submit too.

 

Is this a known Spark or MapR issue?  I haven't found it in any of the Spark JIRAs I've looked at so far (at least not in the context of saving a basic Parquet file).

 

import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
// Needed for col(...) when not running in spark-shell
import org.apache.spark.sql.functions.col

 

val path = new Path("maprfs:///work/dev/streaming-test/whole-day/parquet/")
val files = FileSystem.get(sc.hadoopConfiguration).listFiles(path, true)

val targetFiles = scala.collection.mutable.HashSet.empty[String]

 

while (files.hasNext) {
    targetFiles += files.next.getPath.getParent.toString
}

 

val gmbHosts = Seq(903795,903795,1396771,1201240,1395080,1051024,1397604,903796,903796,1201242,1201241,1395081,1202271,
903823,903823,1051025,1201238,1201239,904577,904577,1189313,1396769,1189314,1396770,904558,904558,903794,903794)

 

val totalFiles = targetFiles.size
var currentFile = 0


targetFiles.toList.foreach(file => {
    currentFile += 1
    println(s"On $currentFile / $totalFiles")
    val batch = spark.read.parquet(file)
    val filtered = batch.where(col("entity").isin(gmbHosts:_*))
    val newFileName = file.replace("/whole-day/parquet/", "/whole-day/ncram-division/")
    filtered.write.parquet(newFileName)
    println(newFileName)
})
