
Spark - Save as Parquet - Failed to Delete _temp...

Question asked by john.humphreys on Jul 28, 2017
Latest reply on Jul 31, 2017 by john.humphreys

I frequently see the following error messages while saving parquet files on Spark (2.1.0) / MapR (5.2.0 / MEP 3).


17/07/28 12:57:25 ERROR MapRFileSystem: Failed to delete path maprfs:/work/dev/streaming-test/whole-day/ncram-division/204/13/DIGEST-20170723-112010-7e8dcb71-886a-4abf-b532-3b5a657db9d5.parquet/_temporary-68961222-1c74-4642-b575-25ef1aba53fc, error: No such file or directory (2)


This happens in multiple applications, and I'm pretty sure I saw it in Spark 1.6.1 too, back when we used that. Below is a simple sample script I'm running right now that reproduces it in the Spark shell; the same errors appear with spark-submit as well.


Is this a known Spark or MapR issue?  I haven't found it in any of the Spark JIRAs I've looked at so far (at least not in the context of saving a basic Parquet file).
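For context, the `_temporary-...` path in the message is the staging directory that Hadoop's FileOutputCommitter creates during the write and removes on commit. A minimal sketch of checking/setting the committer algorithm version from the shell is below; the property itself is a standard Hadoop option, but whether it changes this behavior on MapR-FS is an assumption on my part, not something I've verified:

```scala
// Sketch: inspect/set the Hadoop output committer algorithm version.
// Version 2 commits task output directly to the destination, which changes
// how the _temporary staging directory is created and cleaned up.
// NOTE: its effect on MapR-FS here is assumed, not confirmed.
spark.conf.set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
println(spark.conf.get("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"))
```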


import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.spark.sql.functions.col


val path = new Path("maprfs:///work/dev/streaming-test/whole-day/parquet/")
val files = FileSystem.get(sc.hadoopConfiguration).listFiles(path, true)

val targetFiles = scala.collection.mutable.HashSet.empty[String]


while (files.hasNext) {
    targetFiles += files.next.getPath.toString  // collect each file path from the RemoteIterator
}


val gmbHosts = Seq(903795,903795,1396771,1201240,1395080,1051024,1397604,903796,903796,1201242,1201241,1395081,1202271) // list truncated in the original post


val totalFiles = targetFiles.size
var currentFile = 0

targetFiles.toList.foreach(file => {
    currentFile += 1
    println(s"On $currentFile / $totalFiles")
    val batch = spark.read.parquet(file)  // the read was cut off in the original; reading the source file is assumed
    val filtered = batch.where(col("entity").isin(gmbHosts:_*))
    val newFileName = file.replace("/whole-day/parquet/", "/whole-day/ncram-division/")
    filtered.write.parquet(newFileName)  // write step assumed; the original snippet was cut off here
})