I have a 5-node MapR cluster running MapR version 5.2. Each server has 250 GB of memory and 15 TB of disk space. I noticed that a considerable amount of /tmp space is in use, and it keeps growing as more users run their applications and jobs.
In /tmp there are user caches, one directory per user (named by user ID) who runs jobs. Under each user ID directory there is a file cache. It contains the jar files, .py files, and configuration files for the PySpark, H2O, or other YARN-related jobs that the user has run. Each job the user runs gets its own separate file cache.
Here's a screen shot:
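To put numbers on the growth, the per-user usage could be tallied with something like the minimal Python sketch below. It assumes the layout described above: one directory per user ID, each holding a file cache subdirectory, which is assumed here to be named `filecache`. The function names and the root path in the usage note are hypothetical, not taken from my cluster's configuration.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinks and unreadable entries."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable (jobs may still be running); skip it.
                pass
    return total


def filecache_usage(usercache_root):
    """Return {user_id: bytes used by that user's filecache directory}.

    usercache_root is the directory that holds one subdirectory per user ID
    (assumed layout; adjust to the actual path on the node).
    """
    usage = {}
    root = Path(usercache_root)
    if not root.is_dir():
        return usage
    for user_dir in sorted(root.iterdir()):
        cache = user_dir / "filecache"
        if cache.is_dir():
            usage[user_dir.name] = dir_size_bytes(cache)
    return usage
```

Calling `filecache_usage("/tmp/usercache")` (a placeholder path) on each node would return a dictionary of user ID to bytes, which makes it easy to see which users account for most of the space. The sketch only reads sizes and does not delete anything.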
My questions are:
1. Is this happening because of the default settings? Can these file caches be directed to a location under HDFS? For example, is there a config parameter in Spark that addresses the caching related to Spark jobs?
2. Can I remove these file caches periodically?
3. If the caches are removed, will it affect performance? (Most probably not, because they are job specific.)