
Cleaning up job and log files?

Question asked by chriscurtin on Aug 13, 2012
Latest reply on Sep 11, 2012 by chriscurtin
We spent several hours last night cleaning up the cluster and restarting everything after the disk holding mapred.local.dir and $HADOOP_LOG_DIR filled up.
Looking through the files that were left, it doesn't appear that the cluster cleans up after itself very well. We went through the configuration files and searched online, but we can't figure out which settings control cleanup of these directories. <mapred.local.dir>/taskTracker/hadoop/jobcache had over 100,000 directories in it when we ran out of space.
Does the cluster clean up after itself, or do we need cron jobs to do it? If it's up to us, any pointers to scripts to use, or to which directories to clean and how often?
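For reference, this is the kind of cron cleanup we're considering if the cluster won't do it itself. It's only a sketch: the 7-day cutoff and the function name are our own placeholders, and the subdirectory layout is just what we observed on our nodes.

```shell
#!/bin/sh
# Sketch of a cleanup helper for mapred.local.dir.
# The 7-day default cutoff and the directory layout are assumptions;
# verify against your own cluster before running anything destructive.
cleanup_mapred_local() {
    local_dir="$1"      # value of mapred.local.dir on this node
    days="${2:-7}"      # purge anything untouched for this many days

    # Old per-job cache directories the TaskTracker left behind.
    find "$local_dir/taskTracker/hadoop/jobcache" \
        -mindepth 1 -maxdepth 1 -type d -mtime +"$days" \
        -exec rm -rf {} + 2>/dev/null

    # Entries staged under toBeDeleted that were never actually purged.
    find "$local_dir/toBeDeleted" \
        -mindepth 1 -maxdepth 1 -mtime +"$days" \
        -exec rm -rf {} + 2>/dev/null
    return 0
}
```

Run nightly from cron on each node, e.g. `0 3 * * * /usr/local/bin/cleanup_mapred_local.sh /path/to/mapred/local` (path is a placeholder).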
What we found:

<mapred.local.dir>/taskTracker/hadoop/jobcache
- files for running jobs and current tasks are removed (moved?) when they complete
- but old files for tasks long since completed are still there
- > 1000 directories, which is the number of jobs set in mapred.jobtracker.retiredjobs.cache.size

<mapred.local.dir>/toBeDeleted/<date string>/hadoop/jobcache
- old jobs, moved here when the node was restarted?
- when are they deleted?
- could some of these be currently running jobs?

job_*.xml files
- these appear to be currently running jobs
- what exactly are they? They look like job files
- when are they removed?

Per-job directories (one directory per job, child directories per attempt)
- never cleaned up? > 9000 job directories even though we purged less than 24 hours ago
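Before deleting anything we've been auditing how many stale directories each area holds with something like the following (the helper name and the 1-day cutoff are our own, not anything from the Hadoop distribution):

```shell
#!/bin/sh
# Count immediate subdirectories of a given path older than N days.
# Useful for sizing the problem before writing any rm commands.
count_stale_jobs() {
    dir="$1"    # e.g. <mapred.local.dir>/taskTracker/hadoop/jobcache
    days="$2"   # age threshold in days
    find "$dir" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" 2>/dev/null | wc -l
}
```

For example, `count_stale_jobs /path/to/mapred/local/taskTracker/hadoop/jobcache 1` (path is a placeholder) tells us how many job directories predate yesterday's purge.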
mapred-site.xml settings:
Relevant log file settings: HADOOP_LOG_DIR="/nfs/mapr/hadoop/logs"
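For completeness, these are the retention-related properties we've been looking at. The names come from stock Hadoop 1.x and the values shown are the documented defaults, not what our cluster has set; a distribution may use different names or defaults.

```xml
<!-- mapred-site.xml: retention-related settings worth checking (defaults shown). -->
<property>
  <name>mapred.userlog.retain.hours</name>
  <value>24</value> <!-- how long task userlogs are kept -->
</property>
<property>
  <name>mapred.jobtracker.retiredjobs.cache.size</name>
  <value>1000</value> <!-- the setting mentioned above -->
</property>
<property>
  <name>mapred.local.dir.minspacestart</name>
  <value>0</value> <!-- free bytes required in mapred.local.dir before starting new tasks -->
</property>
```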