Huge Pig job causes the local /tmp directory to run out of disk space.

Document created by Hao Zhu on Feb 18, 2016. Last modified by Hao Zhu on Feb 18, 2016.

Author: Hao Zhu

Original Publication Date: April 22, 2015


Huge Pig job causes the local /tmp directory to run out of disk space.


Pig 0.13

Root cause:

Per PIG-1838, Pig keeps the jar files for each job until the Pig script finishes. This means that if a single Pig script contains many MapReduce jobs, Pig creates many jar files in the /tmp directory on the node where the Pig job is submitted. Pig cleans up the temp jars only after the whole Pig script finishes.

Tests:

For example, the Pig job below keeps 2 jars in the /tmp directory until the whole Pig job finishes, because it contains 2 MapReduce jobs.

a = load '/dir' using ParquetLoader(); 
b = order a by price ;
STORE b INTO '/output' USING parquet.pig.ParquetStorer;

The temp jars in /tmp during execution:


If we put 2 of the above Pig jobs into one Pig script, Pig keeps 4 temp jars in /tmp:

Job4615931692370853067.jar Job182685348991417556.jar
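
To watch this accumulation on the submitting node, a small helper can count the jars and total their size while a script runs. The following is only a sketch: the class TempJarReport and its methods are hypothetical names, not part of Pig, and the Job*.jar pattern is assumed from the file names shown above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Pig): reports Job*.jar files in a directory.
class TempJarReport {

    // Returns the regular files in dir whose names match the glob Job*.jar.
    static List<Path> findJobJars(Path dir) throws IOException {
        List<Path> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "Job*.jar")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    jars.add(p);
                }
            }
        }
        return jars;
    }

    // Sums the sizes, in bytes, of the given files.
    static long totalBytes(List<Path> files) throws IOException {
        long total = 0;
        for (Path p : files) {
            total += Files.size(p);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to /tmp, the directory discussed in this article.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp");
        List<Path> jars = findJobJars(dir);
        System.out.println(jars.size() + " job jar(s), "
                + totalBytes(jars) + " bytes in " + dir);
    }
}
```

Running it repeatedly during a long Pig script should show the count growing with each MapReduce job, if the behavior described above holds in your environment.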

Source code analysis:

The logic is in the Pig source code --, which calls the createTempFile() function in

File submitJarFile = File.createTempFile("Job", ".jar");
log.info("creating jar file "+submitJarFile.getName());

Per the Java source code --, the directory location is controlled by


File tmpdir = (directory != null) ? directory : TempDirectory.location();


  private TempDirectory() { }

  // temporary directory location
  private static final File tmpdir = new File(fs.normalize(AccessController
      .doPrivileged(new GetPropertyAction(""))));

  static File location() {
      return tmpdir;
  }
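
The behavior of the two-argument File.createTempFile() can be checked with a short standalone program. This is a minimal sketch, not Pig code: the class TempDirDemo and its method names are made up for illustration, and the property name java.io.tmpdir is taken from the JDK Javadoc for File.createTempFile(), which documents it as the default temporary-file directory.

```java
import java.io.File;
import java.io.IOException;

// Illustrative only; these names are not from Pig or the JDK.
class TempDirDemo {

    // Creates a temp file with the same prefix and suffix as the Pig line
    // quoted above and returns the directory it landed in.
    static File directoryOfDefaultTempFile() throws IOException {
        File f = File.createTempFile("Job", ".jar");
        f.deleteOnExit();
        return f.getParentFile().getCanonicalFile();
    }

    // Same, but with an explicit directory (the "directory != null" branch).
    static File directoryOfExplicitTempFile(File directory) throws IOException {
        File f = File.createTempFile("Job", ".jar", directory);
        f.deleteOnExit();
        return f.getParentFile().getCanonicalFile();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("java.io.tmpdir = " + System.getProperty("java.io.tmpdir"));
        System.out.println("default temp file directory = " + directoryOfDefaultTempFile());
    }
}
```

Because the excerpt above keeps the location in a static final field, the value is presumably read once per JVM, so the property would need to be supplied when the JVM starts (for example with a -D option) rather than changed later from code.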



To avoid the /tmp directory running out of disk space, the available solutions are:

1. Split a huge Pig script into smaller pieces and run each piece separately.

Or

2. Set to a directory with enough disk space in HADOOP_OPTS or PIG_OPTS before submitting the Pig job.

For example:

export PIG_OPTS="" 
pig test.pig
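
Before pointing the temp location at another directory, it may be worth confirming that the directory is writable and actually has the space. The check below is a sketch under assumptions: TempDirCheck and hasRoom are hypothetical names, not part of Pig or Hadoop, and the 1 GiB threshold is an arbitrary example value.

```java
import java.io.File;

// Hypothetical pre-flight check; not part of Pig or Hadoop.
class TempDirCheck {

    // True if dir is an existing, writable directory with at least
    // minFreeBytes of usable space.
    static boolean hasRoom(File dir, long minFreeBytes) {
        return dir.isDirectory() && dir.canWrite() && dir.getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) {
        // Checks the directory given as the first argument, or the JVM default.
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        long oneGiB = 1024L * 1024L * 1024L; // arbitrary example threshold
        System.out.println(dir + (hasRoom(dir, oneGiB) ? " has" : " does not have")
                + " at least 1 GiB usable");
    }
}
```

A reasonable threshold depends on how many MapReduce jobs the script launches and how large each job jar is, since all of them stay on disk until the script finishes.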