
HIVE: Insert hangs on FileSinkOperator: Moving tmp dir

Question asked by dimamah on May 21, 2014
We are running a large multi-table insert with ~800 insert clauses, like so:

    from tbl
     insert overwrite table tbl2 partition(x='x1_c') select y,count(*) where t='x1' group by y, 'x1'
     insert overwrite table tbl2 partition(x='x1_s') select y,sum(z) where t='x1' group by y, 'x1'
    ........
     insert overwrite table tbl2 partition(x='xn_c') select y,count(*) where t='xn' group by y, 'xn'
     insert overwrite table tbl2 partition(x='xn_s') select y,sum(z) where t='xn' group by y, 'xn'
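
To make the shape of the statement concrete, here is a minimal sketch of how a query like this could be assembled. This is not our actual script: the function name `build_multi_insert` and the key list `x1..x400` are made up for illustration, and only the clause pattern is taken from the query above.

```python
def build_multi_insert(keys, source="tbl", target="tbl2"):
    """Build one Hive multi-table insert with a count and a sum clause per key.

    `keys`, `source` and `target` are illustrative parameters, not real names
    from our job; the clause text follows the pattern shown in the question.
    """
    lines = [f"from {source}"]
    for k in keys:
        lines.append(
            f" insert overwrite table {target} partition(x='{k}_c') "
            f"select y,count(*) where t='{k}' group by y, '{k}'"
        )
        lines.append(
            f" insert overwrite table {target} partition(x='{k}_s') "
            f"select y,sum(z) where t='{k}' group by y, '{k}'"
        )
    return "\n".join(lines)


# 400 hypothetical keys, two insert clauses each.
statement = build_multi_insert([f"x{i}" for i in range(1, 401)])
print(statement.count("insert overwrite"))  # 800
```

Each key contributes two target partitions, so a single statement of this size ends up with roughly 800 output directories to move when it completes.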

In the hiveserver log, the last output is `FileSinkOperator: Moving tmp dir`:

    14/05/20 12:39:56 [Thread-1832] INFO exec.FileSinkOperator: Moving tmp dir: maprfs:/tmp/hive-hadoop/hive_2014-05-20_12-31-49_348_4182358033124325536/_tmp.-ext-11062 to: maprfs:/tmp/hive-hadoop/hive_2014-05-20_12-31-49_348_4182358033124325536/_tmp.-ext-11062.intermediate
    14/05/20 12:39:56 [Thread-1832] INFO exec.FileSinkOperator: Moving tmp dir: maprfs:/tmp/hive-hadoop/hive_2014-05-20_12-31-49_348_4182358033124325536/_tmp.-ext-11062.intermediate to: maprfs:/tmp/hive-hadoop/hive_2014-05-20_12-31-49_348_4182358033124325536/-ext-11062
    14/05/20 12:39:56 [Thread-1832] INFO exec.FileSinkOperator: Moving tmp dir: maprfs:/tmp/hive-hadoop/hive_2014-05-20_12-31-49_348_4182358033124325536/_tmp.-ext-11064 to: maprfs:/tmp/hive-hadoop/hive_2014-05-20_12-31-49_348_4182358033124325536/_tmp.-ext-11064.intermediate
    14/05/20 12:39:56 [Thread-1832] INFO exec.FileSinkOperator: Moving tmp dir: maprfs:/tmp/hive-hadoop/hive_2014-05-20_12-31-49_348_4182358033124325536/_tmp.-ext-11064.intermediate to: maprfs:/tmp/hive-hadoop/hive_2014-05-20_12-31-49_348_4182358033124325536/-ext-11064

The statement then hung for a full day without returning anything to the JDBC connection that executed it.

Running `strace -f` on the hiveserver's process showed :

    [pid 28063] futex(0x2ae26860d9d0, FUTEX_WAIT, 30632, NULL <unfinished ...>
    [pid 31032] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 260707000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 310875000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 361035000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 411187000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 461337000}, ffffffff <unfinished ...>
    [pid 25786] <... epoll_wait resumed> {}, 64, 237) = 0
    [pid 25786] epoll_wait(151,  <unfinished ...>
    [pid 31032] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 511579000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 561733000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 611901000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 662056000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 712199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 762349000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 812503000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 862648000}, ffffffff <unfinished ...>
    [pid 30822] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
    [pid 30822] futex(0x2ae26c084428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 30822] futex(0x2ae28c000944, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658153, 842860000}, ffffffff <unfinished ...>
    [pid 31032] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 912787000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 962931000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658153, 13093000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658153, 63246000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658153, 113432000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
    [pid 31032] futex(0x2ae26c0bb428, FUTEX_WAKE_PRIVATE, 1) = 0
    [pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658153, 163610000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

This output continues nonstop.
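
As far as I can tell, the `{seconds, nanoseconds}` pairs in the `FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME` calls are absolute deadlines, so the gap between consecutive deadlines gives a rough idea of how often pid 31032 wakes up. A small sketch to compute that from the trace (the sample is three lines copied from the output above; the helper name `deadline_gaps_ms` is my own):

```python
import re

# Three consecutive timed waits of pid 31032, copied from the strace output.
SAMPLE = """\
[pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 260707000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 310875000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 31032] futex(0x40bf33a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1400658152, 361035000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
"""

# Matches the {seconds, nanoseconds} deadline argument.
DEADLINE = re.compile(r"\{(\d+), (\d+)\}")


def deadline_gaps_ms(text):
    """Return the gaps, in milliseconds, between consecutive futex deadlines."""
    stamps = [int(s) * 1_000_000_000 + int(ns) for s, ns in DEADLINE.findall(text)]
    return [(later - earlier) / 1e6 for earlier, later in zip(stamps, stamps[1:])]


print(deadline_gaps_ms(SAMPLE))  # each gap is about 50 ms
```

So this thread appears to be sitting in a timed wait that expires roughly every 50 ms and is then re-armed, which looks more like polling than actual work.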

This is Hive 0.10 on MapR M3 v3.0.1.

Any clues?
