
Data too long for column INPUT_SPLIT_LOCATIONS at row 1

Question asked by jesco39 on Dec 22, 2015
I'm receiving the following alarm for the jobtracker:

Alarm raised: NODE_ALARM_METRICS_WRITE_PROBLEM
Message: Data too long for column 'INPUT_SPLIT_LOCATIONS' at row 1

Looking at the MySQL table, the column is already fairly large at varchar(8192), which appears to be bigger than the default of varchar(4096).

mysql> describe TASK;
+-----------------------+---------------+------+-----+-------------------+-------+
| Field                 | Type          | Null | Key | Default           | Extra |
+-----------------------+---------------+------+-----+-------------------+-------+
| TASK_ID               | varchar(64)   | NO   | PRI | NULL              |       |
| JOB_ID                | varchar(64)   | NO   | MUL | NULL              |       |
| TYPE                  | varchar(32)   | NO   |     | NULL              |       |
| SUCCESS_ATTEMPT_ID    | varchar(64)   | YES  |     | NULL              |       |
| INPUT_SPLIT_LOCATIONS | varchar(8192) | YES  |     | NULL              |       |
| INPUT_SPLIT_INFO      | varchar(128)  | YES  |     | NULL              |       |
| STATUS                | varchar(32)   | YES  |     | NULL              |       |
| TIME_STARTED          | bigint(20)    | YES  |     | NULL              |       |
| TIME_FINISHED         | bigint(20)    | YES  |     | NULL              |       |
| PARTITION_ID          | bigint(20)    | NO   | PRI | 0                 |       |
| CREATED               | timestamp     | NO   |     | CURRENT_TIMESTAMP |       |
+-----------------------+---------------+------+-----+-------------------+-------+
11 rows in set (0.00 sec)

I'm not really sure how to debug this. Most rows have an INPUT_SPLIT_LOCATIONS value containing only 3 tasktracker hostnames and the topology they are in, which in most cases is only 129 characters. Is there a way to see which values are failing these writes? Or is this simply a case of having to increase the varchar() size? I would like to understand whether that is the recommended practice before I blindly update the value.
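
For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It assumes the value length grows linearly with the number of split locations, which is only a guess since I don't know the exact format the column is written in; the function name is made up for illustration. The only inputs are the numbers above (129 characters for 3 locations, and the 8192 and 4096 limits).

```python
# Rough estimate only: assumes each split location adds about the same
# number of characters as in the typical rows described above.
COLUMN_LIMIT = 8192      # varchar(8192) from the describe output
TYPICAL_LENGTH = 129     # typical value length reported above
TYPICAL_LOCATIONS = 3    # tasktracker hostnames in a typical value


def locations_to_overflow(limit=COLUMN_LIMIT,
                          length=TYPICAL_LENGTH,
                          locations=TYPICAL_LOCATIONS):
    """Smallest location count whose estimated length exceeds the limit."""
    # Integer arithmetic: largest count that still fits, plus one.
    return (limit * locations) // length + 1


if __name__ == "__main__":
    print(locations_to_overflow())       # current varchar(8192)
    print(locations_to_overflow(4096))   # default varchar(4096)
```

Under that assumption a single task would need roughly 191 locations to overflow the current column, so the failing rows are presumably tasks whose splits list far more locations than the usual 3.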
