prakhar

Hive dynamic partition on timestamp Column issue

Blog Post created by prakhar Employee on Dec 6, 2016

when we try to create hive external table partitioned on timestamp data type. Since timestamp value contains a space along with colons, while we insert data into the partitioned table via dynamic partition process, the colons are replaced by '%3A' character.

 

Below source table created with one columns as timestamp datatype

 

CREATE EXTERNAL TABLE testtbl1 ( insup_flag string, cust_id string,ts timestamp) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/tmp/optum1';

 


Destination Table :

 

CREATE EXTERNAL TABLE `testtbl`(
  `insup_flag` string,
  `cust_id` string)
PARTITIONED BY (
  `ts` timestamp)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'maprfs:/tmp/optum'
TBLPROPERTIES (
  'transient_lastDdlTime'='1470122169')
Time taken: 0.286 seconds, Fetched: 15 row(s)

 

 

 

 

 

hive> select * from testtbl1;
OK
I       OPTUM   2016-07-23 23:02:06
I       OPTUM   2016-07-24 23:02:06
I       OPTUM   2016-07-25 23:02:06
I       OPTUM   2016-07-26 23:02:06
I       OPTUM   2016-07-27 23:02:06
I       OPTUM   2016-07-28 23:02:06
Time taken: 0.124 seconds, Fetched: 6 row(s)

 

 

 

 

 


INSERTING DATA FROM testtbl1 to testtbl ,this will create 6 partitions dynamically

 

hive> set hive.exec.dynamic.partition.mode=nonstrict;


hive> INSERT INTO testtbl partition (ts) select insup_flag,cust_id,ts from testtbl1;
Query ID = mapr_20160802132835_77027c75-5d71-4d02-b0d0-997b692f5b52
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1469804331893_0001, Tracking URL = https://srini-node3:8090/proxy/application_1469804331893_0001/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job  -kill job_1469804331893_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-08-02 13:28:45,038 Stage-1 map = 0%,  reduce = 0%
2016-08-02 13:28:52,439 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.63 sec
MapReduce Total cumulative CPU time: 1 seconds 630 msec
Ended Job = job_1469804331893_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: maprfs:/tmp/optum/.hive-staging_hive_2016-08-02_13-28-35_359_3007545903952650758-1/-ext-10000
Loading data to table default.testtbl partition (ts=null)
         Time taken for load dynamic partitions : 845
        Loading partition {ts=2016-07-23 23:02:06}
        Loading partition {ts=2016-07-28 23:02:06}
        Loading partition {ts=2016-07-27 23:02:06}
        Loading partition {ts=2016-07-25 23:02:06}
        Loading partition {ts=2016-07-24 23:02:06}
        Loading partition {ts=2016-07-26 23:02:06}
         Time taken for adding to write entity : 5
Partition default.testtbl{ts=2016-07-23 23:02:06} stats: [numFiles=1, numRows=1, totalSize=8, rawDataSize=7]
Partition default.testtbl{ts=2016-07-25 23:02:06} stats: [numFiles=1, numRows=1, totalSize=8, rawDataSize=7]
Partition default.testtbl{ts=2016-07-27 23:02:06} stats: [numFiles=1, numRows=1, totalSize=8, rawDataSize=7]
Partition default.testtbl{ts=2016-07-26 23:02:06} stats: [numFiles=1, numRows=1, totalSize=8, rawDataSize=7]
Partition default.testtbl{ts=2016-07-28 23:02:06} stats: [numFiles=1, numRows=1, totalSize=8, rawDataSize=7]
Partition default.testtbl{ts=2016-07-24 23:02:06} stats: [numFiles=1, numRows=1, totalSize=8, rawDataSize=7]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 1.63 sec   MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 630 msec
OK
Time taken: 20.776 seconds

 

 

 

hive> show partitions testtbl;
OK
ts=2016-07-23 23%3A02%3A06
ts=2016-07-24 23%3A02%3A06
ts=2016-07-25 23%3A02%3A06
ts=2016-07-26 23%3A02%3A06
ts=2016-07-27 23%3A02%3A06
ts=2016-07-28 23%3A02%3A06
Time taken: 0.086 seconds, Fetched: 6 row(s)

 

 

 

[mapr@srini-node2 ~]$ hadoop fs -ls /tmp/optum
Found 6 items
drwxrwxrwx   - mapr root          1 2016-08-02 13:28 /tmp/optum/ts=2016-07-23 23%3A02%3A06
drwxrwxrwx   - mapr root          1 2016-08-02 13:28 /tmp/optum/ts=2016-07-24 23%3A02%3A06
drwxrwxrwx   - mapr root          1 2016-08-02 13:28 /tmp/optum/ts=2016-07-25 23%3A02%3A06
drwxrwxrwx   - mapr root          1 2016-08-02 13:28 /tmp/optum/ts=2016-07-26 23%3A02%3A06
drwxrwxrwx   - mapr root          1 2016-08-02 13:28 /tmp/optum/ts=2016-07-27 23%3A02%3A06
drwxrwxrwx   - mapr root          1 2016-08-02 13:28 /tmp/optum/ts=2016-07-28 23%3A02%3A06

 

 

This is occurring because there is limitation of HDFS/MapRFS that you can not create 
directory names with colon sings due to this hive internally encode this : into
"%3A".
https://issues.apache.org/jira/browse/HDFS-13https://issues.apache.org/jira/browse/HADOOP-3257

To avoid this issue atleast from Hive Shell end you can set below property
set hive.decode.partition.name=TRUE;
It replaces '%3A' by ':' in hive shell.
But in MapRFS nothing changes, and hive uses '%3A' instead ':'

Tried the same in Cloudera

[cloudera@quickstart ~]$ hadoop fs -put t\:t / -put: Can not create a Path from a null string Usage: hadoop fs [generic options] -put [-f] [-p] [-l] <localsrc> ... <dst> [cloudera@quickstart ~]$ hadoop fs -mkdir /t\:test -mkdir: java.net.URISyntaxException: Relative path in absolute URI: t:test Usage: hadoop fs [generic options] -mkdir [-p] <path> ... [cloudera@quickstart ~]$  In MapRFS the same behavior: [mapr@node5 ~]$ hadoop fs -put t\:t / -put: Can not create a Path from a null string Usage: hadoop fs [generic options] -put [-f] [-p] [-l] <localsrc> ... <dst> [mapr@node5 ~]$ hadoop fs -mkdir /t:test -mkdir: java.net.URISyntaxException: Relative path in absolute URI: t:test Usage: hadoop fs [generic options] -mkdir [-p] <path> ... [mapr@node5 ~]$

Outcomes