How to format date or timestamp types using to_date or to_timestamp function in Drill

Document created by Hao Zhu Employee on Feb 18, 2016
Version 1Show Document
  • View in full screen mode

Author: Hao Zhu

Original Publication Date: March 25, 2015

 

Env:

Drill

Goal:

This article introduces how to use to_date or to_timestamp function in Drill to format date or timestamp data type.

Solution:

Drill's to_date or to_timestamp function should match JODA documentation regarding date or timestamp data format.

 

Symbol Meaning Presentation Examples

------ ------- ------------ -------

G era text AD

C century of era (>=0) number 20

Y year of era (>=0) year 1996

 

x weekyear year 1996

w week of weekyear number 27

e day of week number 2

E day of week text Tuesday; Tue

 

y year year 1996

D day of year number 189

M month of year month July; Jul; 07

d day of month number 10

 

a halfday of day text PM

K hour of halfday (0~11) number 0

h clockhour of halfday (1~12) number 12

 

H hour of day (0~23) number 0

k clockhour of day (1~24) number 24

m minute of hour number 30

s second of minute number 55

S fraction of second number 978

 

z time zone text Pacific Standard Time; PST

Z time zone offset/id zone -0800; -08:00; America/Los_Angeles

 

' escape for text delimiter

'' single quote literal

Note that, Drill is using "dd" instead of "DD" for "day of month", which is not the same as some other RDBMS."DD" stands for "day of year".Assume the goal is to convert the data in the csv file from string format to timestamp, below are the examples on when to use "DD" and "dd" format.

 

1. When to use "dd".

For example, this is original format to use "dd":

 

select columns[0] from dfs.tmp.`drilltest/a.csv`;

+------------+

| EXPR$0 |

+------------+

| 2014-11-24 18:08:08 |

+------------+

The correct format in to_timestamp should be:

select to_timestamp(columns[0], 'YYYY-MM-dd HH:mm:ss') from dfs.tmp.`drilltest/a.csv`;

+------------+

| EXPR$0 |

+------------+

| 2014-11-24 18:08:08.0 |

+------------+

If using "DD", it will produce unexpected outputs.

select to_timestamp(columns[0], 'YYYY-MM-DD HH:mm:ss') from dfs.tmp.`drilltest/a.csv`;

+------------+

| EXPR$0 |

+------------+

| 2014-01-24 18:08:08.0 |

+------------+

2. When to use "DD".

For example, this is original format to use "DD":

select columns[0] from dfs.tmp.`b.csv`;

+------------+

| EXPR$0 |

+------------+

| 2014 328 18:08:08 |

+------------+

The correct format in to_timestamp should be:

select to_timestamp(columns[0],'YYYY DD HH:mm:ss') from dfs.tmp.`b.csv`;

+------------+

| EXPR$0 |

+------------+

| 2014-11-24 18:08:08.0 |

+------------+

Attachments

    Outcomes