Understanding Drill's timestamp and timezone

Document created by Hao Zhu Employee on Feb 18, 2016
Version 1Show Document
  • View in full screen mode

Author: Hao Zhu
Original Publication Date: May 8, 2015

 

Goal:

This article explains the behavior of Apache Drill's timestamp and timezone.

Env:

Drill all versions

Solution:

1. Drill uses OS's timezone by default.

a. If the OS's timezone is "America/New_York":

# cat /etc/sysconfig/clock 
ZONE="America/New_York"

 

After drillbits start, the BOOT configuration "user.timezone" is set to "America/New_York".

> select * from sys.options where name like '%timezone%';

+------------+------------+------------+------------+------------+------------+------------+

| name | kind | type | num_val | string_val | bool_val | float_val |

+------------+------------+------------+------------+------------+------------+------------+

| user.timezone | STRING | BOOT | null | "America/New_York" | null | null |

+------------+------------+------------+------------+------------+------------+------------+

b. If the OS's timezone is "UTC":

# cat /etc/sysconfig/clock 
ZONE="UTC"

After drillbits start, the BOOT configuration "user.timezone" is set to "UTC".

> select * from sys.options where name like '%timezone%';

+------------+------------+------------+------------+------------+------------+------------+

| name | kind | type | num_val | string_val | bool_val | float_val |

+------------+------------+------------+------------+------------+------------+------------+

| user.timezone | STRING | BOOT | null | "UTC" | null | null |

+------------+------------+------------+------------+------------+------------+------------+

c. If the OS's timezone is "America/New_York", we can set "user.timezone" to UTC to override it.

Put "-Duser.timezone=UTC" in DRILL_JAVA_OPTS in drill-env.sh under /opt/mapr/drill/drill-<version>/conf/ on all nodes, and then restart drillbits using command:

maprcli node services -name drill-bits -action restart -filter csvc=="drill-bits"

Then:

> select * from sys.options where name like '%timezone%';

+------------+------------+------------+------------+------------+------------+------------+

| name | kind | type | num_val | string_val | bool_val | float_val |

+------------+------------+------------+------------+------------+------------+------------+

| user.timezone | STRING | BOOT | null | "UTC" | null | null |

+------------+------------+------------+------------+------------+------------+------------+

2. Invalid timestamp for daylight saving.

a. If Drill's timezone is "America/New_York", below timestamp is invalid due to daylight saving.

> select to_timestamp('2015-03-08 02:58:51','YYYY-MM-dd HH:mm:ss') from sys.version;

Error: exception while executing query: Failure while executing query. (state=,code=0)

Query failed: SYSTEM ERROR: Unexpected exception during fragment initialization: Internal error: Error while applying rule

ReduceExpressionsRule_Project, args [rel#2186:LogicalProject.NONE.ANY([]).[](input=rel#2185:Subset#0.ENUMERABLE.ANY([]).[],EXPR$0=TO_TIMESTAMP('2015-03-08 02:58:51', 'YYYY-MM-dd HH:mm:ss'))]

From drillbit.log under directory /opt/mapr/drill/drill-<version>/logs, below error shows up:

Caused by: org.joda.time.IllegalInstantException: 
Cannot parse "2015-03-08 02:58:51":
Illegal instant due to time zone offset transition (America/New_York)

b. There is no such issue if Drill's timezone is set to UTC.

 

> select to_timestamp('2015-03-08 02:58:51','YYYY-MM-dd HH:mm:ss') from sys.version;

+------------+

| EXPR$0 |

+------------+

| 2015-03-08 02:58:51.0 |

+------------+

1 row selected (0.109 seconds)

 

> select * from sys.options where name like '%timezone%';

+------------+------------+------------+------------+------------+------------+------------+

| name | kind | type | num_val | string_val | bool_val | float_val |

+------------+------------+------------+------------+------------+------------+------------+

| user.timezone | STRING | BOOT | null | "UTC" | null | null |

+------------+------------+------------+------------+------------+------------+-

3. Function to_timestamp() expresses the data in Drill's timezone.

a. If the source data has no timezone, function to_timestamp() treats it in UTC.

For example, if Drill's timezone is "America/New_York", the timestamp data "2012-02-04 04:59:59" will be treated as "2012-02-04 04:59:59 UTC";Then Drill will express it to its timezone then it will become "2012-02-03 23:59:59" America/New_York.

That is the reason for below behavior:

 

> select * from sys.options where name like '%timezone%';

+------------+------------+------------+------------+------------+------------+------------+

| name | kind | type | num_val | string_val | bool_val | float_val |

+------------+------------+------------+------------+------------+------------+------------+

| user.timezone | STRING | BOOT | null | "America/New_York" | null | null |

+------------+------------+------------+------------+------------+------------+------------+

 

> select to_date(to_timestamp('2012-02-04 05:00:00','YYYY-MM-dd HH:mm:ss' )) from sys.version;

+------------+

| EXPR$0 |

+------------+

| 2012-02-04 |

+------------+

 

> select to_date(to_timestamp('2012-02-04 04:59:59','YYYY-MM-dd HH:mm:ss' )) from sys.version;

+------------+

| EXPR$0 |

+------------+

| 2012-02-03 |

+------------+

If Drill's timezone is "UTC", then there is no such behavior because Drill treats the timestamp data as UTC, and will also express it as UTC.

Things are consistent.

2 people found this helpful

Attachments

    Outcomes