
Double-byte characters cannot be used as a search condition in Drill

Question asked by hirohata on Jun 27, 2018
Latest reply on Jun 29, 2018 by hirohata

An error occurs when a query uses double-byte (Japanese) characters in a search condition.
When we add the '_UTF16' prefix to the string literal in the WHERE clause, the query works correctly.
However, it fails when we instead set DRILL_SHELL_JAVA_OPTS="-Dsaffron.default.charset=UTF-16LE" in drill-env.sh.

Please let me know if there is anything wrong with our configuration.

 


+ Set the following on all drillbit nodes and restarted the drillbits.
------------------------------------------------------
/opt/mapr/drill/drill-1.10.0/conf/drill-env.sh
export DRILL_SHELL_JAVA_OPTS="-Dsaffron.default.charset=UTF-16LE"
------------------------------------------------------

 

+ Manually adding '_UTF16' works correctly.
--------------------------------------------------------------------------------
0: jdbc:drill:drillbit=xx.xx.xx.xx> select columns[0] as 作業番号
. . . . . . . . . . . . . . . . . .> from test01.root.`作業予定表.tsv`
. . . . . . . . . . . . . . . . . .> where columns[2] like _UTF16'ビル名%'
. . . . . . . . . . . . . . . . . .> ;
+------------+
| 作業番号 |
+------------+
| 111111111 |
| 222222222 |
| 33333333 |
| 44444444 |
+------------+
4 rows selected (0.247 seconds)
--------------------------------------------------------------------------------


+ Relying on DRILL_SHELL_JAVA_OPTS set in /opt/mapr/drill/drill-1.10.0/conf/drill-env.sh causes an error.
--------------------------------------------------------------------------------
0: jdbc:drill:drillbit=xx.xx.xx.xx> select columns[0] as 作業番号
. . . . . . . . . . . . . . . . . .> from test01.root.`作業予定表.tsv`
. . . . . . . . . . . . . . . . . .> where columns[2] like 'ビル名%'
. . . . . . . . . . . . . . . . . .> ;
Error: SYSTEM ERROR: CalciteException: Failed to encode 'ビル名%' in character set 'ISO-8859-1'


[Error Id: 8d40ca88-cfd1-42c1-b646-61a1205378f4 on hadoop01:31010] (state=,code=0)
--------------------------------------------------------------------------------
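
For context on the error message, the encoding failure can be reproduced outside Drill with the standard Java charset API. The following is a minimal sketch, not Drill or Calcite code; the class name CharsetCheck and the helper canEncode are hypothetical names chosen for illustration. It only shows that ISO-8859-1 has no representation for the characters in 'ビル名%' while UTF-16LE does, which is consistent with the query succeeding when the literal is prefixed with _UTF16.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Drill or Calcite.
public class CharsetCheck {

    // Returns true if every character of `text` can be represented in `charset`.
    public static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The LIKE pattern from the failing query, written with Unicode
        // escapes so the source compiles regardless of file encoding.
        String pattern = "\u30d3\u30eb\u540d%";

        // ISO-8859-1 covers only U+0000..U+00FF, so this prints false.
        System.out.println("ISO-8859-1: " + canEncode(pattern, StandardCharsets.ISO_8859_1));

        // UTF-16LE can represent any Unicode character, so this prints true.
        System.out.println("UTF-16LE: " + canEncode(pattern, StandardCharsets.UTF_16LE));
    }
}
```

Since the message still names 'ISO-8859-1', the process that parsed the query was apparently still using that default rather than UTF-16LE when the second query ran.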
