How to load a text file into an ORC Hive table

Document created by Hao Zhu on Feb 17, 2016

Author: Hao Zhu

Original Publication Date: December 3, 2014

 

A text file cannot be loaded directly into an ORC Hive table, because the "load data ... into" command simply copies the input file(s) into the Hive table's data directory as-is, without converting them. A file must already be in the ORC file format to be loaded into an ORC Hive table this way.

 

Currently, Hive does not validate the storage format when you run "load data ... into". If you accidentally load a plain text file into an ORC Hive table, the load is accepted, but a later query against the table fails with the error shown below:

 

CREATE TABLE IF NOT EXISTS orctest (
id string,
id2 string,
id3 string,
id4 string
)
STORED AS ORC;

 

load data local inpath "/opt/tmp/testload2.txt" into table orctest;

 

hive> select * from orctest limit 1;
OK
Failed with exception java.io.IOException:java.lang.RuntimeException: serious problem
Time taken: 0.279 seconds
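
If it is unclear whether a table expects ORC data, the declared storage format can be checked before loading anything. The statement below is a minimal sketch against the orctest table from the example above. The class names in the comments are what Hive normally reports for an ORC table, and the exact output layout may differ between Hive versions.

```sql
-- Show the table's metadata, including its storage format.
DESCRIBE FORMATTED orctest;

-- For a table created with STORED AS ORC, the storage section is expected
-- to list the ORC classes, for example:
--   SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
--   InputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
--   OutputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
```

When the input format is OrcInputFormat, every file in the table's directory is read as ORC, which is why a plain text file placed there by "load data" causes the query to fail.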

The correct way is to first load the text file into an intermediate Hive table stored in text format, and then insert its rows into the ORC Hive table.

For example:

CREATE TABLE IF NOT EXISTS orctest_text (
id string,
id2 string,
id3 string,
id4 string
)
STORED AS TEXTFILE;

 

load data local inpath "/opt/tmp/testload2.txt" into table orctest_text;

 

INSERT OVERWRITE TABLE orctest SELECT * FROM orctest_text;
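
The statements above rely on Hive's default field delimiter for text tables (Ctrl-A, \001). The sketch below shows the same steps for a file that uses a different delimiter, followed by two sanity checks. It assumes, purely for illustration, that /opt/tmp/testload2.txt is comma-separated; the article does not state the file's actual layout, so adjust the delimiter to match your data.

```sql
-- Assumption: the input file is comma-separated (not stated in the article).
CREATE TABLE IF NOT EXISTS orctest_text (
  id  string,
  id2 string,
  id3 string,
  id4 string
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Copy the text file, unchanged, into the text table's directory.
LOAD DATA LOCAL INPATH '/opt/tmp/testload2.txt' INTO TABLE orctest_text;

-- Read the rows through the text table and rewrite them as ORC files.
INSERT OVERWRITE TABLE orctest SELECT * FROM orctest_text;

-- Sanity checks: the ORC table should now be readable,
-- and both tables should report the same row count.
SELECT * FROM orctest LIMIT 1;
SELECT COUNT(*) FROM orctest;
SELECT COUNT(*) FROM orctest_text;
```

Because INSERT OVERWRITE replaces the existing contents of orctest, it should also clear out a text file that was loaded into the ORC table by mistake. Once the row counts match, the intermediate orctest_text table can be dropped if it is no longer needed.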
