
Error while loading data on Solr with Hive

Question asked by Arunav on Feb 21, 2018
Latest reply on Feb 21, 2018 by cathy

Hi,

I have a 3-node MapR cluster running MapR 5.2.0 with MEP 3.0.0, and Solr 7.2.1 installed in SolrCloud mode. I can create a collection and upload data from a CSV file through the post tool, and I can also query the data using the Solr web API.

My requirement is to index Hive table data into Solr. I've downloaded the SerDe jar file and followed the document https://doc.lucidworks.com/fusion/2.4/Importing_Data/Import-via-Hive.html for importing the Hive data.

I've added the lucidworks-hive-serde-2.2.7.jar file to the Hive lib directory and to the user path.

Here, the Solr collection 'NewCol3' already exists.

 

hive> add jar lucidworks-hive-serde-2.2.7.jar;
Added [lucidworks-hive-serde-2.2.7.jar] to class path
Added resources: [lucidworks-hive-serde-2.2.7.jar]

 

hive> CREATE EXTERNAL TABLE solr_test24 (name string) STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler' LOCATION '/user/user_name/solr_test2' TBLPROPERTIES('solr.server.url' = 'http://10.52.192.123:8983/solr','solr.collection' = 'NewCol3','solr.query' = '*:*');
OK
Time taken: 0.273 seconds
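(For reference, a simple read back through the same storage handler would look like the sketch below; the LIMIT value is arbitrary. I mention it only because it exercises the Solr connection on the read path, independently of the failing write path.)

```sql
-- Sketch: read from the Solr-backed external table to confirm the
-- storage handler can reach the collection at all (read path only).
SELECT * FROM solr_test24 LIMIT 5;
```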

 

hive> describe hive_test1.hive_test_table1;
OK
name string
Time taken: 0.055 seconds, Fetched: 1 row(s)

 

hive> insert into solr_test24 select * from hive_test1.hive_test_table1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = i98779_20180221085529_e3516259-99d0-4f5c-97ba-17232a9da9ff
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1516288787616_0062, Tracking URL = http://JDERNDPRD8:8088/proxy/application_1516288787616_0062/
Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job -kill job_1516288787616_0062
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-02-21 08:55:38,921 Stage-1 map = 0%, reduce = 0%
2018-02-21 08:56:11,106 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1516288787616_0062 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1516288787616_0062_m_000000 (and more) from job job_1516288787616_0062

Task with the most failures(4):
-----
Task ID:
task_1516288787616_0062_m_000000

-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"name":"sagar"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:458)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"name":"sagar"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:499)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:785)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:149)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:489)
... 9 more
Caused by: java.io.IOException
at com.lucidworks.hadoop.io.LucidWorksWriter.makeIOException(LucidWorksWriter.java:282)
at com.lucidworks.hadoop.io.LucidWorksWriter.maybeRetry(LucidWorksWriter.java:213)
at com.lucidworks.hadoop.io.LucidWorksWriter.maybeRetry(LucidWorksWriter.java:203)
at com.lucidworks.hadoop.io.LucidWorksWriter.write(LucidWorksWriter.java:198)
at com.lucidworks.hadoop.hive.LWHiveOutputFormat$1.write(LWHiveOutputFormat.java:39)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:751)
... 15 more
Caused by: java.lang.NullPointerException
at com.lucidworks.hadoop.io.impl.LWSolrDocument.getId(LWSolrDocument.java:46)
at com.lucidworks.hadoop.io.LucidWorksWriter.write(LucidWorksWriter.java:190)
... 17 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 MAPRFS Read: 0 MAPRFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
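Since the NullPointerException comes from LWSolrDocument.getId, I wonder whether the writer is failing because no document id is produced for the Solr uniqueKey field. A variant I could try is the sketch below (hypothetical: the table name solr_test25, the extra id column, and the assumption that the collection's uniqueKey field is named 'id' are all mine, not from the docs):

```sql
-- Hypothetical variant: add an explicit id column so the SerDe has a
-- value for Solr's uniqueKey field (assumed here to be 'id').
CREATE EXTERNAL TABLE solr_test25 (id string, name string)
STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
LOCATION '/user/user_name/solr_test2'
TBLPROPERTIES('solr.server.url' = 'http://10.52.192.123:8983/solr',
              'solr.collection' = 'NewCol3',
              'solr.query' = '*:*');

-- Populate the id via Hive's reflect() UDF (random UUID per row).
INSERT INTO solr_test25
SELECT reflect('java.util.UUID', 'randomUUID'), name
FROM hive_test1.hive_test_table1;
```

I haven't confirmed this is the cause, so any pointers on whether the SerDe requires an id column (or maps it some other way) would help.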

I checked for this error on the Hortonworks community forum, where it was suggested to set the parameters below. But this doesn't solve the issue.

 

set hive.vectorized.execution.enabled=false;

set hive.vectorized.execution.reduce.enabled=false;
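For completeness, the sequence I ran with these settings in a single Hive session was along these lines:

```sql
-- Disable vectorized execution (suggested workaround), then retry the insert.
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;
INSERT INTO solr_test24 SELECT * FROM hive_test1.hive_test_table1;
```

The job still fails with the same NullPointerException in LWSolrDocument.getId.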

I'd really appreciate any help or suggestions on this error.

 

Thanks,

Arunav
