
"File too large" on MapR-DB scan from Pig

Question asked by imichaeldotorg on Nov 14, 2014
Latest reply on Nov 14, 2014 by nabeel
I'm running a Pig script that queries data from a MapR-DB table, and the scan of that table fails with the error "File too large". When I run the same Pig script against a traditional HBase table, the scan works fine. We're running MapR-DB on version 4.0.1.27334.GA.

The Pig code that fails is pretty straightforward:
<code>
-- Load every column in family my_cf as a map, with the row key as the first field (the_id)
A = LOAD '/user/mapr/table_name_here' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('my_cf:*', '-loadKey true') AS (the_id:chararray, my_cf:map[]);
</code>

The error below is thrown under both YARN and MR1:

<pre>
Error: java.io.IOException: Scan Error: File too large(27)
	at com.mapr.fs.Inode.scanNext(Inode.java:1457)
	at com.mapr.fs.ResultScannerImpl.next(ResultScannerImpl.java:28)
	at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:221)
	at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:135)
	at org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat$HBaseTableRecordReader.nextKeyValue(HBaseTableInputFormat.java:162)
	at org.apache.pig.backend.hadoop.hbase.HBaseStorage.getNext(HBaseStorage.java:589)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:542)
	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1469)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
</pre>
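The `(27)` after "File too large" looks like a numeric OS error code rather than part of the message text. Assuming the MapR client is surfacing a standard Linux `errno` value here (the post does not confirm this), errno 27 is `EFBIG`, whose standard message is exactly "File too large". A minimal Python check of that mapping:

```python
import errno
import os

# Assumption: the "(27)" in "Scan Error: File too large(27)" is a Linux errno value.
code = 27

# errno.errorcode maps numeric codes to symbolic names; os.strerror returns
# the platform's human-readable message for the same code.
name = errno.errorcode[code]
message = os.strerror(code)

print(f"errno {code} -> {name}: {message}")
```

On Linux this prints `errno 27 -> EFBIG: File too large`. It only identifies the generic error code; the trace itself does not say which size limit was exceeded during the MapR-DB scan.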

Other Pig scripts run fine against other MapR-DB tables in the cluster. This table has approximately 1.6 million rows.
