
Sqoop generated code unable to parse SequenceFile

Question asked by alex.corvino on Dec 2, 2014
Latest reply on Dec 4, 2014 by alex.corvino
We're running M3. I've exported some data from MySQL to sequence files with Sqoop, and I'm unable to consume the data in my MapReduce code. My table has two columns, one INT and one VARBINARY(32762); NULLs are not allowed in either column.

My import command:

    sqoop import \
    --connect jdbc:mysql://testdbserver/testschema \
    --username me -P \
    --table testtable \
    --as-sequencefile \
    --package-name generated \
    --outdir ~/testproject/src/main/java \
    --target-dir /test/testschema/testtable

The job itself is called from an Oozie workflow, which configures it like so:

                <property>
                    <name>mapred.input.key.class</name>
                    <value>org.apache.hadoop.io.LongWritable</value>
                </property>
                <property>
                    <name>mapred.input.value.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapred.mapoutput.key.class</name>
                    <value>org.apache.hadoop.io.LongWritable</value>
                </property>
                <property>
                    <name>mapred.mapoutput.value.class</name>
                    <value>generated.testtable</value>
                </property>
                <property>
                    <name>mapred.output.key.class</name>
                    <value>org.apache.hadoop.io.LongWritable</value>
                </property>
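For completeness, here is roughly what the alternative configuration I also tried (mentioned further down) looked like: pointing the input format at SequenceFileInputFormat and declaring the input value as the generated class. I'm not certain these are the correct property names for wiring this up through Oozie, so treat this as a sketch:

```xml
<!-- Sketch of the variant I tried: read the sequence file directly.
     Property names mirror the ones above; I'm not sure they are the
     right ones for setting the input format through Oozie. -->
<property>
    <name>mapreduce.inputformat.class</name>
    <value>org.apache.hadoop.mapred.SequenceFileInputFormat</value>
</property>
<property>
    <name>mapred.input.value.class</name>
    <value>generated.testtable</value>
</property>
```

This is the configuration that produced the cast exceptions described below.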

And my map method

    @Override
    public void map(LongWritable inKey, Text inValue,
            OutputCollector<LongWritable, Text> output, Reporter reporter)
            throws IOException {

        testtable record = new testtable();

        try {
            record.parse(inValue);
        } catch (ParseError e1) {
            log.error(e1);
            e1.printStackTrace();
            return;
        }

        [...]

This fails on the parse call. The exception I see is:

    INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1415438784121_12334_m_000101_0: Error: java.lang.NumberFormatException: For input string: "+�;� *� PW]�IK�Ă;�4��_���ã5�mV��>�j{� `�}u�)P���Y������ n�r��"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:441)
        at java.lang.Long.valueOf(Long.java:540)
        at generated.testtable.__loadFromFields(testtable.java:198)
        at generated.testtable.parse(testtable.java:150)

It seems pretty clear that the data in the Text object is binary, which is what makes the parse fail. My first thought was that inValue is actually a testtable object being passed in improperly. From what I've read, if I set mapreduce.inputformat.class to SequenceFileInputFormat, I should be able to declare the map value as an instance of testtable, but that gives me a ClassCastException complaining that Text cannot be cast to testtable.
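To illustrate the failure mode as I understand it: judging from the stack trace, the generated __loadFromFields() calls Long.valueOf() on what it expects to be a delimited text field, and that throws exactly this exception when the field contains non-decimal bytes. A minimal, self-contained demonstration (plain Java, no Hadoop dependencies; the string is just placeholder bytes standing in for my binary column):

```java
public class ParseDemo {
    public static void main(String[] args) {
        // Placeholder for the non-numeric bytes that end up in the Text value;
        // any string that is not a decimal number triggers the same path.
        String binaryLookingField = "+\u00ab;\u00bd garbage";
        try {
            // Mirrors the Long.valueOf() call seen in
            // generated.testtable.__loadFromFields() in the stack trace.
            Long.valueOf(binaryLookingField);
            System.out.println("parsed");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```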

This seems like it should be really straightforward, but I've had no luck figuring it out. Does anyone have any clues as to what I'm doing wrong?
