
PigServer API + MapR problems:  Any known issues or working examples?

Question asked by zackurey on Jan 26, 2012
Latest reply on Jan 26, 2012 by Ted Dunning
I can run the Pig client (Grunt shell) against a MapR cluster without problems. However, when I try to use Pig programmatically from within an application through the PigServer API, I run into an error.

I've tried both the MapR-distributed Pig jar (0.9.0) and the Apache distribution (0.9.1). Both end up with the same error:
<code>
java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:118)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:123)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:1807)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:67)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:285)
at org.apache.pig.PigServer.compilePp(PigServer.java:1373)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1310)
at org.apache.pig.PigServer.execute(PigServer.java:1299)
at org.apache.pig.PigServer.executeBatch(PigServer.java:359)
</code>

I've dug around, and it appears to choke on the value provided for mapred.job.tracker in mapred-site.xml: 'maprfs:///'. I've also tried 'maprfs:///mapr/my.cluster.com' (the default).
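As a minimal sketch of the suspected failure mode, the snippet below uses only JDK classes (it is not Hadoop's actual NetUtils code, which is assumed here to fall back to a port of -1 when the configured value carries none). It shows that a value such as 'maprfs:///' has no port component, and that java.net.InetSocketAddress rejects -1 as a port. The class name, the placeholder host "localhost", and the contrasting value "dummy://somehost:9001" are hypothetical and only for illustration.

```java
import java.net.InetSocketAddress;
import java.net.URI;

public class JobTrackerAddressCheck {

    // Port component of a configured value; java.net.URI reports -1 when none is present.
    public static int portOf(String jobTrackerValue) {
        return URI.create(jobTrackerValue).getPort();
    }

    // True if InetSocketAddress refuses the port (valid range is 0..65535).
    // "localhost" is a placeholder host so that only the port is being tested.
    public static boolean rejectsPort(int port) {
        try {
            new InetSocketAddress("localhost", port);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] values = {
            "maprfs:///",                     // value from mapred-site.xml
            "maprfs:///mapr/my.cluster.com",  // the default that was also tried
            "dummy://somehost:9001"           // hypothetical host:port style value, for contrast
        };
        for (String value : values) {
            int port = portOf(value);
            System.out.println(value + " -> port " + port + ", rejected=" + rejectsPort(port));
        }
    }
}
```

If this reading is right, the exception only means that stock JobClient code is trying to interpret the MapR value as a host:port pair, which points at the wrong Hadoop classes being used rather than at a bad config value.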

On the code side I have something like:
<code>
PigServer pigServer = new PigServer(new PigContext());
pigServer.setBatchOn();
...
pigServer.registerScript( someStream );
pigServer.executeBatch();
</code>

This appears to find all the Hadoop and MapR config files on the classpath, but it still fails with the error above. With the same config files and the right MapR jars on the classpath, Pig works stand-alone.
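One way to double-check the classpath claim from inside the application is to ask the class loader where each config file actually resolves from. The sketch below is only a diagnostic under that assumption: it uses plain JDK calls, the class name is hypothetical, and core-site.xml is included alongside mapred-site.xml as the other usual Hadoop config file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ConfigOnClasspath {

    // Every classpath location that provides the named resource, in lookup order.
    public static List<URL> locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ConfigOnClasspath.class.getClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"core-site.xml", "mapred-site.xml"}) {
            // More than one hit means two copies are competing; the first one wins.
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Running this in the same JVM as the PigServer code, and comparing the output with what the stand-alone Pig launcher sees, would show whether the two setups really load the same files.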

Does anything stand out here as obviously incorrect, or are there any known issues with using the embedded Pig API with MapR?
