
Pig Distributed Cache on EMR running M3

Question asked by colinbritton on Feb 14, 2013
Latest reply on Feb 16, 2013 by Ted Dunning
I have a UDF that uses the MaxMind database to decode IP addresses, and I cannot get the UDF to find the data file when it tries to load it.

My pig script includes the line:
DEFINE IPADDRESSEVAL LookupServiceUDF('s3://bucket.mine.com/bin/geolitecity.dat#geolitecity.dat');
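The `#geolitecity.dat` suffix in that DEFINE follows the distributed-cache convention `<path>#<symlink>`: the part before `#` is where the file lives, and the part after it is the name under which the file is expected to appear in each task's working directory. A minimal sketch in plain Java (no Pig or Hadoop dependency) of how such a spec splits into those two parts; `CacheSpec`, `remotePath` and `symlinkName` are hypothetical helpers, not part of either library.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not part of Pig or Hadoop): splits a distributed-cache
// entry of the form "<path>#<symlink>" into the remote path and the symlink name.
final class CacheSpec {
    private CacheSpec() {}

    // Remote location of the file: the spec with its '#' fragment removed.
    static String remotePath(String spec) throws URISyntaxException {
        URI uri = new URI(spec);
        return new URI(uri.getScheme(), uri.getSchemeSpecificPart(), null).toString();
    }

    // Name the file should appear under in the task's working directory.
    // Specs without a '#' fragment are rejected because the UDF relies on this name.
    static String symlinkName(String spec) throws URISyntaxException {
        String fragment = new URI(spec).getFragment();
        if (fragment == null || fragment.isEmpty()) {
            throw new IllegalArgumentException("no '#<symlink>' fragment in: " + spec);
        }
        return fragment;
    }

    public static void main(String[] args) throws URISyntaxException {
        String spec = "s3://bucket.mine.com/bin/geolitecity.dat#geolitecity.dat";
        System.out.println(remotePath(spec));  // s3://bucket.mine.com/bin/geolitecity.dat
        System.out.println(symlinkName(spec)); // geolitecity.dat
    }
}
```

Rejecting specs without a fragment is a design choice of this sketch: if the UDF opens the file by a fixed relative name, a spec with no `#` gives it nothing reliable to open.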

My UDF has the following constructor:
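// Cache-file specs handed to Pig through getCacheFiles()
private final List<String> list = new ArrayList<String>();

public LookupServiceUDF(String file) {
    if (file != null) {
        list.add(file);
    }
}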

and this method:
@Override
public List<String> getCacheFiles() {
    return list;
}

I try to load the file within the exec method using:

File geoIPData = new File("./geolitecity.dat");

and I get a FileNotFoundException. I added debug output, and the file does not appear to be copied to the local file system at all. I have tried this with both M3 and M5 on Amazon Elastic MapReduce.
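One way to narrow down whether the cache file is arriving at all is to check for it explicitly and report what the task's working directory actually contains. A minimal debugging sketch, assuming the file is expected under the symlink name given after `#`; `CacheFileLocator` and `locate` are hypothetical helpers, not a Pig API.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;

// Hypothetical debugging helper (not a Pig API): looks for a distributed-cache
// symlink in the current working directory and, if it is missing, fails with a
// message that lists what the directory actually contains.
final class CacheFileLocator {
    private CacheFileLocator() {}

    static File locate(String symlinkName) throws FileNotFoundException {
        // Relative paths such as "./geolitecity.dat" resolve against user.dir.
        File cwd = new File(System.getProperty("user.dir"));
        File candidate = new File(cwd, symlinkName);
        if (candidate.exists()) {
            return candidate;
        }
        String[] entries = cwd.list();
        String listing;
        if (entries == null) {
            listing = "<could not list directory>";
        } else {
            Arrays.sort(entries);
            listing = Arrays.toString(entries);
        }
        throw new FileNotFoundException(candidate.getAbsolutePath()
                + " not found; working directory " + cwd.getAbsolutePath()
                + " contains " + listing);
    }
}
```

Calling `CacheFileLocator.locate("geolitecity.dat")` from exec in place of `new File("./geolitecity.dat")` would turn the bare exception into one that shows whether the file arrived under a different name, arrived as a dangling link (the name is listed but `exists()` is false), or did not arrive at all.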

CB
