
Accessing data between clusters

Question asked by kusako on Feb 13, 2013
Latest reply on Feb 18, 2013 by kusako
Hi,
I'm trying to access data in one MapR cluster from another MapR cluster (both run the same version of MapR). The requirement is that the MapReduce job runs on the second cluster.
I tried running a simple wordcount like this on a node in cluster2:

hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar wordcount maprfs://cldb.cluster1.net:7222/test/shakespeare/comedies maprfs://cldb.cluster2.net:7222/test/out

Unfortunately, this tries to run the job on cluster1.
I also tried setting the JobTracker and the default file system explicitly:

hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar wordcount -Dmapred.job.tracker=cldb.cluster2.net:9001 -Dfs.default.name=maprfs://cldb.cluster2.net:7222/ maprfs://cldb.cluster1.net:7222/test/shakespeare/comedies maprfs://cldb.cluster2.net:7222/test/out

but this leads to a FileNotFoundException:
13/02/14 11:42:46 ERROR ipc.RPC: FailoverProxy: Failing this Call: submitJob for error(RemoteException): org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.FileNotFoundException: Requested file maprfs:/var/mapr/cluster/mapred/jobTracker/staging/test/.staging/job_201302131405_0038/job.xml does not exist.

I'm wondering whether this could work at all, or whether I'm completely off track.

Thanks for any suggestions,
-kusako
