
Questions about Distcp between Two clusters

Question asked by rudraram on Jan 16, 2014
Latest reply on Jun 18, 2014 by Ted Dunning
**Scenario**: We have a Dev environment and recently set up a Staging environment, and we want to move a few of our volumes (10-25 TB of data) from Dev to Stage.

We thought of three options:
1) Rsync: Since both clusters are NFS-mounted on their respective edge nodes, we can run rsync between the edge nodes, but this seems to affect other users who are connected to the edge nodes, and the whole rsync process is slow as well.

Dev Cluster --> Dev Edge Node --> Stage Edge Node --> Stage Cluster

2) Mirroring: We can set up mirroring, but the second cluster will be read-only.

3) DistCp: DistCp seems to be the most effective option here, but opening ACLs/ports is a concern, as we have to adhere to security standards.

My question is: if I run my DistCp command on the Dev cluster,

like hadoop distcp devSrc stageDest

which ports would need to be opened between the Dev cluster and the Staging cluster?

For example JT, MFS, CLDB, etc.?
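
For context, here is a minimal sketch of how we could check from a Dev node whether a given set of ports on a Stage node is reachable once the firewall rules are in place. The port numbers below are only the defaults I have seen commonly cited for MapR services (CLDB 7222, MFS 5660, JobTracker 9001); they are assumptions, not confirmed for our clusters, and the names `CANDIDATE_PORTS`, `port_is_open` and `check_ports` are made up for this illustration.

```python
import socket

# ASSUMPTION: commonly cited MapR default ports; verify against the
# actual cluster configuration before relying on them.
CANDIDATE_PORTS = {
    "CLDB": 7222,
    "MFS": 5660,
    "JobTracker": 9001,
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each service name in `ports` to whether its port is reachable."""
    return {
        name: port_is_open(host, port, timeout)
        for name, port in ports.items()
    }
```

Running something like `check_ports("stage-node.example.com", CANDIDATE_PORTS)` from a Dev node (the hostname is a placeholder) would return a dictionary of service name to True/False. It only tests TCP reachability, so it says nothing about whether the list of ports is complete for DistCp.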