Copying data to nodes, how to spread equally

Question asked by sjgx on Sep 26, 2016
Latest reply on Oct 20, 2016 by Ted Dunning

I have a large set of data, split over hundreds of files, that I am copying over to my cluster via ssh. I have 9 nodes: all but one are set up as data nodes, and the remaining node (hadoop-node2) has most of the MapR processes running on it. On hadoop-node2 I run

hadoop dfs -mkdir /data

and then copy over my data via

 ssh username@hadoop-node2 "hadoop dfs -put - /data/myData"

I monitor the space used on my cluster via the MapR web interface and see that only hadoop-node2 is filling up. 

 

I've just switched to MapR from Cloudera, where I didn't have to worry about where my data was being stored (so I apologize for my ignorance). Is this not the case with MapR? Do I have to manually pick which node my data is being copied to? Or am I doing something completely wrong?

 
