
Geographically dispersed 5-node cluster

Question asked by rhinomike on Mar 4, 2015
Latest reply on Mar 4, 2015 by leonclayton
I am currently considering using a small YARN-only M5 cluster to act as an NFS head for medium-term log storage.

Because my computational needs at this stage are not that significant, my idea is to avoid over-speccing the cluster (e.g. 2 x 5 nodes) and instead focus on getting 2 to 3 nodes in each of my DCs (Primary & DR) and let M5 do its magic around replication.

The idea would be something like:

 * site1-node-01 - CLDB, ZK, NFS, Fileserver, ResourceManager, HistoryServer, Web, HBase Master, NodeManager
 * site1-node-02 - ZK, Fileserver, NodeManager, HBase Region, NFS
 * site1-node-03 - ZK, Fileserver, NodeManager, HBase Region, NFS
 * site2-node-01 - CLDB, ZK, NFS, Fileserver, ResourceManager, HistoryServer, Web, HBase Master
 * site2-node-02 - ZK, Fileserver, NodeManager, HBase Region, NFS

Topology would be something like:

 * /data/site1
    * /data/site1/site1-node-01
    * /data/site1/site1-node-02
    * /data/site1/site1-node-03
 * /data/site2
    * /data/site2/site2-node-01
    * /data/site2/site2-node-02
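
To make the naming convention explicit, here is a minimal sketch that derives each topology path from the hostname. The helper `topology_path` is hypothetical (it is not a MapR API), and it assumes every hostname follows the `<site>-node-<nn>` pattern used above:

```python
# Hypothetical helper, not part of any MapR tooling: derive a topology path
# from a hostname of the form "<site>-node-<nn>", mirroring the
# /data/<site>/<hostname> layout listed above.
def topology_path(hostname: str, root: str = "/data") -> str:
    site, sep, _ = hostname.partition("-node-")
    if not sep or not site:
        raise ValueError(f"unexpected hostname format: {hostname!r}")
    return f"{root}/{site}/{hostname}"


# The five nodes from the proposed layout.
NODES = [
    "site1-node-01",
    "site1-node-02",
    "site1-node-03",
    "site2-node-01",
    "site2-node-02",
]

# Map each node to the topology path it would be assigned.
TOPOLOGY = {node: topology_path(node) for node in NODES}

if __name__ == "__main__":
    for node, path in TOPOLOGY.items():
        print(f"{node} -> {path}")
```

The resulting paths would still have to be applied to the cluster with the usual admin tooling; this only generates the strings.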

And disaster scenarios are:

 * Loss of node-01 on a particular site;
 * Loss of an NFS node on a particular site;
 * Loss of all nodes on a site
    * Site is down but NFS consumers are still up;
    * Lost nodes' data is still available at the remaining site;
 * Full loss of a site
    * Nodes and consumers are down;
    * Data for the site must still be available on the remaining site.
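
A rough way to sanity-check the scenarios above against the proposed service layout is sketched below. It only models where processes are placed, plus the general ZooKeeper rule that an ensemble needs a strict majority of its members to stay up. It does not model MapR data replication, CLDB failover, or latency, and all function names are my own placeholders:

```python
# Service placement copied from the proposed node list.
LAYOUT = {
    "site1-node-01": {"CLDB", "ZK", "NFS", "Fileserver", "ResourceManager",
                      "HistoryServer", "Web", "HBase Master", "NodeManager"},
    "site1-node-02": {"ZK", "Fileserver", "NodeManager", "HBase Region", "NFS"},
    "site1-node-03": {"ZK", "Fileserver", "NodeManager", "HBase Region", "NFS"},
    "site2-node-01": {"CLDB", "ZK", "NFS", "Fileserver", "ResourceManager",
                      "HistoryServer", "Web", "HBase Master"},
    "site2-node-02": {"ZK", "Fileserver", "NodeManager", "HBase Region", "NFS"},
}


def site_nodes(site: str) -> set:
    """All nodes whose hostname starts with '<site>-node-'."""
    return {n for n in LAYOUT if n.startswith(site + "-node-")}


def survivors(lost: set) -> dict:
    """Layout restricted to the nodes that are still up."""
    return {n: s for n, s in LAYOUT.items() if n not in lost}


def zk_quorum(lost: set) -> bool:
    """True if a strict majority of the ZooKeeper members is still up."""
    total = sum("ZK" in s for s in LAYOUT.values())
    alive = sum("ZK" in s for s in survivors(lost).values())
    return alive * 2 > total


def service_alive(service: str, lost: set) -> bool:
    """True if at least one surviving node runs the given service."""
    return any(service in s for s in survivors(lost).values())


SCENARIOS = {
    "site1-node-01 lost": {"site1-node-01"},
    "site2-node-01 lost": {"site2-node-01"},
    "all of site1 lost": site_nodes("site1"),
    "all of site2 lost": site_nodes("site2"),
}

if __name__ == "__main__":
    for name, lost in SCENARIOS.items():
        print(f"{name}: ZK quorum={zk_quorum(lost)}, "
              f"CLDB={service_alive('CLDB', lost)}, "
              f"NFS={service_alive('NFS', lost)}")
```

Under this simplified model, losing all of site1 leaves only 2 of the 5 ZooKeeper members, which is below a majority, whereas losing site2 leaves 3 of 5. That asymmetry is one of the things I would like feedback on.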

Bandwidth between the sites is generally OK; however, latency is affected by the distance, which exceeds 150 km (although it is less than 600 km).

**Would a deployment like that work?**










