
NFS HA in AWS - Is it possible?

Question asked by peterjenks on Feb 11, 2016
Latest reply on Feb 11, 2016 by peterjenks
Has anyone managed to get an HA solution for NFS on a cluster running in AWS?

I have tested a number of options but I don't think I can get it to work *without* additional scripting to modify AWS.

Here are the options I have tried and the issues I ran into:

**HA via AWS**

This option involves running all of the NFS Gateway nodes through an AWS Elastic Load Balancer (ELB) with a DNS CNAME entry in front of the ELB.

This worked initially. However, an NFS mount would occasionally hang. We think this is because all of the nodes behind the ELB are active, so the ELB can route different requests to different nodes.
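
For reference, the setup was roughly the following (a minimal sketch using the AWS CLI; the ELB name, subnet and instance IDs are placeholders, and only the NFS port is shown even though portmapper/mountd ports also need to be reachable):

```bash
# Sketch only: classic ELB passing NFS (TCP 2049) straight through to the gateway nodes.
# Names, subnet and instance IDs are placeholders.
aws elb create-load-balancer \
  --load-balancer-name mapr-nfs-elb \
  --listeners "Protocol=TCP,LoadBalancerPort=2049,InstancePort=2049" \
  --subnets subnet-aaaa1111 \
  --scheme internal

aws elb register-instances-with-load-balancer \
  --load-balancer-name mapr-nfs-elb \
  --instances i-0aaa1111 i-0bbb2222 i-0ccc3333

# A DNS CNAME (e.g. nfs.example.internal) then points at the ELB's DNS name.
```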

**HA via MapR**

This option involves using the HA provided by MapR. You provide a VIP and this is assigned to one of the Gateway nodes. If that node goes offline then the VIP is automatically reassigned to another node.
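
For anyone unfamiliar, the VIP itself is configured through the MapR CLI, roughly like this (the addresses, netmask and MAC addresses are placeholders):

```bash
# Sketch: assign a VIP to the NICs (identified by MAC) of the NFS gateway nodes.
# Addresses, netmask and MACs are placeholders.
maprcli virtualip add \
  -virtualip 10.0.1.200 \
  -virtualipend 10.0.1.200 \
  -netmask 255.255.255.0 \
  -macs "0a:1b:2c:3d:4e:5f 0a:1b:2c:3d:4e:60"

# Check which node currently holds the VIP:
maprcli virtualip list
```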

The problem with this is how to assign the VIP to the correct instance at the AWS level. For any traffic sent to an IP address, AWS needs to know which instance that address is assigned to, and therefore which Security Groups to evaluate before allowing the traffic.

How can you tell AWS which instance the cluster has assigned the VIP to, and how can you shift it to another instance if that node goes offline?

I could write a script that uses the MapR CLI and the AWS CLI to do this, and run it continuously to keep the two in sync. This, however, seems like a poor solution.
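
For the record, the kind of script I mean would look something like this: poll MapR for the node that currently owns the VIP, then use EC2's assign-private-ip-addresses call (with reassignment allowed) to move the address onto that node's network interface. The node-to-ENI mapping, the output parsing and the polling interval are all placeholders:

```bash
#!/bin/bash
# Sketch only: keep the AWS secondary private IP in step with the MapR VIP owner.
# The VIP, the node->ENI mapping and the polling interval are placeholders.
VIP="10.0.1.200"

# Hypothetical mapping from cluster hostname to that node's EC2 network interface.
declare -A ENI_FOR_NODE=(
  ["node1.cluster.internal"]="eni-0aaa1111"
  ["node2.cluster.internal"]="eni-0bbb2222"
)

while true; do
  # Ask MapR which node currently holds the VIP.
  # The exact output format depends on the MapR version; this parsing is a placeholder.
  owner=$(maprcli virtualip list | grep "$VIP" | awk '{print $1}')

  eni="${ENI_FOR_NODE[$owner]}"
  if [ -n "$eni" ]; then
    # Move the VIP (as a secondary private IP) onto that node's interface.
    # --allow-reassignment lets AWS take it away from whichever ENI held it before.
    aws ec2 assign-private-ip-addresses \
      --network-interface-id "$eni" \
      --private-ip-addresses "$VIP" \
      --allow-reassignment
  fi
  sleep 10
done
```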

**HA via the Client**

This option would involve installing the MapR POSIX client on each of the servers that I want to mount the MapR FS to. It avoids using the NFS protocol and allows the client to transfer data to all of the NFS Gateway nodes.
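
In case it helps anyone reproduce what I'm testing: assuming the loopback-NFS flavour of the POSIX client, the setup on each server looks roughly like this (package and service names, paths and the cluster name are placeholders and may differ by MapR version):

```bash
# Sketch of the client-side setup, assuming the loopback-NFS based POSIX client.
# Package/service names and the cluster name are placeholders.
yum install -y mapr-loopbacknfs

# (The client's cluster configuration must first point at the cluster's CLDB nodes;
#  the exact config file path is omitted here.)
service mapr-loopbacknfs start

# Mount MapR-FS from the local loopback NFS server.
mkdir -p /mapr
mount -o hard,nolock localhost:/mapr/my.cluster.com /mapr
```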

However, I am unsure whether this is an acceptable HA solution. How does the client behave if one of the nodes goes offline? Also, it's not a viable option for any Windows servers that I want to mount from.

This is the option I am currently testing. I can get the mount working okay, but I am getting a flood of spurious errors in the POSIX client log about "Resource being unavailable". Hmmm.

**HA via DNS**

Another option (possible, but one I'm not so keen on) is using a DNS CNAME entry to point to a single node running the NFS Gateway. Then, if that node goes down, you repoint the CNAME to another node.

This would involve writing a script to do this in PowerShell with the MapR CLI (a rough sketch of the repointing step is below).
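
The repointing step itself is simple enough. With Route 53 it would be something like the following, wrapped in whatever monitoring script detects the failure; the hosted zone ID, record name and target hostname are placeholders, and the equivalent can be done from PowerShell:

```bash
# Sketch: repoint the CNAME at a surviving NFS gateway node.
# Hosted zone ID, record name and target hostname are placeholders.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "nfs.example.internal.",
        "Type": "CNAME",
        "TTL": 60,
        "ResourceRecords": [{"Value": "node2.cluster.internal"}]
      }
    }]
  }'
```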

It's not a great solution but it is possible, so I've mentioned it here for completeness.

**Finally**

I would appreciate any insight from others who have worked through this issue. Surely I'm not the first to try this :)

Happy to take advice.

P.
