rapid clustered nfs server failover and hung clients -- how best to close the sockets?

Talpey, Thomas Thomas.Talpey at netapp.com
Mon Jun 9 13:51:05 EDT 2008


At 01:24 PM 6/9/2008, Jeff Layton wrote:
>
>"Be sure to wait for X minutes between failovers"

At least one grace period.

>
>...wouldn't instill me with a lot of confidence. We'd have to have
>some sort of mechanism to enforce this, and that would be less than
>ideal.
>
>IMO, the ideal thing would be to make sure that the "old" server is
>ready to pick up the service again as soon as possible after the service
>leaves it.

A great goal, but it seems to me you've bundled a lot of other
incompatible requirements along with it. Having some services
restart and not others, for example. And mixing transparent IP
address takeover with stateful recovery such as TCP reconnect
and NSM/NLM. NSM provides only notification; there's no way for
either server to know for sure all the clients have completed
either switch-to or switch-back.

Of course, you could switch to UDP-only; that would fix the
TCP issue. But it won't fix NSM/NLM.

Tom.
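
The "at least one grace period" rule discussed above could be enforced
by the cluster manager with a small guard like the one below. This is
only a sketch under stated assumptions, not anything from the thread:
the class name FailoverGuard, its methods, and the 90-second value are
all hypothetical placeholders, and the real interval should come from
the server's configured grace period. Note that it enforces the timer
only; as pointed out above, NSM gives neither server a way to confirm
that every client has finished recovering.

```python
import time


class FailoverGuard:
    """Refuse a failover until one grace period has passed since the last.

    Hypothetical helper: the name, API, and default value are
    illustrative assumptions, not part of any cluster suite.
    """

    def __init__(self, grace_period_s=90.0, clock=time.monotonic):
        # grace_period_s is a placeholder; use the NFS server's
        # configured grace period in a real deployment.
        self.grace_period_s = grace_period_s
        self._clock = clock
        self._last_failover = None  # no failover recorded yet

    def seconds_remaining(self):
        """Seconds left before the next failover is permitted."""
        if self._last_failover is None:
            return 0.0
        elapsed = self._clock() - self._last_failover
        return max(0.0, self.grace_period_s - elapsed)

    def try_failover(self):
        """Record and permit a failover, or return False if too soon."""
        if self.seconds_remaining() > 0:
            return False
        self._last_failover = self._clock()
        return True
```

A cluster script would call try_failover() before moving the service
and back off for seconds_remaining() when it returns False. The clock
is injectable so the logic can be exercised without real waiting.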