rapid clustered nfs server failover and hung clients -- how best to close the sockets?
Jeff Layton
jlayton at redhat.com
Mon Jun 9 12:01:10 EDT 2008
On Mon, 09 Jun 2008 11:51:51 -0400
"Talpey, Thomas" <Thomas.Talpey at netapp.com> wrote:
> At 11:18 AM 6/9/2008, Jeff Layton wrote:
> >No, it's not specific to NFS. It can happen to any "service" that
> >floats IP addresses between machines, but does not close the sockets
> >that are connected to those addresses. Most services that fail over
> >(at least in RH's cluster server) shut down the daemons on failover
> >too, which tends to mitigate this problem elsewhere.
>
> Why exactly don't you choose to restart the nfsd's (and lockd's) on the
> victim server?
The victim server might have other nfsd/lockd's running on it. Stopping
all the nfsd's could bring down lockd, and then you have to deal with lock
recovery on the stuff that isn't moving to the other server.
> Failing that, for TCP at least would ifdown/ifup accomplish
> the socket reset?
>
I don't think ifdown/ifup closes the sockets, but maybe someone can
correct me on this...
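
One way to check this empirically, rather than guess, would be to look
at the socket table before and after the ifdown/ifup and see whether
any ESTABLISHED sockets are still bound to the floating address. What
follows is only a rough sketch under several assumptions: a Linux host
with /proc/net/tcp, IPv4 only, NFS over TCP on port 2049, and that the
sockets of interest are listed in that table. The function names and
the 192.0.2.10 address are made up for illustration.

```python
import socket
import struct

TCP_ESTABLISHED = 0x01  # state code for ESTABLISHED in /proc/net/tcp


def decode_addr(field):
    """Turn 'HEXIP:HEXPORT' from /proc/net/tcp into ('a.b.c.d', port)."""
    hex_ip, hex_port = field.split(":")
    # The kernel prints the IPv4 address as a native-order 32-bit integer,
    # so packing it back in native order recovers the network-order bytes.
    ip = socket.inet_ntoa(struct.pack("=I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def established_on(table_text, local_ip, local_port=2049):
    """Return (local, remote) pairs for ESTABLISHED sockets on local_ip:local_port."""
    matches = []
    for line in table_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local = decode_addr(fields[1])
        remote = decode_addr(fields[2])
        state = int(fields[3], 16)
        if state == TCP_ESTABLISHED and local == (local_ip, local_port):
            matches.append((local, remote))
    return matches


# Hypothetical usage (192.0.2.10 stands in for the floating address):
#   with open("/proc/net/tcp") as f:
#       for local, remote in established_on(f.read(), "192.0.2.10"):
#           print(local, "<->", remote)
```

If the list is non-empty after the interface has been cycled, the
sockets survived and something else has to close them.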
--
Jeff Layton <jlayton at redhat.com>