rapid clustered nfs server failover and hung clients -- how best to close the sockets?

Jeff Layton jlayton at redhat.com
Mon Jun 9 12:01:10 EDT 2008


On Mon, 09 Jun 2008 11:51:51 -0400
"Talpey, Thomas" <Thomas.Talpey at netapp.com> wrote:

> At 11:18 AM 6/9/2008, Jeff Layton wrote:
> >No, it's not specific to NFS. It can happen to any "service" that
> >floats IP addresses between machines, but does not close the sockets
> >that are connected to those addresses. Most services that fail over
> >(at least in RH's cluster server) shut down the daemons on failover
> >too, so that tends to mitigate this problem elsewhere.
> 
> Why exactly don't you choose to restart the nfsd's (and lockd's) on the
> victim server?

The victim server might have other nfsd/lockd's running on it. Stopping
all the nfsd's could bring down lockd, and then you have to deal with lock
recovery on the stuff that isn't moving to the other server.

> Failing that, for TCP at least would ifdown/ifup accomplish
> the socket reset?
> 

I don't think ifdown/ifup closes the sockets, but maybe someone can
correct me on this...
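
One way to check this would be to look at which sockets are still
ESTABLISHED on the floating address before and after the ifdown/ifup.
Below is a rough sketch, not something I've run against a real cluster.
It assumes Linux, IPv4 only, a little-endian host (the byte order of the
hex addresses in /proc/net/tcp depends on that), and the function names
are made up for illustration.

```python
import socket
import struct

TCP_ESTABLISHED = 0x01  # state code the kernel prints in /proc/net/tcp


def decode_addr(hex_addr):
    """Turn '0100007F:0801' into ('127.0.0.1', 2049).

    Assumes a little-endian host; on big-endian the IP bytes are not swapped.
    """
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def established_on(lines, local_ip):
    """Return (local, remote) address pairs for ESTABLISHED sockets on local_ip.

    `lines` is any iterable of lines in /proc/net/tcp format.
    """
    found = []
    for line in lines:
        fields = line.split()
        # Socket entries start with an index like "0:"; the header row does not.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        local = decode_addr(fields[1])
        remote = decode_addr(fields[2])
        state = int(fields[3], 16)
        if state == TCP_ESTABLISHED and local[0] == local_ip:
            found.append((local, remote))
    return found
```

Calling established_on(open("/proc/net/tcp"), "<floating address>") before
and after the ifdown/ifup, and comparing the two lists, should show
whether the connections to that address were actually torn down.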

-- 
Jeff Layton <jlayton at redhat.com>

