rapid clustered nfs server failover and hung clients -- how best to close the sockets?
Neil Horman
nhorman at redhat.com
Mon Jun 9 12:03:39 EDT 2008
On Mon, Jun 09, 2008 at 12:01:10PM -0400, Jeff Layton wrote:
> On Mon, 09 Jun 2008 11:51:51 -0400
> "Talpey, Thomas" <Thomas.Talpey at netapp.com> wrote:
>
> > At 11:18 AM 6/9/2008, Jeff Layton wrote:
> > >No, it's not specific to NFS. It can happen to any "service" that
> > >floats IP addresses between machines, but does not close the sockets
> > >that are connected to those addresses. Most services that fail over
> > >(at least in RH's cluster server) shut down the daemons on failover
> > >too, which tends to mitigate this problem elsewhere.
> >
> > Why exactly don't you choose to restart the nfsd's (and lockd's) on the
> > victim server?
>
> The victim server might have other nfsd/lockd's running on it. Stopping
> all the nfsd's could bring down lockd, and then you have to deal with lock
> recovery on the stuff that isn't moving to the other server.
>
> > Failing that, for TCP at least, would ifdown/ifup accomplish
> > the socket reset?
> >
>
> I don't think ifdown/ifup closes the sockets, but maybe someone can
> correct me on this...
>
ifdown/ifup doesn't do anything to the sockets per se, but it could have any
number of side effects depending on how other aspects of your network/application
are configured. It is certainly not a reliable way to destroy a connection.
Neil
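
For anyone who wants to see which connections are still attached to a floated
address, here is a rough sketch (not something from this thread) that reads the
IPv4 socket table in /proc/net/tcp and lists ESTABLISHED sockets whose local
address matches. The function names and the 192.0.2.10 address are placeholders
made up for illustration, and the address decoding assumes a little-endian host
such as x86. IPv6 sockets live in /proc/net/tcp6 and are not handled here.

```python
import socket
import struct

TCP_ESTABLISHED = 0x01  # state code used in /proc/net/tcp


def decode_addr(field):
    """Decode 'HEXIP:HEXPORT' from /proc/net/tcp (little-endian host assumed)."""
    hex_ip, hex_port = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def established_on(lines, floating_ip):
    """Return (local, remote) pairs for ESTABLISHED sockets on floating_ip."""
    found = []
    for line in lines:
        fields = line.split()
        # Skip the header row and anything malformed.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        if int(fields[3], 16) != TCP_ESTABLISHED:
            continue
        local = decode_addr(fields[1])
        if local[0] == floating_ip:
            found.append((local, decode_addr(fields[2])))
    return found


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder for the floating service address.
    with open("/proc/net/tcp") as f:
        for local, remote in established_on(f, "192.0.2.10"):
            print("%s:%d <- %s:%d" % (local + remote))
```

Running this on the victim server before and after the address is moved would
show whether any connections to that address are still open. It only reports
them; it does not close anything.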
> --
> Jeff Layton <jlayton at redhat.com>
--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*nhorman at redhat.com
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/