System lockup with recent pull from Bruce's git tree

J. Bruce Fields bfields at fieldses.org
Thu Sep 7 11:01:46 EDT 2006


On Wed, Sep 06, 2006 at 01:58:36PM -0700, Frank Filz wrote:
> I have been working on trying to eliminate the BKL from the NFS client.
> I was starting to think there is a problem with some asynch RPC calls,
> so I decided to pull Bruce's current git tree a couple days ago (2.6.18-
> rc5+CITI_NFS4_ALL-1) and run my test on that tree and see if I was
> seeing the problem. What I'm seeing is worse. Before, what I was seeing
> was an NFS v4 operation would wait for a response forever, and other
> threads that accessed the same inode would then hang (waiting for the
> inode mutex).

Do you know what operation it was that the server wasn't responding to,
and how to reproduce the problem?

> With the latest code (and definitely no changes of my
> own), my system is completely locking up.
> 
> The testing I am running is:
> 
> Mounts:
> 
> mount -t nfs -o hard,intr localhost:/home /mnt
> mount -t nfs4 -o hard,intr localhost:/ /mnt2

Note that the client and server are not designed to be deadlock-free
when they're on the same machine.  (The client may not be able to write
out dirty pages until the server gets some memory it needs to handle the
requests, and the server may not be able to get that memory until the
client can free those dirty pages.)  I run the connectathon tests and such
regularly over localhost mounts, but those don't touch a lot of data.

So can you reproduce this with client and server on separate machines?
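
As a rough sketch of what I mean, the same two mounts pointed at a
remote server might look like the following. The host name "nfsserver"
is only a placeholder (an assumption, not a real machine), and the
export paths are simply copied from your localhost setup; adjust both
to whatever the server actually exports. The snippet only prints the
commands so it is safe to run as-is.

```shell
# Hypothetical server name; replace with the real NFS server host.
SERVER=nfsserver

# Same mount options as in the report, but against a remote server
# instead of localhost. The commands are only printed, not executed.
V3_MOUNT="mount -t nfs -o hard,intr ${SERVER}:/home /mnt"
V4_MOUNT="mount -t nfs4 -o hard,intr ${SERVER}:/ /mnt2"

echo "$V3_MOUNT"
echo "$V4_MOUNT"
```

Running the printed commands as root on the client, with the server on
a different box, takes the shared-memory dependency out of the picture.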

--b.

