[nfsv4] NFSv4 performance issues on a lagged network

Bryce Harrington bryce at osdl.org
Fri Sep 8 21:29:25 EDT 2006


On Wed, Aug 30, 2006 at 05:01:30PM -0400, Trond Myklebust wrote:
> On Tue, 2006-08-29 at 17:07 -0400, Trond Myklebust wrote:
> > See http://crucible.osdl.org/runs/1621/sysinfo/nfs05.console
> > 
> > It looks like the networking layer is passing back an ENETUNREACH error
> > that we weren't expecting, and consequently weren't handling.
> > 
> > Hmm... Not much you can do in those circumstances except delay and then
> > retry. I suppose the same goes for EHOSTUNREACH (which we didn't
> > receive, but could conceivably happen too).
> > 
> > I'll look into drafting a patch.
> 
> OK... Please could you see if the attached patch has an effect on those
> errors.
> 
> Cheers,
>  Trond
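
For readers following along, the "delay and then retry" idea quoted above can be sketched in user-space C. This is only an illustration under stated assumptions, not the attached patch and not the kernel RPC code: the names `send_fn`, `retry_on_unreachable`, and `flaky_op` are hypothetical, and the attempt limit and delay are arbitrary parameters chosen for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success or a negative errno value. */
typedef int (*send_fn)(void *ctx);

/* The two errors discussed in the thread: route to network/host is down. */
static bool is_transient_net_error(int err)
{
    return err == ENETUNREACH || err == EHOSTUNREACH;
}

/*
 * Call op() until it succeeds, fails with some other error, or
 * max_attempts is used up, sleeping delay_ms between attempts.
 * Returns 0 on success, otherwise the last negative errno from op().
 * The attempt limit is an assumption of this sketch so that it
 * cannot spin forever.
 */
static int retry_on_unreachable(send_fn op, void *ctx,
                                int max_attempts, long delay_ms)
{
    assert(op != NULL && max_attempts > 0 && delay_ms >= 0);

    int rc = -EIO;
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        rc = op(ctx);
        if (rc == 0 || !is_transient_net_error(-rc))
            return rc;              /* success, or an error we do not retry */
        if (attempt < max_attempts) {
            struct timespec ts = {
                .tv_sec = delay_ms / 1000,
                .tv_nsec = (delay_ms % 1000) * 1000000L,
            };
            nanosleep(&ts, NULL);   /* delay before the next attempt */
        }
    }
    return rc;                      /* still unreachable after all attempts */
}

/* Hypothetical stand-in for a send: fails *ctx times, then succeeds. */
static int flaky_op(void *ctx)
{
    int *failures_left = ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return -ENETUNREACH;
    }
    return 0;
}
```

Calling `retry_on_unreachable(flaky_op, &n, 5, 100)` with `n` set to 2 would make three attempts about 100 ms apart and then return 0. With more failures than attempts, it gives up and returns `-ENETUNREACH`.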

Hi Trond,

Thanks for the patch.  With it applied, iozone goes into a loop and
never finishes.  Here are three runs with this patch under identical
conditions:

   http://crucible.osdl.org/runs/1791/test_output/iozone.sys.log
   http://crucible.osdl.org/runs/1793/test_output/iozone.sys.log
   http://crucible.osdl.org/runs/1794/test_output/iozone.sys.log

All three runs get stuck at almost exactly the same spot in the test.

On the console, the output is:

[http://crucible.osdl.org/runs/1793/sysinfo/nfs05.console]
...
nfs05 login: nfsd: last server has exited
nfsd: unexporting all filesystems
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
*** Run 1793: Running 'src/current/iozone -+q 1 -i 0 -i 1 -g 128M -Race -U /mnt/192.168.10.4 -f /mnt/192.168.10.4/iozone.tmp' ***
nfs: server 192.168.10.4 not responding, still trying
[-- MARK -- Tue Sep  5 23:00:00 2006]
[-- MARK -- Wed Sep  6 00:00:00 2006]
...

Ideas on what to try next?

Bryce

