[nfsv4] NFSv4 performance issues on a lagged network

Dipankar Sarkar doom2quake at gmail.com
Sat Sep 9 01:47:28 EDT 2006


Hey Trond

I ran the test again with the packet delay enabled for about 50 seconds
and then switched off:
http://crucible.osdl.org/runs/1982/test_output/iozone.sys.log.png
It works great, as we can see in the output.

I changed the tcp slot entries to 64 and reran the previous full packet
delay test (where I do not switch the delay off), and again it seems to
work great (I mean it does not send iozone into a loop). This output shows
the performance hit caused by the packet delay:

http://crucible.osdl.org/runs/1983/test_output/iozone.sys.log.png
http://crucible.osdl.org/runs/1983/

Dipankar
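
A minimal sketch of the slot-table change mentioned above, assuming "tcp slot entries" refers to the Linux sunrpc sysctl exposed at /proc/sys/sunrpc/tcp_slot_table_entries (the thread does not name the path). The helper names are hypothetical, and writing the real file requires root:

```python
from pathlib import Path

# Assumed sysctl location on a Linux NFS client; not stated in the thread.
SLOT_TABLE_PATH = Path("/proc/sys/sunrpc/tcp_slot_table_entries")


def read_slot_entries(path=SLOT_TABLE_PATH):
    """Return the current number of RPC slot table entries (hypothetical helper)."""
    return int(Path(path).read_text().strip())


def write_slot_entries(value, path=SLOT_TABLE_PATH):
    """Set the number of RPC slot table entries (hypothetical helper)."""
    if value < 1:
        raise ValueError("slot table entries must be a positive integer")
    Path(path).write_text(f"{value}\n")
```

Calling write_slot_entries(64) would correspond to the value used in run 1983. As far as I know, the new value is only picked up by mounts made after the change, so the share would likely need to be remounted.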

On 9/9/06, Trond Myklebust <trond.myklebust at fys.uio.no> wrote:
>
> On Fri, 2006-09-08 at 18:29 -0700, Bryce Harrington wrote:
> > On Wed, Aug 30, 2006 at 05:01:30PM -0400, Trond Myklebust wrote:
> > > On Tue, 2006-08-29 at 17:07 -0400, Trond Myklebust wrote:
> > > > See http://crucible.osdl.org/runs/1621/sysinfo/nfs05.console
> > > >
> > > > It looks like the networking layer is passing back an ENETUNREACH
> > > > error that we weren't expecting, and consequently weren't handling.
> > > >
> > > > Hmm... Not much you can do in those circumstances except delay and
> > > > then retry. I suppose the same goes for EHOSTUNREACH (which we didn't
> > > > receive, but could conceivably happen too).
> > > >
> > > > I'll look into drafting a patch.
> > >
> > > OK... Please could you see if the attached patch has an effect on
> > > those errors.
> > >
> > > Cheers,
> > >  Trond
> >
> > Hi Trond,
> >
> > Thanks for the patch.  This has the effect of causing iozone to go into
> > a loop and fail to finish.  Here are three runs on this patch with
> > identical conditions:
> >
> >    http://crucible.osdl.org/runs/1791/test_output/iozone.sys.log
> >    http://crucible.osdl.org/runs/1793/test_output/iozone.sys.log
> >    http://crucible.osdl.org/runs/1794/test_output/iozone.sys.log
> >
> > All three runs are getting stuck at almost exactly the same spot in the
> > test.
> >
> > On the console, the output is:
> >
> > [http://crucible.osdl.org/runs/1793/sysinfo/nfs05.console]
> > ...
> > nfs05 login: nfsd: last server has exited
> > nfsd: unexporting all filesystems
> > NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> > NFSD: starting 90-second grace period
> > *** Run 1793: Running 'src/current/iozone -+q 1 -i 0 -i 1 -g 128M -Race -U /mnt/192.168.10.4 -f /mnt/192.168.10.4/iozone.tmp' ***
> > nfs: server 192.168.10.4 not responding, still trying
> > [-- MARK -- Tue Sep  5 23:00:00 2006]
> > [-- MARK -- Wed Sep  6 00:00:00 2006]
> > ...
> >
> > Ideas on what to try next?
>
> What happens if you turn off your packet delay mechanism _after_ you
> start seeing the above hang?
> When we start seeing ENETUNREACH errors, then that is a sign that the
> networking layer is having trouble actually routing the packets. My
> guess is that you have tuned the packet delay to the point where the
> networking layer is spending all its time trying to establish contact
> with the router/gateway ('cos it isn't able to talk to anything on the
> network). If you turn off the delay, then presumably the system will
> recover.
>
> Note also, that you might want to try changing the setup so that the
> delay happens between two routers instead of between the client and its
> primary network.
>
> Cheers,
>   Trond
>
>
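
The "delay and then retry" handling discussed in the quoted thread for ENETUNREACH and EHOSTUNREACH can be illustrated with a small userspace sketch. This is not the kernel patch from the thread (which is not reproduced here); the helper name, delay, and attempt limit are assumptions chosen for illustration:

```python
import errno
import time

# Errors the thread suggests treating as transient: delay, then retry.
TRANSIENT_ERRNOS = {errno.ENETUNREACH, errno.EHOSTUNREACH}


def retry_on_unreachable(op, delay=1.0, max_attempts=5, sleep=time.sleep):
    """Call op(); on ENETUNREACH/EHOSTUNREACH wait `delay` seconds and retry.

    Hypothetical helper. Any other OSError, or a transient error on the
    final attempt, is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except OSError as exc:
            if exc.errno not in TRANSIENT_ERRNOS or attempt == max_attempts:
                raise
            sleep(delay)
```

Here op would be something like a function that opens a socket to the server. The attempt limit is a choice made for this sketch; the thread itself does not say whether the real fix bounds the number of retries.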