[nfsv4] Performance drop in iozone for reading large files

Badari Pulavarty pbadari at us.ibm.com
Fri Sep 15 12:01:19 EDT 2006


On Thu, 2006-09-14 at 23:53 -0700, Bryce Harrington wrote:
> On Thu, Sep 14, 2006 at 05:13:53PM -0700, Andrew Morton wrote:
> > On Thu, 14 Sep 2006 16:58:05 -0700
> > Bryce Harrington <bryce at osdl.org> wrote:
> > 
> > > On Wed, Sep 06, 2006 at 10:59:48AM -0400, Dean Hildebrand wrote:
> > > > To prove/disprove the server caching theory, here are a couple *echm* fun 
> > > > things you can try:
> > > > 1) While writing and reading the file, watch the memory on the server 
> > > > using 'top'.  You should see it rise up an amount roughly equal to the 
> > > > size of the file.  Once the file is deleted by iozone, the memory will 
> > > > drop back down again.
> > > > 2) After writing the file do not delete it (use -w I believe). Umount 
> > > > the server, reboot the server, then mount the server and read the file.  
> > > > This will read the file directly from disk.  Compare this performance to 
> > > > your original read performance (which used the server cache).
> > > 
> > > I think we've satisfied ourselves that this generally describes what
> > > we're seeing.  However, there is one other odd behavior we think may be
> > > of interest.  Compare the following two charts:
> > > 
> > > 2.6.17 (stock):
> > >     http://crucible.osdl.org/runs/1959/test_output/iozone.sys.log.png
> > > 2.6.18-rc1 (stock):
> > >     http://crucible.osdl.org/runs/1955/test_output/iozone.sys.log.png
> > > 
> > > Notice how the read performance in the second kernel has fallen below
> > > write performance.  We're seeing this "read performance worse than write
> > > performance" in the recently released CITI patch as well as in the git
> > > trees.  For instance, see the 'nfsv4' iozone test cases at
> > > http://crucible.osdl.org/runs/branches.html 
> > 
> > It appears that large linear reads of large, well-laid out files on ext3
> > have taken a ~20% hit post-2.6.17.  Badari is looking into it.
> > 
> > If your server-side's filesystem is ext3 I'd suggest that you mount it with
> > ext2 and retest.  If that fixes it, it's almost certainly the ext3
> > regression.
> 
> Oh, cool, we'll give this a try.  Thanks!

Can you try with the following ext3 patch?

Thanks,
Badari

ext3-get-blocks support caused a ~20% degradation in sequential read
performance (tiobench). The problem is in how the buffer boundary is
marked so that IO can be submitted right away.

2.6.18-rc6:
-----------
# ./iotest
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 75.2726 seconds, 57.1 MB/s

real    1m15.285s
user    0m0.276s
sys     0m3.884s


2.6.18-rc6 + fix:
-----------------
[root at elm3a241 ~]# ./iotest
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB) copied, 62.9356 seconds, 68.2 MB/s
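
As a cross-check on the two runs above, the reported rates and the relative
gain can be recomputed from the byte count and elapsed times. This is only a
sketch: the helper names are hypothetical, and it assumes the tool reports
decimal megabytes (1 MB = 10^6 bytes), which matches the figures shown.

```c
#include <stdio.h>

/* Throughput in decimal megabytes per second (assumes 1 MB = 10^6 bytes). */
double mb_per_sec(double bytes, double seconds)
{
	return bytes / seconds / 1e6;
}

/* Relative gain of `after` over `before`, in percent. */
double percent_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Hypothetical helper: prints the comparison for the two runs quoted above. */
void print_comparison(void)
{
	const double bytes = 4294967296.0;              /* 1048576 records copied */
	double before = mb_per_sec(bytes, 75.2726);     /* 2.6.18-rc6 */
	double after = mb_per_sec(bytes, 62.9356);      /* 2.6.18-rc6 + fix */

	printf("before: %.1f MB/s\n", before);
	printf("after:  %.1f MB/s\n", after);
	printf("gain:   %.1f%%\n", percent_gain(before, after));
}
```

Calling print_comparison() from a main() should reproduce the 57.1 MB/s and
68.2 MB/s figures, with the fixed kernel coming out a little under 20% faster
on this test.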


The boundary block check in ext3_get_blocks_handle needs to be adjusted
against the count of blocks mapped in this call, now that it can map
more than one block.

Signed-off-by: Suparna Bhattacharya <suparna at in.ibm.com>
Tested-by: Badari Pulavarty <pbadari at us.ibm.com>


 linux-2.6.18-rc5-suparna/fs/ext3/inode.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN fs/ext3/inode.c~ext3-multiblock-boundary-fix fs/ext3/inode.c
--- linux-2.6.18-rc5/fs/ext3/inode.c~ext3-multiblock-boundary-fix	2006-09-15 10:53:12.000000000 +0530
+++ linux-2.6.18-rc5-suparna/fs/ext3/inode.c	2006-09-15 10:54:30.000000000 +0530
@@ -925,7 +925,7 @@ int ext3_get_blocks_handle(handle_t *han
 	set_buffer_new(bh_result);
 got_it:
 	map_bh(bh_result, inode->i_sb, le32_to_cpu(chain[depth-1].key));
-	if (blocks_to_boundary == 0)
+	if (count > blocks_to_boundary)
 		set_buffer_boundary(bh_result);
 	err = count;
 	/* Clean up and exit */
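
To illustrate the one-line change above outside the kernel, here is a small
userspace sketch of the two conditions. It is not kernel code: the function
names are hypothetical, and it assumes blocks_to_boundary counts the blocks
that follow the first mapped block before the boundary, so a call that maps
`count` blocks reaches the boundary when count > blocks_to_boundary.

```c
#include <stdbool.h>

/*
 * Old condition: only true when the first mapped block is itself the
 * last block before the boundary. Correct when one block is mapped
 * per call, but it ignores how many blocks were actually mapped.
 */
bool boundary_old(unsigned long blocks_to_boundary)
{
	return blocks_to_boundary == 0;
}

/*
 * New condition from the patch: true whenever the range mapped in this
 * call (count blocks starting at the first one) extends to the boundary.
 * Assumption: blocks_to_boundary is measured from the first mapped block.
 */
bool boundary_new(unsigned long count, unsigned long blocks_to_boundary)
{
	return count > blocks_to_boundary;
}
```

For a single-block mapping the two agree: boundary_new(1, 0) and
boundary_old(0) are both true. They differ for multi-block mappings. With
count = 4 and blocks_to_boundary = 3 the mapped range ends at the boundary,
yet boundary_old(3) is false while boundary_new(4, 3) is true. With
blocks_to_boundary = 10 neither condition is true.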




