2.6.16 to 2.6.17 read performance change

Bryce Harrington bryce at osdl.org
Tue Aug 8 13:27:18 EDT 2006


We did a bit more investigation into the cause of the big performance
jump between 2.6.16 and 2.6.17 that I presented at OLS.  It looks like
a patch Trond submitted a long time ago to fix this performance issue
was only recently merged into the mainline kernel.

Bryce

----- Forwarded message from Jason Neighbors <jasonn at osdl.org> -----

Date: Mon, 7 Aug 2006 09:58:16 -0700
From: Jason Neighbors <jasonn at osdl.org>
To: bryce at osdl.org
Subject: 2.6.16 to 2.6.17 read performance change

Hey,
So the change that fixed NFS read performance for large record sizes was in mm/readahead.c.  It is probably the same issue Trond reported to LKML a long time ago:
http://lkml.org/lkml/2004/1/15/122

A diff between the bad and good versions is below:


--- readahead.c.bad     2006-08-02 16:48:14.000000000 +0000
+++ readahead.c         2006-08-01 20:28:34.000000000 +0000
@@ -52,13 +52,24 @@
        return (VM_MIN_READAHEAD * 1024) / PAGE_CACHE_SIZE;
 }

+static inline void reset_ahead_window(struct file_ra_state *ra)
+{
+       /*
+        * ... but preserve ahead_start + ahead_size value,
+        * see 'recheck:' label in page_cache_readahead().
+        * Note: We never use ->ahead_size as rvalue without
+        * checking ->ahead_start != 0 first.
+        */
+       ra->ahead_size += ra->ahead_start;
+       ra->ahead_start = 0;
+}
+
 static inline void ra_off(struct file_ra_state *ra)
 {
        ra->start = 0;
        ra->flags = 0;
        ra->size = 0;
-       ra->ahead_start = 0;
-       ra->ahead_size = 0;
+       reset_ahead_window(ra);
        return;
 }

@@ -72,10 +83,10 @@
 {
        unsigned long newsize = roundup_pow_of_two(size);

-       if (newsize <= max / 64)
-               newsize = newsize * newsize;
+       if (newsize <= max / 32)
+               newsize = newsize * 4;
        else if (newsize <= max / 4)
-               newsize = max / 4;
+               newsize = newsize * 2;
        else
                newsize = max;
        return newsize;
@@ -426,8 +437,7 @@
                 * congestion.  The ahead window will any way be closed
                 * in case we failed due to excessive page cache hits.
                 */
-               ra->ahead_start = 0;
-               ra->ahead_size = 0;
+               reset_ahead_window(ra);
        }

        return ret;
@@ -520,11 +530,11 @@
         * If we get here we are doing sequential IO and this was not the first
         * occurence (ie we have an existing window)
         */
-
        if (ra->ahead_start == 0) {      /* no ahead window yet */
                if (!make_ahead_window(mapping, filp, ra, 0))
-                       goto out;
+                       goto recheck;
        }
+
        /*
         * Already have an ahead window, check if we crossed into it.
         * If so, shift windows and issue a new ahead window.
@@ -536,11 +546,16 @@
                ra->start = ra->ahead_start;
                ra->size = ra->ahead_size;
                make_ahead_window(mapping, filp, ra, 0);
+recheck:
+               /* prev_page shouldn't overrun the ahead window */
+               ra->prev_page = min(ra->prev_page,
+                       ra->ahead_start + ra->ahead_size - 1);
        }

 out:
        return ra->prev_page + 1;
 }
+EXPORT_SYMBOL_GPL(page_cache_readahead);

 /*
  * handle_ra_miss() is called when it is known that a page which should have


-- 
Jason Neighbors
x1939

----- End forwarded message -----


More information about the NFSv4 mailing list