2.6.16 to 2.6.17 read performance change
Bryce Harrington
bryce at osdl.org
Tue Aug 8 13:27:18 EDT 2006
We did a bit more investigation into the cause of the big read
performance jump between 2.6.16 and 2.6.17 that I presented at OLS.
It looks like Trond submitted an NFS patch a long time ago to fix the
performance issue, and it was only recently incorporated into the
mainline kernel.
Bryce
----- Forwarded message from Jason Neighbors <jasonn at osdl.org> -----
Date: Mon, 7 Aug 2006 09:58:16 -0700
From: Jason Neighbors <jasonn at osdl.org>
To: bryce at osdl.org
Subject: 2.6.16 to 2.6.17 read performance change
Hey,
The change that fixed the NFS read performance for large record sizes was in mm/readahead.c. It is probably the same thing Trond reported to lkml a long time ago:
http://lkml.org/lkml/2004/1/15/122
A diff between the bad and good versions of the file is below:
--- readahead.c.bad 2006-08-02 16:48:14.000000000 +0000
+++ readahead.c 2006-08-01 20:28:34.000000000 +0000
@@ -52,13 +52,24 @@
return (VM_MIN_READAHEAD * 1024) / PAGE_CACHE_SIZE;
}
+static inline void reset_ahead_window(struct file_ra_state *ra)
+{
+ /*
+ * ... but preserve ahead_start + ahead_size value,
+ * see 'recheck:' label in page_cache_readahead().
+ * Note: We never use ->ahead_size as rvalue without
+ * checking ->ahead_start != 0 first.
+ */
+ ra->ahead_size += ra->ahead_start;
+ ra->ahead_start = 0;
+}
+
static inline void ra_off(struct file_ra_state *ra)
{
ra->start = 0;
ra->flags = 0;
ra->size = 0;
- ra->ahead_start = 0;
- ra->ahead_size = 0;
+ reset_ahead_window(ra);
return;
}
@@ -72,10 +83,10 @@
{
unsigned long newsize = roundup_pow_of_two(size);
- if (newsize <= max / 64)
- newsize = newsize * newsize;
+ if (newsize <= max / 32)
+ newsize = newsize * 4;
else if (newsize <= max / 4)
- newsize = max / 4;
+ newsize = newsize * 2;
else
newsize = max;
return newsize;
@@ -426,8 +437,7 @@
* congestion. The ahead window will any way be closed
* in case we failed due to excessive page cache hits.
*/
- ra->ahead_start = 0;
- ra->ahead_size = 0;
+ reset_ahead_window(ra);
}
return ret;
@@ -520,11 +530,11 @@
* If we get here we are doing sequential IO and this was not the first
* occurence (ie we have an existing window)
*/
-
if (ra->ahead_start == 0) { /* no ahead window yet */
if (!make_ahead_window(mapping, filp, ra, 0))
- goto out;
+ goto recheck;
}
+
/*
* Already have an ahead window, check if we crossed into it.
* If so, shift windows and issue a new ahead window.
@@ -536,11 +546,16 @@
ra->start = ra->ahead_start;
ra->size = ra->ahead_size;
make_ahead_window(mapping, filp, ra, 0);
+recheck:
+ /* prev_page shouldn't overrun the ahead window */
+ ra->prev_page = min(ra->prev_page,
+ ra->ahead_start + ra->ahead_size - 1);
}
out:
return ra->prev_page + 1;
}
+EXPORT_SYMBOL_GPL(page_cache_readahead);
/*
* handle_ra_miss() is called when it is known that a page which should have
--
Jason Neighbors
x1939
----- End forwarded message -----
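For anyone who wants to see what the second hunk and the new helper do numerically, here is a minimal userspace sketch. It is not the kernel code: the names init_ra_size_old(), init_ra_size_new(), struct ra_sketch, clamp_prev_page() and print_init_sizes() are made up for illustration, the userspace roundup_pow_of_two() is a simple stand-in for the kernel macro, and the struct carries only the three fields that appear in the diff. The sizing function is assumed to be the initial-window helper in mm/readahead.c, since the hunk header above does not name it. Sizes are in pages.

```c
#include <stdio.h>

/* Userspace stand-in for the kernel's roundup_pow_of_two(). */
static unsigned long roundup_pow_of_two(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Initial window sizing as in the "-" lines of the second hunk. */
static unsigned long init_ra_size_old(unsigned long size, unsigned long max)
{
    unsigned long newsize = roundup_pow_of_two(size);

    if (newsize <= max / 64)
        newsize = newsize * newsize;   /* square small requests */
    else if (newsize <= max / 4)
        newsize = max / 4;             /* jump straight to max/4 */
    else
        newsize = max;
    return newsize;
}

/* Initial window sizing as in the "+" lines of the second hunk. */
static unsigned long init_ra_size_new(unsigned long size, unsigned long max)
{
    unsigned long newsize = roundup_pow_of_two(size);

    if (newsize <= max / 32)
        newsize = newsize * 4;         /* x4 for small requests */
    else if (newsize <= max / 4)
        newsize = newsize * 2;         /* x2 for medium requests */
    else
        newsize = max;
    return newsize;
}

/* Hypothetical cut-down state: only the fields the diff touches. */
struct ra_sketch {
    unsigned long ahead_start;
    unsigned long ahead_size;
    unsigned long prev_page;
};

/* Same arithmetic as the new helper: the sum start + size survives. */
static void reset_ahead_window(struct ra_sketch *ra)
{
    ra->ahead_size += ra->ahead_start;
    ra->ahead_start = 0;
}

/* Same arithmetic as the statement under the 'recheck:' label. */
static void clamp_prev_page(struct ra_sketch *ra)
{
    unsigned long last = ra->ahead_start + ra->ahead_size - 1;

    if (ra->prev_page > last)
        ra->prev_page = last;
}

/* Print old and new initial sizes for request sizes 1, 2, 4, ... max. */
static void print_init_sizes(unsigned long max)
{
    unsigned long size;

    for (size = 1; size <= max; size <<= 1)
        printf("req=%lu old=%lu new=%lu\n", size,
               init_ra_size_old(size, max), init_ra_size_new(size, max));
}
```

Calling print_init_sizes(32) from a small main() prints the table for a 32-page (128k) maximum. For a 1-page request that gives 8 pages under the old rule and 4 under the new one. With a larger maximum such as 512 pages, the old rule squares a 1-page request and stays at 1 page, while the new rule starts at 4. Whether this sizing change or the ahead-window handling accounts for the NFS numbers is not something the sketch can show.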