NFS Howto Optimization
From Linux NFS
Revision as of 21:02, 5 April 2006
Optimizing NFS Performance
Careful analysis of your environment, both from the client and from the server point of view, is the first step necessary for optimal NFS performance. The first sections will address issues that are generally important to the client. Later (Section 5.3 and beyond), server side issues will be discussed. In both cases, these issues will not be limited exclusively to one side or the other, but it is useful to separate the two in order to get a clearer picture of cause and effect.
Aside from the general network configuration - appropriate network capacity, faster NICs, full duplex settings in order to reduce collisions, agreement in network speed among the switches and hubs, etc. - one of the most important client optimization settings is the NFS data transfer buffer size, specified by the mount command options rsize and wsize.
Setting Block Size to Optimize Transfer Speeds
The mount command options rsize and wsize specify the size of the chunks of data that the client and server pass back and forth to each other. If no rsize and wsize options are specified, the default depends on the version of NFS in use. The most common default is 4K (4096 bytes), although for TCP-based mounts in 2.2 kernels, and for all mounts beginning with 2.4 kernels, the server specifies the default block size.
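To see which sizes are actually in effect, one option is to read them back from the client's mount table. The helper below is only a sketch: the function name nfs_block_sizes is made up for illustration, and it assumes the kernel lists rsize and wsize among the mount options in /proc/mounts, which you should confirm on your own system.

```shell
# Print the rsize and wsize options found in one mount-table line.
# Expected fields, as in /proc/mounts: device mountpoint fstype options dump pass
nfs_block_sizes() {
    printf '%s\n' "$1" | awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^(rsize|wsize)=/) print opts[i]
    }'
}

# Example use on a client (not run here):
#   grep ' nfs ' /proc/mounts | while read -r line; do nfs_block_sizes "$line"; done
```

Sizes are requested explicitly at mount time with, for example, mount -o rsize=8192,wsize=8192 foo:/home /mnt/home, where the server name foo and the paths are placeholders.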
The theoretical limit for the NFS V2 protocol is 8K. For the V3 protocol, the limit is specific to the server. On the Linux server, the maximum block size is defined by the value of the kernel constant NFSSVC_MAXBLKSIZE, found in the Linux kernel source file ./include/linux/nfsd/const.h. The current maximum block size for the kernel, as of 2.4.17, is 8K (8192 bytes), but the patch set implementing NFS over TCP/IP transport in the 2.4 series, as of this writing, uses a value of 32K (defined in the patch as 32*1024) for the maximum block size.
All 2.4 clients currently support up to 32K block transfer sizes, allowing the standard 32K block transfers across NFS mounts from other servers, such as Solaris, without client modification.
The defaults may be too big or too small, depending on the specific combination of hardware and kernels. On the one hand, some combinations of Linux kernels and network cards (largely on older machines) cannot handle blocks that large. On the other hand, if they can handle larger blocks, a bigger size might be faster.
You will want to experiment and find an rsize and wsize that works and is as fast as possible. You can test the speed of your options with some simple commands, if your network environment is not heavily used. Note that your results may vary widely unless you resort to using more complex benchmarks, such as Bonnie, Bonnie++, or IOzone.
The first of these commands transfers 16384 blocks of 16k each from the special file /dev/zero (which if you read it just spits out zeros really fast) to the mounted partition. We will time it to see how long it takes. So, from the client machine, type:
# time dd if=/dev/zero of=/mnt/home/testfile bs=16k count=16384
This creates a 256 MB file of zeroed bytes. In general, you should create a file that's at least twice as large as the system RAM on the server, but make sure you have enough disk space! Then read back the file into the great black hole on the client machine (/dev/null) by typing the following:
# time dd if=/mnt/home/testfile of=/dev/null bs=16k
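The count used in the write test follows from the "twice the server RAM" rule: 16384 blocks of 16K is 256 MB, which suits a server with 128 MB of RAM. As a rough sketch (the function name dd_count_for_ram is hypothetical, and the server's RAM size is assumed to be known), the count for other machines can be worked out like this:

```shell
# Number of dd blocks needed to write a file twice the size of the server's RAM.
# $1 = server RAM in MB, $2 = dd block size in KB
dd_count_for_ram() {
    ram_mb=$1
    bs_kb=$2
    # 2 * RAM, converted from MB to KB, divided by the block size in KB
    echo $(( 2 * ram_mb * 1024 / bs_kb ))
}

dd_count_for_ram 128 16    # the value to pass as count= for a 128 MB server
```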
Repeat this a few times and average how long it takes. Be sure to unmount and remount the filesystem each time (both on the client and, if you are zealous, locally on the server as well), which should clear out any caches.
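The repeat-and-average routine can be scripted. What follows is a sketch under a few assumptions: it runs as root, /mnt/home has an /etc/fstab entry so that a bare mount /mnt/home works, and whole-second timing from date is precise enough. The function names are made up for illustration.

```shell
# Average a list of elapsed times (in seconds), one argument per trial.
average() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR) printf "%.2f\n", sum / NR }'
}

# One write trial on a freshly remounted filesystem; prints elapsed seconds.
# Assumes root privileges and an /etc/fstab entry for /mnt/home.
write_trial() {
    umount /mnt/home && mount /mnt/home || return 1
    start=$(date +%s)
    dd if=/dev/zero of=/mnt/home/testfile bs=16k count=16384 2>/dev/null
    end=$(date +%s)
    echo $(( end - start ))
}

# Example use (not run here):
#   average "$(write_trial)" "$(write_trial)" "$(write_trial)"
```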
Then unmount, and mount again with a larger or smaller block size. The sizes should be multiples of 1024, and not larger than the maximum block size allowed by your system. Note that NFS Version 2 is limited to a maximum of 8K, regardless of the maximum block size defined by NFSSVC_MAXBLKSIZE; Version 3 will support up to 64K, if permitted. The block size should be a power of two since most of the parameters that would constrain it (such as file system block sizes and network packet size) are also powers of two. However, some users have reported better results with block sizes that are not powers of two but are still multiples of the file system block size and the network packet size.
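Before remounting, a candidate size can be checked against the rules above. This is a minimal sketch: check_block_size is a hypothetical helper, and the maximum you pass in is whatever limit applies to your own client and server.

```shell
# Classify a candidate rsize/wsize value.
# $1 = candidate size in bytes, $2 = maximum allowed size in bytes
check_block_size() {
    size=$1
    max=$2
    if [ "$size" -le 0 ] || [ $(( size % 1024 )) -ne 0 ]; then
        echo "invalid: not a positive multiple of 1024"
    elif [ "$size" -gt "$max" ]; then
        echo "invalid: larger than maximum $max"
    elif [ $(( size & (size - 1) )) -eq 0 ]; then
        # A power of two has exactly one bit set, so size & (size - 1) is 0
        echo "ok: power of two"
    else
        echo "ok: not a power of two"
    fi
}

check_block_size 8192 8192
check_block_size 12288 32768
```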
Directly after mounting with a larger size, cd into the mounted file system and do things like ls; explore the filesystem a bit to make sure everything is as it should be. If the rsize/wsize is too large, the symptoms are very odd and not 100% obvious. A typical symptom is incomplete file lists when doing ls, with no error messages, or reading files failing mysteriously with no error messages. After establishing that the given rsize/wsize works, you can do the speed tests again. Different server platforms are likely to have different optimal sizes.
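Since an incomplete ls produces no error message, one way to catch it is to compare the client's view of a directory with a listing taken on the server itself. The helper below is a sketch only: listing_matches is a made-up name, and it assumes you have saved the server-side output of ls -A to a file by some means of your own.

```shell
# Succeeds if the mounted directory shows exactly the names in the saved listing.
# $1 = file holding "ls -A" output taken on the server, $2 = mounted directory
listing_matches() {
    expected=$(sort "$1")
    actual=$(ls -A "$2" | sort)
    [ "$expected" = "$actual" ]
}

# Example use (not run here):
#   listing_matches server.list /mnt/home || echo "listing differs from server"
```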
Remember to edit /etc/fstab to reflect the rsize/wsize you found to be the most desirable.
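As an illustration of what such an entry might look like, the sketch below prints an /etc/fstab line with explicit block sizes. The helper name fstab_entry, the server foo, the paths, and the 8192 values are all placeholders; use the sizes your own tests favored and whatever other mount options you already rely on.

```shell
# Print an /etc/fstab entry for an NFS mount with explicit block sizes.
# Arguments: server, exported path, local mount point, rsize, wsize
fstab_entry() {
    printf '%s:%s %s nfs rw,rsize=%s,wsize=%s 0 0\n' "$1" "$2" "$3" "$4" "$5"
}

fstab_entry foo /home /mnt/home 8192 8192
```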
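If your results seem inconsistent or doubtful, you may need to analyze your network more extensively while varying the rsize and wsize values. In that case, here are several pointers to benchmarks that may prove useful:
* Bonnie (http://www.textuality.com/bonnie/)
* Bonnie++ (http://www.coker.com.au/bonnie++/)
* IOzone (http://www.iozone.org/)
* The official NFS benchmark, SPECsfs97 (http://www.spec.org/osg/sfs97/)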
The easiest benchmark with the widest coverage, including an extensive spread of file sizes and of IO types - reads and writes, rereads and rewrites, random access, etc. - seems to be IOzone. A recommended invocation of IOzone (for which you must have root privileges) includes unmounting and remounting the directory under test, in order to clear out the caches between tests, and including the file close time in the measurements. Assuming you've already exported /tmp to everyone from the server foo, and that you've installed IOzone in the local directory, this should work:
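The invocation itself is not reproduced in this revision. As a sketch of what the description implies, the helper below assembles (without running) a command line that uses IOzone's -a (automatic mode), -R (spreadsheet-style report), -c (include close in the timing), -U (unmount and remount the given mount point between tests) and -f (test file name) options. The function name and the mount point /mnt/foo are assumptions, and -U needs the mount point to be listed in /etc/fstab.

```shell
# Build an IOzone command line for the NFS mount point given as $1.
# The command is only printed, so it can be inspected before being run as root.
iozone_command() {
    echo "./iozone -a -R -c -U $1 -f $1/testfile"
}

iozone_command /mnt/foo
```

Redirecting the output of the real run to a log file keeps the report for later comparison between rsize and wsize settings.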