NFS client patches for Linux 2.6.21

The following set of patches fixes known issues with the 2.6.21 NFS client code, and significantly enhances the support for NFSv4.

linux-2.6.21-001-fix_nfs_statfs.dif:

From: Amnon Aaronsohn <amnonaar@gmail.com>

Date: Mon, 09 Apr 2007 22:05:26 -0700

NFS: statfs error-handling fix

The nfs statfs function returns a success code on error, and fills the output buffer with invalid values. The attached patch makes it return a correct error code instead.

Signed-off-by: Amnon Aaronsohn <amnonaar@gmail.com>

Cc: Trond Myklebust <trond.myklebust@fys.uio.no>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> (Modified patch to reinstate the dprintk())
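Illustration for the statfs fix above: a minimal user-space sketch of the error-handling pattern, not the patch itself. The demo_* names and the simulated server call are hypothetical; only the behaviour (success code on error, output buffer filled with invalid values) comes from the description.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the statfs output buffer. */
struct demo_statfs {
    unsigned long long blocks;
    unsigned long long bfree;
};

/* Simulated server call: 0 on success, negative errno on failure. */
static int demo_rpc_statfs(int server_ok, struct demo_statfs *res)
{
    if (!server_ok)
        return -EIO;
    res->blocks = 1000;
    res->bfree = 250;
    return 0;
}

/* The failure mode described above: error hidden, buffer filled with junk. */
int demo_statfs_buggy(int server_ok, struct demo_statfs *buf)
{
    struct demo_statfs res;

    if (demo_rpc_statfs(server_ok, &res) < 0) {
        memset(buf, 0xff, sizeof(*buf));  /* invalid values */
        return 0;                         /* success code on error */
    }
    *buf = res;
    return 0;
}

/* The corrected shape: propagate the error and leave the buffer alone. */
int demo_statfs_fixed(int server_ok, struct demo_statfs *buf)
{
    struct demo_statfs res;
    int error = demo_rpc_statfs(server_ok, &res);

    if (error < 0)
        return error;
    *buf = res;
    return 0;
}
```

With the fixed variant a caller can tell a failed call from a full filesystem, because the buffer is only written on success.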

linux-2.6.21-002-no_congestion_wait_in_update_request.dif:

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Date: Fri, 6 Apr 2007 13:12:46 -0400

NFS: Don't wait for congestion in nfs_update_request()

It is redundant, and will interfere with the call to balance_dirty_pages_ratelimited_nr in generic_file_write().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-003-cleanup_coalesce.dif:

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Date: Mon, 2 Apr 2007 18:48:28 -0400

NFS: Cleanup the coalescing code

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-004-cleanup_coalesce2.dif:

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Date: Mon, 2 Apr 2007 18:48:28 -0400

NFS: Another cleanup of the read/write request coalescing code

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-005-cleanup_readpages.dif:

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Date: Mon, 2 Apr 2007 18:48:28 -0400

NFS: Cleanup for nfs_readpages()

Do the coalescing of read requests into block sized requests at start of I/O as we scan through the pages instead of going through a second pass.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
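Illustration for the nfs_readpages() cleanup above: a rough sketch of single-pass coalescing, assuming that a request is closed when the next page is not contiguous or when a per-request page limit is reached. The function name and the limit parameter are hypothetical and do not reflect the kernel's data structures.

```c
#include <stddef.h>

/*
 * Scan a sorted list of page indices once and count how many requests
 * are needed when contiguous pages are merged, up to max_pages each.
 */
size_t demo_coalesce(const unsigned long *pages, size_t npages,
                     size_t max_pages)
{
    size_t requests = 0;
    size_t in_req = 0;   /* pages in the request currently being built */
    size_t i;

    for (i = 0; i < npages; i++) {
        int contiguous = (i > 0 && pages[i] == pages[i - 1] + 1);

        /* Start a new request if none is open, on a gap, or when full. */
        if (in_req == 0 || !contiguous || in_req == max_pages) {
            requests++;
            in_req = 0;
        }
        in_req++;
    }
    return requests;
}
```

The point of the single pass is that the grouping decision is made while the pages are first visited, so no second walk over a queued list is needed.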

linux-2.6.21-006-fix_dirtying_race.dif:

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Date: Mon, 2 Apr 2007 19:29:52 -0400

NFS: Fix a race when doing NFS write coalescing

Currently we do write coalescing in a very inefficient manner: one pass in generic_writepages() in order to lock the pages for writing, then one pass in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather the locked pages for coalescing into RPC requests of size "wsize".

In fact, it turns out there is actually a deadlock possible here since we only start I/O on the second pass. If the user signals the process while we're in nfs_sync_mapping_wait(), for instance, then we may exit before starting I/O on all the requests that have been queued up.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-007-fix_page_overflow.dif:

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Date: Tue, 10 Apr 2007 09:26:35 -0400

NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-008-cleanup_sync_mapping_wait.dif:

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Date: Tue, 17 Apr 2007 17:22:13 -0400

NFS: Clean up nfs_sync_mapping_wait()

It has no business touching wbc->pages_skipped.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-009-use_pgoff_t.dif:

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Date: Tue, 17 Apr 2007 17:22:13 -0400

NFS: Use pgoff_t in structures and functions that pass page cache offsets

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-010-shrink_the_maximum_request_size_of_nlm4_requests.dif:

From: Chuck Lever <chuck.lever@oracle.com>

Date: Thu, 29 Mar 2007 16:47:47 -0400

NLM: Shrink the maximum request size of NLM4 requests

NLM version 4 requests estimate the call and reply header sizes rather conservatively, using the very maximum size allowed in the protocol even though Linux always uses only a small fraction of the allowable space.

Reduce the size of caller and lock arguments to conserve RPC buffer space while XDR encoding NLM4 arguments. Add compile-time checks to ensure the hostname string won't overflow NLM protocol maximums.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
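Illustration for the NLM4 sizing change above: a small sketch of a compile-time bound check and of how much XDR space a bounded string needs. The constants are assumptions chosen for the example (a 1024-byte protocol string limit and a 64-byte hostname limit), and C11 _Static_assert stands in for whatever mechanism the patch actually uses.

```c
#include <stddef.h>

#define DEMO_NLM_MAXSTRLEN 1024  /* assumed protocol maximum for strings */
#define DEMO_HOSTNAME_MAX    64  /* assumed local hostname limit */

/* Fails the build if the local limit could overflow the protocol limit. */
_Static_assert(DEMO_HOSTNAME_MAX <= DEMO_NLM_MAXSTRLEN,
               "hostname limit exceeds NLM string maximum");

/*
 * 4-byte XDR words needed for a string of up to max_len bytes:
 * one length word plus the data rounded up to a multiple of 4.
 */
size_t demo_xdr_string_words(size_t max_len)
{
    return 1 + (max_len + 3) / 4;
}
```

Sizing the caller name from the smaller local limit rather than the protocol maximum is what saves buffer space: 17 words instead of 257 under these assumed constants.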

linux-2.6.21-011-rpc_buffer_size_estimates_are_too_large.dif:

From: Chuck Lever <chuck.lever@oracle.com>

Date: Thu, 29 Mar 2007 16:47:53 -0400

SUNRPC: RPC buffer size estimates are too large

The RPC buffer size estimation logic in net/sunrpc/clnt.c always significantly overestimates the requirements for the buffer size. A little instrumentation demonstrated that in fact rpc_malloc was never allocating the buffer from the mempool, but almost always called kmalloc.

To compute the size of the RPC buffer more precisely, split p_bufsiz into two fields; one for the argument size, and one for the result size.

Then, compute the sum of the exact call and reply header sizes, and split the RPC buffer precisely between the two. That should keep almost all RPC buffers within the 2KiB buffer mempool limit.

And, we can finally be rid of RPC_SLACK_SPACE!

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
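Illustration for the buffer size estimate above: a sketch of the arithmetic only, assuming sizes are tracked in 4-byte XDR words and that the argument and result sizes are kept in separate fields. The struct, field and function names are hypothetical; the 2KiB limit is the mempool limit quoted in the description.

```c
#include <stddef.h>

#define DEMO_MEMPOOL_BUF_BYTES 2048  /* the 2KiB mempool limit */

/* Hypothetical per-procedure sizes, in 4-byte words. */
struct demo_procinfo {
    size_t arglen;  /* encoded argument size */
    size_t replen;  /* decoded result size */
};

/* Total buffer size in bytes: call header + args, reply header + results. */
size_t demo_bufsize(size_t call_hdr_words, size_t reply_hdr_words,
                    const struct demo_procinfo *p)
{
    size_t call = call_hdr_words + p->arglen;
    size_t reply = reply_hdr_words + p->replen;

    return (call + reply) * 4;
}

/* Nonzero if a buffer of this size can be served from the mempool. */
int demo_fits_mempool(size_t bytes)
{
    return bytes <= DEMO_MEMPOOL_BUF_BYTES;
}
```

Summing exact per-direction sizes replaces a single padded estimate, which is why no extra slack term appears in the calculation.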

linux-2.6.21-012-eliminate_side_effects_from_rpc_malloc.dif:

From: Chuck Lever <chuck.lever@oracle.com>

Date: Thu, 29 Mar 2007 16:47:58 -0400

SUNRPC: Eliminate side effects from rpc_malloc

Currently rpc_malloc sets req->rq_buffer internally. Make this a more generic interface: return a pointer to the new buffer (or NULL) and make the caller set req->rq_buffer and req->rq_bufsize. This looks much more like kmalloc and eliminates the side effects.

To fix a potential deadlock, this patch also replaces GFP_NOFS with GFP_NOWAIT in rpc_malloc. This prevents async RPCs from sleeping outside the RPC's task scheduler while allocating their buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
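Illustration for the rpc_malloc interface change above: a user-space sketch of the kmalloc-like shape, in which the allocator only returns a pointer (or NULL) and the caller records it. The rq_buffer and rq_bufsize field names come from the description; the demo_* names and the use of malloc() are assumptions for the example.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical request structure with the two fields named above. */
struct demo_req {
    void *rq_buffer;
    size_t rq_bufsize;
};

/* Side-effect free: returns the new buffer or NULL, touches nothing else. */
void *demo_rpc_malloc(size_t size)
{
    return malloc(size);
}

/* The caller, not the allocator, updates the request. */
int demo_prepare_buffer(struct demo_req *req, size_t size)
{
    void *buf = demo_rpc_malloc(size);

    if (buf == NULL)
        return -ENOMEM;
    req->rq_buffer = buf;
    req->rq_bufsize = size;
    return 0;
}
```

Keeping the bookkeeping in the caller means the allocator can be reused for buffers that are not attached to a request.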

linux-2.6.21-013-introduce_rpcbind_replacement_for_in_kernel_portmapper.dif:

From: Chuck Lever <chuck.lever@oracle.com>

Date: Thu, 29 Mar 2007 16:48:04 -0400

SUNRPC: introduce rpcbind: replacement for in-kernel portmapper

Introduce a replacement for the in-kernel portmapper client that supports all 3 versions of the rpcbind protocol. This code is not used yet.

Original code by Groupe Bull updated for the latest kernel, with multiple bug fixes.

Note that rpcb_clnt.c does not yet support registering via versions 3 and 4 of the rpcbind protocol. That is planned for a later patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-014-switch_socket_based_rpc_transports_to_use_rpcbind.dif:

From: Chuck Lever <chuck.lever@oracle.com>

Date: Thu, 29 Mar 2007 16:48:10 -0400

SUNRPC: switch socket-based RPC transports to use rpcbind

Now that we have a version of the portmapper that supports versions 3 and 4 of the rpcbind protocol, use it for new RPC client connections over sockets.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-015-switch_the_rpc_server_to_use_the_new_rpcbind_registration_api.dif:

From: Chuck Lever <chuck.lever@oracle.com>

Date: Thu, 29 Mar 2007 16:48:16 -0400

SUNRPC: switch the RPC server to use the new rpcbind registration API

Eventually this interface will support versions 3 and 4 of the rpcbind protocol, which will allow the Linux RPC server to register services on IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-016-switch_nfsroot_to_use_new_rpcbind_client.dif:

From: Chuck Lever <chuck.lever@oracle.com>

Date: Thu, 29 Mar 2007 16:48:22 -0400

NFS: switch NFSROOT to use new rpcbind client

It is arguable whether NFSROOT will support IPv6, and thus whether rpcb_getport_external needs to support rpcbind versions greater than 2.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-017-remove_old_portmapper.dif:

From: Chuck Lever <chuck.lever@oracle.com>

Date: Thu, 29 Mar 2007 16:48:27 -0400

SUNRPC: remove old portmapper

net/sunrpc/pmap_clnt.c has been replaced by net/sunrpc/rpcb_clnt.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-018-rpc_client_should_retry_with_different_versions_of_rpcbind.dif:

From: Chuck Lever <chuck.lever@oracle.com>

Date: Thu, 29 Mar 2007 16:48:33 -0400

SUNRPC: RPC client should retry with different versions of rpcbind

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
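Illustration for the rpcbind retry above: the patch description does not say in which order versions are tried, so this sketch assumes the client starts with the newest version and falls back to older ones. The callback type, the function names and the simulated server are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical query: returns 0 and fills *port if the version is served. */
typedef int (*demo_query_fn)(unsigned int version, unsigned short *port);

/* Try rpcbind versions 4, 3, 2 in turn; return the one that answered. */
int demo_getport(demo_query_fn query, unsigned short *port)
{
    static const unsigned int versions[] = { 4, 3, 2 };
    size_t i;

    for (i = 0; i < sizeof(versions) / sizeof(versions[0]); i++) {
        if (query(versions[i], port) == 0)
            return (int)versions[i];
    }
    return -1;  /* no version answered */
}

/* Simulated server that only speaks version 2 (the classic portmapper). */
int demo_v2_only_server(unsigned int version, unsigned short *port)
{
    if (version != 2)
        return -1;
    *port = 2049;
    return 0;
}
```

Against the simulated version 2 server, the two newer versions fail and the third attempt returns the port.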

linux-2.6.21-019-nfs_nordirplus.dif:

From: Steve Dickson <steved@redhat.com>

Date: Sat, 14 Apr 2007 17:01:15 -0400

NFS: Added support to turn off the NFSv3 READDIRPLUS RPC.

READDIRPLUS can be a performance hindrance when the client is working with large directories. In addition, some servers still have bugs in their implementations (e.g. Tru64 returns wrong values for the fsid).

Add a mount flag to enable users to turn it off at mount time following the implementation in Apple's NFS client.

Signed-off-by: Steve Dickson <steved@redhat.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-020-readdirplus_add_valid_timestamps.dif:

From: Neil Brown <neilb@suse.de>

Date: Mon, 16 Apr 2007 09:35:27 +1000

NFS: Set meaningful value for fattr->time_start in readdirplus results.

Don't use an uninitialised value for fattr->time_start in readdirplus results.

The 'fattr' structure filled in by nfs3_decode_dirent does not get a value for ->time_start set. Thus if an entry is for an inode that we already have in cache, when nfs_readdir_lookup calls nfs_fhget, it will call nfs_refresh_inode and may update the inode with out-of-date information.

Directories are read a page at a time, so each page could have a different timestamp that "should" be used to set the time_start for the fattr for info in that page. However, storing the timestamp per page is awkward. (We could stick it in the first 4 bytes and only read 4092 bytes, but that is a bigger code change than I am interested in.)

This patch ignores the readdir_plus attributes if a readdir finds the information already in cache, and otherwise sets ->time_start to the time the readdir request was sent to the server.

It might be nice to store - in the directory inode - the time stamp for the earliest readdir request that is still in the page cache, so that we don't ignore attribute data that we don't have to. This patch doesn't do that.

Signed-off-by: Neil Brown <neilb@suse.de>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
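Illustration for the time_start change above: a sketch of the decision the description spells out, assuming a simple flag marks whether the decoded attributes may be applied. The struct and function names are hypothetical; only the rule (ignore attributes for already-cached inodes, otherwise stamp them with the request send time) is taken from the text.

```c
/* Hypothetical stand-in for attributes decoded from a readdirplus reply. */
struct demo_fattr {
    unsigned long time_start;  /* when the originating request was sent */
    int valid;                 /* nonzero if the attributes may be applied */
};

/*
 * Apply the rule from the description: skip attributes when the inode is
 * already cached, otherwise record when the readdir request was sent.
 */
void demo_readdirplus_entry(struct demo_fattr *fattr, int inode_in_cache,
                            unsigned long request_sent)
{
    if (inode_in_cache) {
        fattr->valid = 0;
        return;
    }
    fattr->time_start = request_sent;
    fattr->valid = 1;
}
```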

linux-2.6.21-021-fix_readdir_stale_cache.dif:

From: Neil Brown <neilb@suse.de>

Date: Mon, 26 Feb 2007 12:48:25 +1100

NFS: Fix directory caching problem - with test case and patch.

Try running this script in an NFS mounted directory (Client relatively recent - 2.6.18 has the problem as does 2.6.20).

------------------------------------------------------
#!/bin/bash
#
# This script will produce the following errormessage from tar:
#
# tar: newdir/innerdir/innerfile: file changed as we read it

# create dirs
rm -rf nfstest
mkdir -p nfstest/dir/innerdir

# create files (should not be empty)
echo "Hello World!" >nfstest/dir/file
echo "Hello World!" >nfstest/dir/innerdir/innerfile

# problem only happens if we sleep before chmod
sleep 1

# change file modes
chmod -R a+r nfstest

# rename dir
mv nfstest/dir nfstest/newdir

# tar it
tar -cf nfstest/nfstest.tar -C nfstest newdir

# restore old dir name
mv nfstest/newdir nfstest/dir
--------------------------------------------------------

What happens:

The 'chmod -R' does a readdir_plus in each directory and the results get cached in the page cache. It then updates the ctime on each file by one second. When this happens, the post-op attributes are used to update the ctime stored on the client to match the value in the kernel.

The 'mv' calls shrink_dcache_parent on the directory tree which flushes all the dentries (so a new lookup will be required) but doesn't flush the inodes or pagecache.

The 'tar' does a readdir on each directory, but (in the case of 'innerdir' at least) satisfies it from the pagecache and uses the READDIRPLUS data to update all the inodes. In the case of 'innerdir/innerfile', the ctime is out of date.

'tar' then calls 'lstat' on innerdir/innerfile getting an old ctime. It then opens the file (triggering a GETATTR), reads the content, and then calls fstat to see if anything has changed. It finds that ctime has changed and so complains.

The problem seems to be that the cached readdirplus info is kept around for too long.

My patch below discards pagecache data for directories when dentry_iput is called on them. This effectively removes the symptom, which convinces me that I correctly understand the problem. However, I'm not convinced that it is a proper solution, as there could easily be other races that trigger the same problem without being affected by this 'fix'.

One possibility would be to require that readdirplus pagecache data be only used *once* to instantiate an inode. Somehow it should then be invalidated so that if the dentry subsequently disappears, it will cause a new request to the server to fill in the stat data.

Another possibility is to compare the cache_change_attribute on the inode with something similar for the readdirplus info and reject the info from readdirplus if it is too old.

I haven't tried to implement these and would value other opinions before I do.

Thanks, NeilBrown

Signed-off-by: Neil Brown <neilb@suse.de>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-022-invalidate_cached_acl_on_setacl.dif:

From: J. Bruce Fields - unquoted <bfields@citi.umich.edu>

Date: Sat, 10 Feb 2007 01:33:24 -0500

nfs4: invalidate cached acl on setacl

The ACL that the server sets may not be exactly the one we set--for example, it may silently turn off bits that it does not support. So we should remove any cached ACL so that any subsequent request for the ACL will go to the server.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
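Illustration for the setacl change above: a toy model of why the cached ACL has to be dropped, assuming a server that silently clears bits it does not support. ACLs are reduced to a bitmask and every name here is hypothetical.

```c
#include <stdbool.h>

#define DEMO_SERVER_SUPPORTED_BITS 0x0fu  /* assumed server capability mask */

/* Hypothetical inode with a one-entry ACL cache and a simulated server copy. */
struct demo_inode {
    unsigned int server_acl;
    unsigned int cached_acl;
    bool cache_valid;
    unsigned int server_fetches;
};

/* Send the ACL, then invalidate the cache rather than caching what we sent. */
void demo_setacl(struct demo_inode *ino, unsigned int acl)
{
    ino->server_acl = acl & DEMO_SERVER_SUPPORTED_BITS;
    ino->cache_valid = false;
}

/* Serve from cache when valid, otherwise go back to the server. */
unsigned int demo_getacl(struct demo_inode *ino)
{
    if (!ino->cache_valid) {
        ino->cached_acl = ino->server_acl;
        ino->cache_valid = true;
        ino->server_fetches++;
    }
    return ino->cached_acl;
}
```

Had demo_setacl() cached its argument instead, a later demo_getacl() would report bits the server never stored.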

linux-2.6.21-023-fix_spkm3_s_use_of_hmac.dif:

From: J. Bruce Fields - unquoted <bfields@snoopy.citi.umich.edu>

Date: Sat, 10 Feb 2007 01:33:25 -0500

spkm3: fix spkm3's use of hmac

I think I botched an attempt to keep an spkm3 patch up-to-date with a recent crypto api change.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-024-remove_bad_kfree_unnecessary_export.dif:

From: J. Bruce Fields - unquoted <bfields@snoopy.citi.umich.edu>

Date: Sat, 10 Feb 2007 01:33:26 -0500

spkm3: remove bad kfree, unnecessary export

We're kfree()'ing something that was allocated on the stack!

Also remove an unnecessary symbol export while we're at it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-025-initialize_hash.dif:

From: J. Bruce Fields - unquoted <bfields@snoopy.citi.umich.edu>

Date: Sat, 10 Feb 2007 01:33:27 -0500

spkm3: initialize hash

There's an initialization step here I missed.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

linux-2.6.21-026-avoid_flush_when_locking_when_interclient_consistency_not_needed.dif:

From: NeilBrown <neilb@suse.de>

Date: Tue, 6 Mar 2007 16:40:29 +1100

Avoid flush-when-locking when interclient consistency not needed.

If the nolock and nocto mount options are both set, then the implication is that cache consistency with other clients is not desired. This is likely to be the case when the filesystem is only accessed by the one client, as an NFS-mounted root might be.

In that case, there is no benefit in flushing writes and invalidating caches around lock/unlock requests.

This patch makes those flushes and invalidates conditional on either nolock or nocto being clear.

Signed-off-by: Neil Brown <neilb@suse.de>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
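Illustration for the flush-when-locking change above: the condition "either nolock or nocto being clear" written as a flag test. The flag values and the function name are hypothetical and are not the kernel's mount flag definitions.

```c
#define DEMO_MOUNT_NOLOCK 0x1u  /* hypothetical flag bit for nolock */
#define DEMO_MOUNT_NOCTO  0x2u  /* hypothetical flag bit for nocto */

/*
 * Flush and invalidate around lock/unlock unless BOTH options are set,
 * that is, whenever at least one of them is clear.
 */
int demo_need_sync_on_lock(unsigned int mount_flags)
{
    const unsigned int both = DEMO_MOUNT_NOLOCK | DEMO_MOUNT_NOCTO;

    return (mount_flags & both) != both;
}
```

Only the combination of both options skips the flush; setting one of them alone leaves the existing behaviour unchanged.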

linux-2.6.21-027-debugging_do_not_merge.dif:

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Date: Sun, 15 Apr 2007 19:02:47 -0400

NFS: Debugging code. Do not merge...

Adds consistency checks for nfs_page list operations

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Name                      Last modified     Size

linux-2.6.21-001-fix..>   2007-04-15 23:03  1.5K
linux-2.6.21-002-no_..>   2007-04-19 14:15  1.7K
linux-2.6.21-003-cle..>   2007-04-19 14:15  8.3K
linux-2.6.21-004-cle..>   2007-04-19 14:15  13K
linux-2.6.21-005-cle..>   2007-04-19 14:15  4.2K
linux-2.6.21-006-fix..>   2007-04-19 14:15  15K
linux-2.6.21-007-fix..>   2007-04-19 14:15  9.3K
linux-2.6.21-008-cle..>   2007-04-19 14:15  1.0K
linux-2.6.21-009-use..>   2007-04-19 14:15  5.0K
linux-2.6.21-010-shr..>   2007-04-15 23:03  3.7K
linux-2.6.21-011-rpc..>   2007-04-19 14:15  15K
linux-2.6.21-012-eli..>   2007-04-19 14:15  5.6K
linux-2.6.21-013-int..>   2007-04-19 14:15  21K
linux-2.6.21-014-swi..>   2007-04-15 23:03  1.3K
linux-2.6.21-015-swi..>   2007-04-15 23:03  966
linux-2.6.21-016-swi..>   2007-04-15 23:03  959
linux-2.6.21-017-rem..>   2007-04-15 23:03  13K
linux-2.6.21-018-rpc..>   2007-04-19 14:15  1.7K
linux-2.6.21-019-nfs..>   2007-04-15 23:03  2.2K
linux-2.6.21-020-rea..>   2007-04-19 14:15  3.7K
linux-2.6.21-021-fix..>   2007-04-19 14:15  3.5K
linux-2.6.21-022-inv..>   2007-04-15 23:03  1.1K
linux-2.6.21-023-fix..>   2007-04-15 23:03  1.2K
linux-2.6.21-024-rem..>   2007-04-15 23:03  1.0K
linux-2.6.21-025-ini..>   2007-04-15 23:03  861
linux-2.6.21-026-avo..>   2007-04-15 23:03  2.9K
linux-2.6.21-027-deb..>   2007-04-19 14:15  3.0K
series                    2007-04-21 22:59  1.5K
