Some NFS file transfers fail and hang automounting

From Linux NFS

Revision as of 15:39, 22 October 2010 by Amschuma

About

  • Kernel version: 2.6.33.5-112.fc13.x86_64
  • Bug 16213
  • Reported by: Philippe Dax (June 15, 2010)
  • Fixed by: Trond Myklebust (June 16, 2010)

Symptoms

  • Given a 50 MB file "foo" on a remote machine "remote":
    • The following command never finishes:
      <localmachine $> cp /remote_mount_point/foo bar
    • The copy "bar" ends up smaller than "foo".
    • Automounting on the local machine hangs.
  • The following message appears in /var/log/messages:
      kernel: Callback slot table overflowed
  • The problem does not occur if "foo" is smaller than 10 MB.
  • The final size of "bar" appears to be random.
  • The problem occurs with:
    sunrpc.tcp_slot_table_entries = 16
  • The problem does NOT occur with:
    sunrpc.tcp_slot_table_entries = 32

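Cause

According to the commit message of the fix, this is a re-entrancy bug in xs_tcp_read_calldir() in the SUNRPC TCP transport. If an attempt to read the call direction (calldir) field of an incoming RPC message only partly succeeds, the bytes already read are discarded instead of being stored. When the routine is re-entered to read the remaining bytes, it therefore assembles a garbage value.
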
Resolution

This problem was fixed by commit b76ce56192bcf618013fb9aecd83488cffd645cc.

commit b76ce56192bcf618013fb9aecd83488cffd645cc
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Wed Jun 16 13:57:32 2010 -0400

    SUNRPC: Fix a re-entrancy bug in xs_tcp_read_calldir()
    
    If the attempt to read the calldir fails, then instead of storing the read
    bytes, we currently discard them. This leads to a garbage final result when
    upon re-entry to the same routine, we read the remaining bytes.
    
    Fixes the regression in bugzilla number 16213. Please see
        https://bugzilla.kernel.org/show_bug.cgi?id=16213
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Cc: stable@kernel.org