NFS oops on a 2.6.24.4 kernel server

Quentin Godfroy godfroy at clipper.ens.fr
Wed Apr 23 11:24:01 EDT 2008


On Wed, Apr 23, 2008 at 01:01:19PM +0200, Quentin Godfroy wrote:
> Hi,
> 
> I have an oops on an NFSv4 server which seems to be triggered when the
> callback to the client gets no response (the client is behind a firewall),
> although it does not happen on the first timeout.
> 
> Here is the dmesg output
> 
> [21854.928981] nfs4_cb: server 195.221.106.182 not responding, timed out
> [21946.939410] nfs4_cb: server 195.221.106.182 not responding, timed out
> [22147.874670] nfs4_cb: server 195.221.106.182 not responding, timed out
> [23925.545738] nfs4_cb: server 195.221.106.182 not responding, timed out
> [23934.545419] nfs4_cb: server 195.221.106.182 not responding, timed out
> [23934.545437] Unable to handle kernel NULL pointer dereference at
> 0000000000000018 RIP: 
> [23934.545440]  [<ffffffff8817a2c5>] :sunrpc:rpc_shutdown_client+0x25/0xf0
> [23934.545461] PGD 1c045067 PUD 1e065067 PMD 0 
> [23934.545464] Oops: 0000 [1] PREEMPT 
> [23934.545466] CPU 0 
> [23934.545468] Modules linked in: md5 crypto_algapi rpcsec_gss_krb5 nfs nfsd
> lockd nfs_acl auth_rpcgss sunrpc exportfs fuse ipv6 tcp_diag inet_diag
> snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss sdhci snd_pcm
> ehci_hcd ohci_hcd snd_timer k8temp hwmon ipw2200 ohci1394 mmc_core snd
> ieee80211 ieee80211_crypt ieee1394 usbcore soundcore snd_page_alloc
> [23934.545489] Pid: 26927, comm: nfs4_cb_probe Not tainted 2.6.24.5-neptune
> #1
> [23934.545492] RIP: 0010:[<ffffffff8817a2c5>]  [<ffffffff8817a2c5>]
> :sunrpc:rpc_shutdown_client+0x25/0xf0
> [23934.545506] RSP: 0018:ffff810003b2bea0  EFLAGS: 00010246
> [23934.545508] RAX: 00000000fffffffb RBX: ffff81001c2f2c00 RCX:
> 0000000000000282
> [23934.545511] RDX: 0000000000000001 RSI: 0000000000000000 RDI:
> 0000000000000000
> [23934.545513] RBP: 0000000000000018 R08: ffffffff881a0d68 R09:
> ffff810003814000
> [23934.545516] R10: ffffffff805bf980 R11: ffffffff8026af40 R12:
> 0000000000000000
> [23934.545518] R13: 0000000000000000 R14: ffff810003b2beb8 R15:
> 0000000000000000
> [23934.545521] FS:  00002b308ba53af0(0000) GS:ffffffff8056c000(0000)
> knlGS:0000000000000000
> [23934.545524] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [23934.545527] CR2: 0000000000000018 CR3: 000000001c049000 CR4:
> 00000000000006e0
> [23934.545529] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [23934.545532] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [23934.545535] Process nfs4_cb_probe (pid: 26927, threadinfo
> ffff810003b2a000, task ffff810003b56810)
> [23934.545537] Stack:  ffff81001ef5e440 ffff810016c16a00 0000000000000000
> ffff81001ef5e440
> [23934.545541]  0000000000000282 ffffffff881807e9 ffff81001c2f2c00
> ffff81001c2f2c00
> [23934.545545]  ffffffff881e7c60 0000000000000000 0000000000000000
> ffffffff881e7ccc
> [23934.545548] Call Trace:
> [23934.545563]  [<ffffffff881807e9>] :sunrpc:rpc_put_task+0x99/0xc0
> [23934.545582]  [<ffffffff881e7c60>] :nfsd:do_probe_callback+0x0/0x80
> [23934.545596]  [<ffffffff881e7ccc>] :nfsd:do_probe_callback+0x6c/0x80
> [23934.545603]  [<ffffffff802484db>] kthread+0x4b/0x80
> [23934.545607]  [<ffffffff8020cc18>] child_rip+0xa/0x12
> [23934.545613]  [<ffffffff80248490>] kthread+0x0/0x80
> [23934.545616]  [<ffffffff8020cc0e>] child_rip+0x0/0x12
> [23934.545619] 
> [23934.545620] 
> [23934.545620] Code: 49 39 6c 24 18 0f 84 84 00 00 00 4c 89 e7 e8 88 71 00
> 00 49 
> [23934.545628] RIP  [<ffffffff8817a2c5>]
> :sunrpc:rpc_shutdown_client+0x25/0xf0
> [23934.545641]  RSP <ffff810003b2bea0>
> [23934.545643] CR2: 0000000000000018
> [23934.545650] ---[ end trace aa442db332f323ac ]---
> [24781.428396] nfs4_cb: server 195.221.106.182 not responding, timed out
> 
> The server and the client are both x86_64 2.6.24.4; the mount options are
> ro,sec=none,soft,intr,port=2049,rsize=8192,wsize=8192
> 
> The server still seems fine, and reading files continues to work. 
> 
> I cannot tell whether any of the numerous commits in 2.6.25 would fix
> it.

Is there some kind of race condition in
fs/nfsd/nfs4callback.c:do_probe_callback? I sometimes see two nfs4_cb_probe
threads for a single client. Could they be sharing the same
struct nfs4_callback cb?
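
To make the question concrete, here is a minimal userspace sketch of the
pattern I suspect. It is not the kernel code: fake_rpc_clnt, fake_callback,
fake_shutdown_client, probe_failed and run_two_failed_probes are all
hypothetical stand-ins, and I am only assuming that the failure path shuts down
the callback client and then clears the pointer. In the oops above RDI is 0 and
the faulting address is 0x18, which would fit rpc_shutdown_client being handed
a NULL client, but that is a guess. If two probes share one cb, the second one
to reach the failure path would see the already cleared pointer. Claiming the
pointer with an atomic exchange makes the teardown run at most once:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's types. */
struct fake_rpc_clnt {
	int shutdown_calls;	/* how many times teardown ran */
};

struct fake_callback {
	/* Plays the role of the client pointer shared through cb. */
	_Atomic(struct fake_rpc_clnt *) client;
};

/* Like the function in the oops, this cannot tolerate a NULL client. */
static void fake_shutdown_client(struct fake_rpc_clnt *clnt)
{
	assert(clnt != NULL);
	clnt->shutdown_calls++;
}

/*
 * Failure path of one probe. The exchange hands the pointer to exactly
 * one caller; every later caller gets NULL and skips the teardown.
 */
static void probe_failed(struct fake_callback *cb)
{
	struct fake_rpc_clnt *clnt =
		atomic_exchange(&cb->client, (struct fake_rpc_clnt *)NULL);

	if (clnt != NULL)
		fake_shutdown_client(clnt);
}

/*
 * Two probes for the same client both hit the failure path. They are
 * called one after the other here to keep the demo deterministic.
 * Returns the number of teardowns that actually ran.
 */
static int run_two_failed_probes(void)
{
	struct fake_rpc_clnt clnt = { 0 };
	struct fake_callback cb;

	atomic_init(&cb.client, &clnt);
	probe_failed(&cb);
	probe_failed(&cb);
	return clnt.shutdown_calls;
}
```

Without the exchange (shut down, then store NULL), the second call would pass
NULL to fake_shutdown_client and trip the assert, which is the userspace
analogue of the NULL dereference above. A lock around the failure path, or
making sure only one probe is ever started per client, would be alternatives.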

