NFS oops on a 2.6.24.4 kernel server
Quentin Godfroy
godfroy at clipper.ens.fr
Wed Apr 23 11:24:01 EDT 2008
On Wed, Apr 23, 2008 at 01:01:19PM +0200, Quentin Godfroy wrote:
> Hi,
>
> I have an oops on an NFSv4 server which seems to be triggered by the
> client's callback channel not responding (the client is behind a firewall),
> although it does not happen on the first timeout.
>
> Here is the dmesg output
>
> [21854.928981] nfs4_cb: server 195.221.106.182 not responding, timed out
> [21946.939410] nfs4_cb: server 195.221.106.182 not responding, timed out
> [22147.874670] nfs4_cb: server 195.221.106.182 not responding, timed out
> [23925.545738] nfs4_cb: server 195.221.106.182 not responding, timed out
> [23934.545419] nfs4_cb: server 195.221.106.182 not responding, timed out
> [23934.545437] Unable to handle kernel NULL pointer dereference at
> 0000000000000018 RIP:
> [23934.545440] [<ffffffff8817a2c5>] :sunrpc:rpc_shutdown_client+0x25/0xf0
> [23934.545461] PGD 1c045067 PUD 1e065067 PMD 0
> [23934.545464] Oops: 0000 [1] PREEMPT
> [23934.545466] CPU 0
> [23934.545468] Modules linked in: md5 crypto_algapi rpcsec_gss_krb5 nfs nfsd
> lockd nfs_acl auth_rpcgss sunrpc exportfs fuse ipv6 tcp_diag inet_diag
> snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss sdhci snd_pcm
> ehci_hcd ohci_hcd snd_timer k8temp hwmon ipw2200 ohci1394 mmc_core snd
> ieee80211 ieee80211_crypt ieee1394 usbcore soundcore snd_page_alloc
> [23934.545489] Pid: 26927, comm: nfs4_cb_probe Not tainted 2.6.24.5-neptune
> #1
> [23934.545492] RIP: 0010:[<ffffffff8817a2c5>] [<ffffffff8817a2c5>]
> :sunrpc:rpc_shutdown_client+0x25/0xf0
> [23934.545506] RSP: 0018:ffff810003b2bea0 EFLAGS: 00010246
> [23934.545508] RAX: 00000000fffffffb RBX: ffff81001c2f2c00 RCX:
> 0000000000000282
> [23934.545511] RDX: 0000000000000001 RSI: 0000000000000000 RDI:
> 0000000000000000
> [23934.545513] RBP: 0000000000000018 R08: ffffffff881a0d68 R09:
> ffff810003814000
> [23934.545516] R10: ffffffff805bf980 R11: ffffffff8026af40 R12:
> 0000000000000000
> [23934.545518] R13: 0000000000000000 R14: ffff810003b2beb8 R15:
> 0000000000000000
> [23934.545521] FS: 00002b308ba53af0(0000) GS:ffffffff8056c000(0000)
> knlGS:0000000000000000
> [23934.545524] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [23934.545527] CR2: 0000000000000018 CR3: 000000001c049000 CR4:
> 00000000000006e0
> [23934.545529] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [23934.545532] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [23934.545535] Process nfs4_cb_probe (pid: 26927, threadinfo
> ffff810003b2a000, task ffff810003b56810)
> [23934.545537] Stack: ffff81001ef5e440 ffff810016c16a00 0000000000000000
> ffff81001ef5e440
> [23934.545541] 0000000000000282 ffffffff881807e9 ffff81001c2f2c00
> ffff81001c2f2c00
> [23934.545545] ffffffff881e7c60 0000000000000000 0000000000000000
> ffffffff881e7ccc
> [23934.545548] Call Trace:
> [23934.545563] [<ffffffff881807e9>] :sunrpc:rpc_put_task+0x99/0xc0
> [23934.545582] [<ffffffff881e7c60>] :nfsd:do_probe_callback+0x0/0x80
> [23934.545596] [<ffffffff881e7ccc>] :nfsd:do_probe_callback+0x6c/0x80
> [23934.545603] [<ffffffff802484db>] kthread+0x4b/0x80
> [23934.545607] [<ffffffff8020cc18>] child_rip+0xa/0x12
> [23934.545613] [<ffffffff80248490>] kthread+0x0/0x80
> [23934.545616] [<ffffffff8020cc0e>] child_rip+0x0/0x12
> [23934.545619]
> [23934.545620]
> [23934.545620] Code: 49 39 6c 24 18 0f 84 84 00 00 00 4c 89 e7 e8 88 71 00
> 00 49
> [23934.545628] RIP [<ffffffff8817a2c5>]
> :sunrpc:rpc_shutdown_client+0x25/0xf0
> [23934.545641] RSP <ffff810003b2bea0>
> [23934.545643] CR2: 0000000000000018
> [23934.545650] ---[ end trace aa442db332f323ac ]---
> [24781.428396] nfs4_cb: server 195.221.106.182 not responding, timed out
>
> The server and the client are both x86_64 2.6.24.4; the mount options are
> ro,sec=none,soft,intr,port=2049,rsize=8192,wsize=8192
>
> The server still seems fine, and reading files continues to work.
>
> I am unable to say if one of the numerous commits in 2.6.25 would correct
> it.
Is there some kind of race condition in
fs/nfsd/nfs4callback.c:do_probe_callback? I sometimes see two nfs4_cb_probe
threads for a single client. Could they be sharing the same
struct nfs4_callback cb?
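
For what it is worth, here is a quick sanity check of the register dump, as a
small Python sketch rather than kernel code. The values in `regs` are copied by
hand from the oops above. Two things are my own assumptions: that the first
Code bytes (49 39 6c 24 18) decode to a compare against memory at 0x18(%r12),
and that RAX still holds the status of the failed RPC call. If both hold, the
fault is a read at offset 0x18 from a NULL pointer passed to
rpc_shutdown_client, which would fit a second probe thread seeing a client
pointer that another thread had already cleared. I have not confirmed this
against the source.

```python
# Register values copied by hand from the oops above.
regs = {
    "RAX": 0x00000000FFFFFFFB,
    "RDI": 0x0000000000000000,
    "RBP": 0x0000000000000018,
    "R12": 0x0000000000000000,
    "CR2": 0x0000000000000018,
}


def as_signed32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Assumption: R12 is the base register of the faulting memory access.
fault_offset = regs["CR2"] - regs["R12"]

# Both the assumed base register and the first-argument register are zero.
null_base = regs["R12"] == 0 and regs["RDI"] == 0

# Assumption: RAX holds a negative errno-style status from the RPC call.
status = as_signed32(regs["RAX"])

print(f"fault offset from R12: {fault_offset:#x}")
print(f"base pointer is NULL: {null_base}")
print(f"RAX as signed 32-bit: {status}")
```

Read this way, RAX is -5, which would be -EIO, consistent with the soft-mounted
callback timing out just before the shutdown path runs.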