NFSv4 on Gutsy
J. Bruce Fields
bfields at fieldses.org
Mon Oct 22 14:04:14 EDT 2007
On Mon, Oct 22, 2007 at 10:21:00AM -0400, Kevin Coffman wrote:
> I'm curious why that bug is categorized as a nfs-utils problem when it
> is a kernel oops. I'm going to post the oops info here in hopes that
> it is familiar to someone more familiar with the kernel code.
>
> Oct 12 10:46:34 jbardin kernel: [69236.290562] BUG: unable to handle
> kernel NULL pointer dereference at virtual address 00000014
> Oct 12 10:46:34 jbardin kernel: [69236.290570] printing eip:
> Oct 12 10:46:34 jbardin kernel: [69236.290573] f8b50901
> Oct 12 10:46:34 jbardin kernel: [69236.290575] *pde = 00000000
> Oct 12 10:46:34 jbardin kernel: [69236.290578] Oops: 0000 [#4]
> Oct 12 10:46:34 jbardin kernel: [69236.290580] SMP
> Oct 12 10:46:34 jbardin kernel: [69236.290584] Modules linked in:
> vmnet(P) vmmon(P) isofs udf rfcomm l2cap bluetooth ppdev acpi_cpufreq
> cpufreq_powersave cpufreq_use
> rspace cpufreq_ondemand cpufreq_conservative cpufreq_stats freq_table
> video bay dock button container ac sbs battery rpcsec_gss_krb5
> auth_rpcgss nfs lockd sunrpc xfs
> lp snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss
> snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi
> snd_seq_midi_event snd_seq snd_timer sn
> d_seq_device usb_storage ide_core nvidia(P) libusual snd i2c_core
Is it possible to reproduce this without the proprietary modules loaded?
> serio_raw ipv6 soundcore iTCO_wdt iTCO_vendor_support parport_pc
> parport pcspkr psmouse snd_page_al
> loc intel_agp shpchp pci_hotplug agpgart evdev ext3 jbd mbcache usbhid
> hid sg sr_mod cdrom sd_mod ata_piix floppy tg3 ata_generic ehci_hcd
> libata scsi_mod uhci_hcd u
> sbcore thermal processor fan fuse apparmor commoncap
> Oct 12 10:46:34 jbardin kernel: [69236.290670] CPU: 0
> Oct 12 10:46:34 jbardin kernel: [69236.290671] EIP: 0060:[<f8b50901>]
> Tainted: P VLI
> Oct 12 10:46:34 jbardin kernel: [69236.290673] EFLAGS: 00210246
> (2.6.22-14-generic #1)
> Oct 12 10:46:34 jbardin kernel: [69236.290683] EIP is at
> g_token_size+0x1/0x20 [auth_rpcgss]
OK, so net/sunrpc/auth_gss/gss_generic_token.c:g_token_size() is a very
simple function that only deferences a single pointer, the xdr_netobj *
passed as its argument. So that pointer must have had the (obviously
bogus) value 0x14.
> Oct 12 10:46:34 jbardin kernel: [69236.290688] eax: 00000014 ebx:
> 00000014 ecx: c8c559e0 edx: 00000016
> Oct 12 10:46:34 jbardin kernel: [69236.290691] esi: c8c559e0 edi:
> 00000000 ebp: c8c559b4 esp: c8c55950
> Oct 12 10:46:34 jbardin kernel: [69236.290694] ds: 007b es: 007b fs:
> 00d8 gs: 0033 ss: 0068
> Oct 12 10:46:34 jbardin kernel: [69236.290698] Process mount.nfs4
> (pid: 11683, ti=c8c54000 task=f7af6f90 task.ti=c8c54000)
> Oct 12 10:46:34 jbardin kernel: [69236.290701] Stack: 00000014
> f8a8b463 f3e8b96c 00000000 459086b5 470f88c9 00000006 4590ee11
> Oct 12 10:46:34 jbardin kernel: [69236.290710] 00003ef8 00001f8e
> 00000000 c8c55968 00000000 f8a8cd04 f7207840 f2eab048
> Oct 12 10:46:34 jbardin kernel: [69236.290719] f2eab044 f8b50bf9
> c8c559b4 f8b4fe1a 00200246 00200287 c4435000 f7207340
> Oct 12 10:46:34 jbardin kernel: [69236.290728] Call Trace:
> Oct 12 10:46:34 jbardin kernel: [69236.290734] [<f8a8b463>]
> gss_get_mic_kerberos+0x43/0x160 [rpcsec_gss_krb5]
And net/sunrpc/auth_gss/gss_krb5_seal.c:gss_get_mic_kerberos() calls
g_token_size() only once, right at the start, with
&gss_ctx->internal_ctx_id->mech_used, where gss_ctx is its first
argument. At least on my (x86) machine the mech_used field is at offset
0x14 from the start of krb5_ctx. So gss_ctx->internal_ctx_id was NULL.
> Oct 12 10:46:34 jbardin kernel: [69236.290787] [<f8b50bf9>]
> gss_get_mic+0x9/0x10 [auth_rpcgss]
Just a simple wrapper for the above.
> Oct 12 10:46:34 jbardin kernel: [69236.290798] [<f8b4fe1a>]
> gss_marshal+0xea/0x170 [auth_rpcgss]
This is the client function that creates the cred on an rpc call to
send. It is passed an rpc_task "task", gets a gss_cred from
task->tk_msg.rpc_cred with a container_of(), gets a context "ctx" from
it with gss_cred_get_ctx(), and then passes ctx->gc_gss_ctx to
gss_get_mic().
So the task points to a cred that wasn't completely initialized. The
only place that assigns to internal_ctx_id is
gss_import_sec_context_kerberos(), so it looks like this either failed
or was never called. Ditto for that function's caller,
gss_file_context(). But that function's called by gss_pipe_downcall(),
which returns (in the message) a context only if gss_fill_context()
suceeds.
Looks like the gss_msg->ctx field is initialized to NULL on creation (by
the kzalloc in gss_alloc_msg()) and is set to non-NULL only on succesful
return from gss_fill_context()).
So I'm having trouble seeing how this happened.
> Oct 12 10:46:34 jbardin kernel: [69236.290870] [<f8c3c2c9>]
> rpcauth_marshcred+0x29/0x70 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.290893]
> [generic_file_buffered_write+568/1680]
> generic_file_buffered_write+0x238/0x690
> Oct 12 10:46:34 jbardin kernel: [69236.290917] [<f8c35480>]
> call_transmit+0x120/0x250 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.290933] [<f8c37b97>]
> xprt_reserve+0xf7/0x140 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.290954] [<f8c351d6>]
> call_reserve+0x16/0x70 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.290968] [<f8c80730>]
> nfs4_xdr_enc_lookup_root+0x0/0x90 [nfs]
> Oct 12 10:46:34 jbardin kernel: [69236.291002] [<f8c3b492>]
> __rpc_execute+0x62/0x270 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.291027] [<f8c3a91e>]
> rpc_set_active+0x1e/0x60 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.291052] [<f8c35dd8>]
> rpc_call_sync+0x98/0xa0 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.291088] [<f8c76c12>]
> nfs4_proc_get_root+0x72/0xc0 [nfs]
> Oct 12 10:46:34 jbardin kernel: [69236.291146] [<f8c672f9>]
> nfs4_path_walk+0x69/0x2d0 [nfs]
> Oct 12 10:46:34 jbardin kernel: [69236.291190] [d_alloc+265/400]
> d_alloc+0x109/0x190
> Oct 12 10:46:34 jbardin kernel: [69236.291205] [<f8c464ef>]
> rpc_alloc_inode+0xf/0x20 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.291223]
> [inotify_d_instantiate+24/128] inotify_d_instantiate+0x18/0x80
> Oct 12 10:46:34 jbardin kernel: [69236.291249] [d_rehash+28/48]
> d_rehash+0x1c/0x30
> Oct 12 10:46:34 jbardin kernel: [69236.291257] [<f8c466a3>]
> rpc_populate+0x123/0x130 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.291280] [<f8c464ef>]
> rpc_alloc_inode+0xf/0x20 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.291314]
> [mntput_no_expire+36/160] mntput_no_expire+0x24/0xa0
> Oct 12 10:46:34 jbardin kernel: [69236.291328] [<f8c476c2>]
> rpc_mkdir+0x152/0x250 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.291348] [__alloc_pages+79/832]
> __alloc_pages+0x4f/0x340
> Oct 12 10:46:34 jbardin kernel: [69236.291440] [<f8c361eb>]
> rpc_setup_pipedir+0x7b/0xd0 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.291476] [<f8c37038>]
> xprt_get+0x8/0x10 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.291494] [<f8c36300>]
> rpc_clone_client+0xc0/0xf0 [sunrpc]
> Oct 12 10:46:34 jbardin kernel: [69236.291529] [<f8c6130d>]
> nfs_init_server_rpcclient+0x1d/0xd0 [nfs]
> Oct 12 10:46:34 jbardin kernel: [69236.291558] [<f8c62b03>]
> nfs4_create_server+0x123/0x3c0 [nfs]
> Oct 12 10:46:34 jbardin kernel: [69236.291606]
> [default_wake_function+0/16] default_wake_function+0x0/0x10
> Oct 12 10:46:34 jbardin kernel: [69236.291620]
> [dev_queue_xmit+155/816] dev_queue_xmit+0x9b/0x330
> Oct 12 10:46:34 jbardin kernel: [69236.291642] [ip_output+407/816]
> ip_output+0x197/0x330
> Oct 12 10:46:34 jbardin kernel: [69236.291646]
> [neigh_resolve_output+190/592] neigh_resolve_output+0xbe/0x250
> Oct 12 10:46:34 jbardin kernel: [69236.291689] [copy_from_user+46/112]
> copy_from_user+0x2e/0x70
I don't understand why all this stuff is in the backtrace.
> Oct 12 10:46:34 jbardin kernel: [69236.291705] [<f8c6a319>]
> nfs_copy_user_string+0x39/0xa0 [nfs]
> Oct 12 10:46:34 jbardin kernel: [69236.291733] [<f8c6ad6a>]
> nfs4_get_sb+0x23a/0x380 [nfs]
> Oct 12 10:46:34 jbardin kernel: [69236.291775] [<f89c2670>]
> ext3_permission+0x0/0x10 [ext3]
> Oct 12 10:46:34 jbardin kernel: [69236.291790] [permission+270/288]
> permission+0x10e/0x120
> Oct 12 10:46:34 jbardin kernel: [69236.291813]
> [get_page_from_freelist+705/928] get_page_from_freelist+0x2c1/0x3a0
> Oct 12 10:46:34 jbardin kernel: [69236.291874] [__alloc_pages+79/832]
> __alloc_pages+0x4f/0x340
> Oct 12 10:46:34 jbardin kernel: [69236.291927]
> [vfs_kern_mount+182/304] vfs_kern_mount+0xb6/0x130
> Oct 12 10:46:34 jbardin kernel: [69236.291960] [do_kern_mount+63/240]
> do_kern_mount+0x3f/0xf0
> Oct 12 10:46:34 jbardin kernel: [69236.291982] [do_mount+1643/1856]
> do_mount+0x66b/0x740
> Oct 12 10:46:34 jbardin kernel: [69236.292022] [__wake_up_sync+63/112]
> __wake_up_sync+0x3f/0x70
> Oct 12 10:46:34 jbardin kernel: [69236.292041] [kunmap_atomic+94/160]
> kunmap_atomic+0x5e/0xa0
> Oct 12 10:46:34 jbardin kernel: [69236.292045] [kunmap_atomic+106/160]
> kunmap_atomic+0x6a/0xa0
> Oct 12 10:46:34 jbardin kernel: [69236.292061]
> [__handle_mm_fault+1671/2816] __handle_mm_fault+0x687/0xb00
> Oct 12 10:46:34 jbardin kernel: [69236.292073] [schedule+714/2192]
> schedule+0x2ca/0x890
> Oct 12 10:46:34 jbardin kernel: [69236.292142]
> [do_page_fault+905/1680] do_page_fault+0x389/0x690
> Oct 12 10:46:34 jbardin kernel: [69236.292204] [error_code+114/128]
> error_code+0x72/0x80
> Oct 12 10:46:34 jbardin kernel: [69236.292235]
> [clip_neigh_error+40/48] clip_neigh_error+0x28/0x30
> Oct 12 10:46:34 jbardin kernel: [69236.292242]
> [copy_mount_options+171/336] copy_mount_options+0xab/0x150
> Oct 12 10:46:34 jbardin kernel: [69236.292276] [sys_mount+119/192]
> sys_mount+0x77/0xc0
But, OK, it looks like it happened while the client was looking up the
root filehandle as part of the mount. This is probably the very first
non-NULL rpc call it does to the server.
> Oct 12 10:46:34 jbardin kernel: [69236.292300] [syscall_call+7/11]
> syscall_call+0x7/0xb
> Oct 12 10:46:34 jbardin kernel: [69236.292331] [clip_ioctl+1280/1296]
> clip_ioctl+0x500/0x510
> Oct 12 10:46:34 jbardin kernel: [69236.292359] =======================
> Oct 12 10:46:34 jbardin kernel: [69236.292361] Code: 00 00 7e 1f 3d ff
> 00 00 00 b2 02 7e 16 3d ff ff 00 00 b2 03 7e 0d 31 d2 3d ff ff ff 00
> 0f 9f c2 83 c2 04 89 d0 c3 8d 74 26 00 53 <8b> 00 89 d3 83 c0 04 01 c3
> 89 d8 e8 bf ff ff ff 8d 44 03 01 5b
> Oct 12 10:46:34 jbardin kernel: [69236.292407] EIP: [<f8b50901>]
> g_token_size+0x1/0x20 [auth_rpcgss] SS:ESP 0068:c8c55950
Anyway, I'm stumped.
--b.
More information about the NFSv4
mailing list