[pnfs] pnfs/gfs2 stubs

Frank Filz ffilzlnx at us.ibm.com
Mon May 5 16:38:20 EDT 2008


On Mon, 2008-05-05 at 13:29 -0700, Frank Filz wrote:
> On Mon, 2008-05-05 at 16:23 -0400, david m. richter wrote:
> > On Mon, 5 May 2008, Frank Filz wrote:
> > 
> > > On Fri, 2008-05-02 at 16:10 -0400, William A. (Andy) Adamson wrote:
> > > > i have a patch that fixes print_ds_list. i'll submit it asap
> > > 
> > > Ok, I think I figured out how to fix print_ds_list. Unfortunately, my
> > > system is still crashing and not being very helpful. I can't get
> > > anything to show up in /var/log/messages and I don't have serial console
> > > set up, so all I get is the last 25 lines of log information or so. The
> > > current crash is in mount.nfs process, but the stack makes no sense
> > > whatsoever:
> > > 
> > > dump_trace
> > > print_trace_address
> > > print_trace_log_lvl
> > > show_trace
> > > dump_stack
> > > 
> > > That's it, no hint of a system call...
> > 
> > 	hm, gak.  does anything change if you don't set the ->layout_get() 
> > or ->layout_return() export ops?  that is, can you see the client mount 
> > and do GETDEVICELIST and GETDEVICEINFO and, thereafter, cat a file or 
> > something and see it start doing layout processing, fail when nothing's 
> > available, and then do a plain-vanilla READ?
> 
> That would be worth a try. I did try this change in
> filelayout_initialize_mountpoint:
> 
> #ifdef REMOVED_CODE
> 	/* Retrieve device list from server */
> 	status = pnfs_callback_ops->nfs_getdevicelist(sb, fh, dlist);
> 	if (status)
> 		goto cleanup_mt;
> 
> 	status = nfs4_pnfs_devlist_init(fl_mt->hlist);
> 	if (status)
> 		goto cleanup_mt;
> 
> 	/* Retrieve and add all available devices */
> 	status = process_deviceid_list(fl_mt, fh, dlist);
> 	if (status)
> 		goto cleanup_mt;
> #endif /* REMOVED_CODE */
> 
> Now it doesn't crash (as badly). When I write to a file, I do see layout
> get being called, but the write ends up hanging. I need to look through
> all the debug messages and piece together what went wrong. Hopefully that
> will hold a clue as to what was crashing before.

Hmm, here's a hint of where the hang might be:

May  5 12:44:49 elm3a22 kernel: BUG: soft lockup - CPU#1 stuck for 61s! [bash:2649]
May  5 12:44:49 elm3a22 kernel: 
May  5 12:44:49 elm3a22 kernel: Pid: 2649, comm: bash Not tainted (2.6.25-rc8-pnfs-ff #2)
May  5 12:44:49 elm3a22 kernel: EIP: 0060:[<c061fc52>] EFLAGS: 00000283 CPU: 1
May  5 12:44:49 elm3a22 kernel: EIP is at __write_lock_failed+0xa/0x20
May  5 12:44:49 elm3a22 kernel: EAX: f706ea00 EBX: 16422f09 ECX: 00000000 EDX: f7801060
May  5 12:44:49 elm3a22 kernel: ESI: f684fa20 EDI: f706ea00 EBP: f7137be0 ESP: f7137be0
May  5 12:44:49 elm3a22 kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
May  5 12:44:49 elm3a22 kernel: CR0: 8005003b CR2: b7f6c000 CR3: 372f9000 CR4: 000006d0
May  5 12:44:49 elm3a22 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
May  5 12:44:49 elm3a22 kernel: DR6: ffff0ff0 DR7: 00000400
May  5 12:44:49 elm3a22 kernel:  [<c061feb8>] _write_lock+0x11/0x13
May  5 12:44:49 elm3a22 kernel:  [<f8b4c480>] nfs4_pnfs_device_item_get+0x42a/0x9b1 [nfslayoutdriver]
May  5 12:44:49 elm3a22 kernel:  [<c04de8af>] ? __next_cpu+0x15/0x25
May  5 12:44:49 elm3a22 kernel:  [<c041ac50>] ? find_busiest_group+0x1de/0x593
May  5 12:44:49 elm3a22 kernel:  [<c0419f79>] ? update_curr+0x87/0xa3
May  5 12:44:49 elm3a22 kernel:  [<c04e2535>] ? number+0x10d/0x1cd
May  5 12:44:49 elm3a22 kernel:  [<c0621b15>] ? __atomic_notifier_call_chain+0x11/0x13
May  5 12:44:49 elm3a22 kernel:  [<c0621b23>] ? atomic_notifier_call_chain+0xc/0xe
May  5 12:44:49 elm3a22 kernel:  [<c0537f55>] ? notify_update+0x22/0x24
May  5 12:44:49 elm3a22 kernel:  [<c053bea1>] ? vt_console_print+0x21f/0x22e
May  5 12:44:49 elm3a22 kernel:  [<c053bc82>] ? vt_console_print+0x0/0x22e
May  5 12:44:49 elm3a22 kernel:  [<c0423f5d>] ? __call_console_drivers+0x56/0x63
May  5 12:44:49 elm3a22 kernel:  [<c0423fc1>] ? _call_console_drivers+0x57/0x5b
May  5 12:44:49 elm3a22 kernel:  [<f8b4ca53>] nfs4_pnfs_dserver_get+0x4c/0x1b6 [nfslayoutdriver]
May  5 12:44:49 elm3a22 kernel:  [<c0424838>] ? printk+0x15/0x17
May  5 12:44:49 elm3a22 kernel:  [<f8b4bc9f>] filelayout_flush_one+0x2d7/0x2f0 [nfslayoutdriver]
May  5 12:44:49 elm3a22 kernel:  [<f8bf309d>] pnfs_flush_one+0x6b/0xb2 [nfs]
May  5 12:44:49 elm3a22 kernel:  [<f8bd3651>] nfs_pageio_doio+0x27/0x4f [nfs]
May  5 12:44:49 elm3a22 kernel:  [<f8bd3681>] nfs_pageio_complete+0x8/0xa [nfs]
May  5 12:44:49 elm3a22 kernel:  [<f8bd6c12>] nfs_writepages+0x5c/0x75 [nfs]
May  5 12:44:49 elm3a22 kernel:  [<f8bf3032>] ? pnfs_flush_one+0x0/0xb2 [nfs]
May  5 12:44:49 elm3a22 kernel:  [<f8bd70fc>] __nfs_write_mapping+0x15/0x43 [nfs]
May  5 12:44:49 elm3a22 kernel:  [<f8bd7163>] nfs_write_mapping+0x39/0x57 [nfs]
May  5 12:44:49 elm3a22 kernel:  [<f8bd71a6>] nfs_wb_all+0x10/0x12 [nfs]
May  5 12:44:49 elm3a22 kernel:  [<f8bcdd32>] nfs_do_fsync+0x16/0x31 [nfs]
May  5 12:44:49 elm3a22 kernel:  [<f8bce130>] nfs_file_flush+0x6c/0x8f [nfs]
May  5 12:44:49 elm3a22 kernel:  [<c0469c8d>] filp_close+0x31/0x5a
May  5 12:44:49 elm3a22 kernel:  [<c0475554>] sys_dup2+0xd6/0x100
May  5 12:44:49 elm3a22 kernel:  [<c040490e>] sysenter_past_esp+0x5f/0x85
May  5 12:44:49 elm3a22 kernel:  [<c0620000>] ? _read_lock_irqsave+0xf/0x15
May  5 12:44:49 elm3a22 kernel:  =======================
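
One possible reading of this trace, offered as an assumption rather than
something the log confirms: the REMOVED_CODE block above also skips the call
to nfs4_pnfs_devlist_init(), so if that function is what initializes the
lock taken in nfs4_pnfs_device_item_get(), then _write_lock() would be
spinning on a lock that was never set up. The userspace sketch below only
illustrates that failure mode. The toy_* names, the TOY_UNLOCKED encoding
and the spin bound are all hypothetical and are not taken from the pNFS
code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kernel rwlock_t (not the real type).
 * Encoding assumed for illustration: TOY_UNLOCKED means free, 0 means a
 * writer holds the lock, so zero-filled memory looks write-locked. */
#define TOY_UNLOCKED 0x01000000

struct toy_rwlock {
	atomic_int state;
};

/* Plays the role that rwlock_init() plays in the kernel. */
static void toy_rwlock_init(struct toy_rwlock *l)
{
	atomic_store(&l->state, TOY_UNLOCKED);
}

/* One acquisition attempt: succeeds only if the lock is currently free. */
static bool toy_write_trylock(struct toy_rwlock *l)
{
	int expected = TOY_UNLOCKED;

	return atomic_compare_exchange_strong(&l->state, &expected, 0);
}

/* Bounded spin so the sketch terminates; a real write_lock() has no bound. */
static bool toy_write_lock_bounded(struct toy_rwlock *l, unsigned long max_spins)
{
	for (unsigned long i = 0; i < max_spins; i++)
		if (toy_write_trylock(l))
			return true;
	return false;
}

/* Release the write side and mark the lock free again. */
static void toy_write_unlock(struct toy_rwlock *l)
{
	assert(atomic_load(&l->state) == 0); /* must currently be write-held */
	atomic_store(&l->state, TOY_UNLOCKED);
}
```

With a zero-filled struct toy_rwlock, toy_write_lock_bounded() gives up
after max_spins attempts, which is the bounded analogue of a CPU stuck in
__write_lock_failed. After toy_rwlock_init() the same call succeeds on the
first attempt. Whether the real hang has this cause would need checking
against nfs4_pnfs_devlist_init() itself.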


Frank