[pnfs] [PATCH 2/2] NFS: Fix a pNFS deadlock due to violations of RPC asynchronous i/o rules

Benny Halevy bhalevy at panasas.com
Tue Jul 15 10:14:26 EDT 2008


On Jul. 15, 2008, 16:35 +0300, Trond Myklebust <Trond.Myklebust at netapp.com> wrote:
> On Tue, 2008-07-15 at 09:41 +0300, Benny Halevy wrote:
>>> +	/* FIXME: This lock is just so wrong... */
>>> +	spin_lock(&nfsi->lo_lock);
>> What happens here is that the lo_lock was dropped in
>> pnfs_update_layout before calling get_layout that's
>> doing kmalloc and an rpc hence it must not hold the spin
>> lock.  We reacquire it here for pnfs_inject_layout that
>> inserts the retrieved layout onto the file's layout.
> 
> ...and hence you end up holding it across an allocation inside
> pnfs_inject_layout(). It would be far better to move the lock to where
> it is needed inside pnfs_inject_layout() _after_ the allocation has been
> made.

we actually drop it yet again and reacquire across the call to
PNFS_LD_IO_OPS(lo)->alloc_lseg, but I completely agree that lock isn't
really needed before the call to pnfs_inject_layout.

Since pnfs_inject_layout is called only from pnfs_layout_process I think
the simplest solution is just embed it in pnfs_layout_process and
grab the lock right before calling pnfs_insert_layout.

> 
>>> +
>>> +	/* FIXME: WTF? */
>>> +	BUG_ON(nfsi->current_layout != lo);
>> While dropping the spin lock we do keep a reference count
>> on the layout structure to keep it around and valid and the
>> BUG_ON is here to check we're sane since current_layout is
>> not supposed to change while referenced (in the present
>> implementation).  That said, I've no objection to remove
>> the BUG_ON since I believe this code should work even
>> this assumption is changed and we might be referencing
>> a layout structure detached from the nfs_inode.
> 
> So who is holding the reference? Wouldn't it make sense to hold a
> reference in the 'struct nfs4_pnfs_layoutget' itself, so that you don't
> have to worry about sanity checks like the above?
> 

The reference count is currently part of struct pnfs_layout_type.
It's taken in get_lock_current_layout that's called from the
pnfs_update_layout/ get_lock_alloc_layout path in this particular case.

Dereferencing takes place in put_unlock_current_layout called
from nfs4_pnfs_layoutget_release/pnfs_layout_release.

I interpret your suggestion as keeping the point to the layout
in struct nfs4_pnfs_layoutget while working on it and I like
that very much, also for this reason in nfs4_pnfs_layoutget_release:
        lo = nfsi->current_layout;
        BUG_ON(!lo);

        dprintk("--> %s\n", __func__);
        pnfs_layout_release(lo);

Benny


More information about the pNFS mailing list