[pnfs] [PATCH 2/2] NFS: Fix a pNFS deadlock due to violations of RPC asynchronous i/o rules
Benny Halevy
bhalevy at panasas.com
Tue Jul 15 10:14:26 EDT 2008
On Jul. 15, 2008, 16:35 +0300, Trond Myklebust <Trond.Myklebust at netapp.com> wrote:
> On Tue, 2008-07-15 at 09:41 +0300, Benny Halevy wrote:
>>> + /* FIXME: This lock is just so wrong... */
>>> + spin_lock(&nfsi->lo_lock);
>> What happens here is that the lo_lock was dropped in
>> pnfs_update_layout before calling get_layout that's
>> doing kmalloc and an rpc hence it must not hold the spin
>> lock. We reacquire it here for pnfs_inject_layout that
>> inserts the retrieved layout onto the file's layout.
>
> ...and hence you end up holding it across an allocation inside
> pnfs_inject_layout(). It would be far better to move the lock to where
> it is needed inside pnfs_inject_layout() _after_ the allocation has been
> made.
we actually drop it yet again and reacquire across the call to
PNFS_LD_IO_OPS(lo)->alloc_lseg, but I completely agree that lock isn't
really needed before the call to pnfs_inject_layout.
Since pnfs_inject_layout is called only from pnfs_layout_process I think
the simplest solution is just embed it in pnfs_layout_process and
grab the lock right before calling pnfs_insert_layout.
>
>>> +
>>> + /* FIXME: WTF? */
>>> + BUG_ON(nfsi->current_layout != lo);
>> While dropping the spin lock we do keep a reference count
>> on the layout structure to keep it around and valid and the
>> BUG_ON is here to check we're sane since current_layout is
>> not supposed to change while referenced (in the present
>> implementation). That said, I've no objection to remove
>> the BUG_ON since I believe this code should work even
>> this assumption is changed and we might be referencing
>> a layout structure detached from the nfs_inode.
>
> So who is holding the reference? Wouldn't it make sense to hold a
> reference in the 'struct nfs4_pnfs_layoutget' itself, so that you don't
> have to worry about sanity checks like the above?
>
The reference count is currently part of struct pnfs_layout_type.
It's taken in get_lock_current_layout that's called from the
pnfs_update_layout/ get_lock_alloc_layout path in this particular case.
Dereferencing takes place in put_unlock_current_layout called
from nfs4_pnfs_layoutget_release/pnfs_layout_release.
I interpret your suggestion as keeping the point to the layout
in struct nfs4_pnfs_layoutget while working on it and I like
that very much, also for this reason in nfs4_pnfs_layoutget_release:
lo = nfsi->current_layout;
BUG_ON(!lo);
dprintk("--> %s\n", __func__);
pnfs_layout_release(lo);
Benny
More information about the pNFS
mailing list