[pnfs] [PATCH 6/6] pnfs: reference pnfs_layout_type across lseg life
Andy Adamson
andros at netapp.com
Mon Jul 28 10:23:51 EDT 2008
On Sun, 2008-07-27 at 20:16 +0300, Benny Halevy wrote:
> On Wed, 23 Jul 2008 15:05:14 -0400 andros at netapp.com wrote:
> > From: Andy Adamson <andros at localhost.localdomain>
> >
> > Now that the pnfs_layout_type spin lock and refcount are separate, hold a
> > reference to the pnfs_layout_type across the life of the lseg.
> > This guarantees that the pnfs_layout_type will not be reaped during pnfs I/O.
> >
> > We also do not need to take an extra reference in get_lock_alloc_layout
> > and then check for the empty lseg list in _pnfs_return_layout.
> >
> > Signed-off-by: Andy Adamson<andros at netapp.com>
> > ---
> > fs/nfs/nfs4proc.c | 9 ---------
> > fs/nfs/pnfs.c | 7 +++++--
> > 2 files changed, 5 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 29454b4..cb47549 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -5352,15 +5352,6 @@ static void nfs4_pnfs_layoutreturn_release(void *calldata)
> > dprintk("--> %s nfsi->current_layout %p\n", __func__,
> > nfsi->current_layout);
> > pnfs_layout_release(lrp->lo);
> > -
> > - /* FIX-ME this put balances extra kref_get in get_lock_alloc_layout,
> > - * Proper refcounting should remove both list_empty check and this put.
> > - */
> > - if (list_empty(&lrp->lo->segs)) {
> > - dprintk("%s call: pnfs_layout_release LISTEMPTY refcount %d\n",
> > - __func__, atomic_read(&lrp->lo->refcount.refcount));
> > - pnfs_layout_release(lrp->lo);
> > - }
> > kfree(calldata);
> > dprintk("<-- %s\n", __func__);
> > }
> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > index e629984..e85b727 100644
> > --- a/fs/nfs/pnfs.c
> > +++ b/fs/nfs/pnfs.c
> > @@ -383,6 +383,7 @@ init_lseg(struct pnfs_layout_type *lo, struct pnfs_layout_segment *lseg)
> > INIT_LIST_HEAD(&lseg->fi_list);
> > kref_init(&lseg->kref);
> > lseg->layout = lo;
> > + kref_get(&lseg->layout->refcount);
> > }
> >
> > static void
> > @@ -390,9 +391,12 @@ destroy_lseg(struct kref *kref)
> > {
> > struct pnfs_layout_segment *lseg =
> > container_of(kref, struct pnfs_layout_segment, kref);
> > + struct pnfs_layout_type *lo = lseg->layout;
> >
> > - dprintk("--> %s\n", __func__);
> > + dprintk("--> %s lo->refcount %d\n", __func__,
> > + atomic_read(&lo->refcount.refcount));
> > PNFS_LD_IO_OPS(lseg->layout)->free_lseg(lseg);
> > + put_current_layout(lo);
> > }
> >
> > static inline void
> > @@ -803,7 +807,6 @@ get_lock_alloc_layout(struct inode *ino,
> > if (list_empty(&nfsi->lo_inodes))
> > list_add_tail(&nfsi->lo_inodes, &clp->cl_lo_inodes);
> > up_write(&clp->cl_sem);
> > - kref_get(&lo->refcount);
>
> I'm really nervous about this.
As well you should be, I got it wrong. It took me a while to see why -
the race to create/destroy the current_layout if the lo_lock is not
taken in the put routine.
> My intuition is that you may want to get the reference count
> here too, as well when referencing an lseg.
Yes, I agree.
> Though the original design left the latter out as I used
> the list of lsegs effectively as a reference "count"
> on the layout structure, as it was never destroyed until
> the list was emptied.
True, but the list could be emptied with a LAYOUTRETURN while the lseg
was being used for I/O, which is why I want a reference across the life
of the lseg. This is all tied into the FIXME comment in
fs/nfs/callback_proc.c concerning responding to a CB_LAYOUTRECALL:
/* FIXME: need barrier here:
pause I/O to data servers
pause layoutgets
drain all outstanding writes to storage devices
wait for any outstanding layoutreturns and layoutgets mentioned in
cb_sequence.
then return layouts, resume after layoutreturns complete
*/
Waiting for I/O to drain prior to a LAYOUTRETURN (in response to a
recall or not) would also solve the problem.
>
> I'm OK with incrementing the reference count also for lsegs
> but this should be equivalent to the list method.
> I.e. when the layout is destroyed the list MUST be empty.
I agree.
What would you think of using an rcu list (as is used in the delegation
code) instead of the spin_lock and empty list check? It allows readers
to run in parallel both with each other and with updaters and (I think)
would remove the need for taking the cl_sem to update the nfs_client
cl_lo_inodes list.
Anyway, I think the first two patches are useful (caveat your comments)
0001-pnfs-fix-commit-lseg-reference-counting.patch
0002-pnfs-Remove-check-for-RECALL_FILE-from-layout-retur.patch
I don't think these last four improve the situation, please don't apply.
0003-pnfs-use-kref-for-pnfs_layout_type-reference-counti.patch
0004-pnfs-remove-spin_unlock-from-put_unlock_current_lay.patch
0005-pnfs-don-t-call-put_lseg-with-lo_lock-held.patch
0006-pnfs-reference-pnfs_layout_type-across-lseg-life.patch
Thanks for the review.
-->Andy
>
> Benny
>
> > } else
> > lo = ERR_PTR(-ENOMEM);
> >
More information about the pNFS
mailing list