[pnfs] [PATCH 6/6] pnfs: reference pnfs_layout_type across lseg life

Andy Adamson andros at netapp.com
Mon Jul 28 10:23:51 EDT 2008


On Sun, 2008-07-27 at 20:16 +0300, Benny Halevy wrote:
> On Wed, 23 Jul 2008 15:05:14 -0400 andros at netapp.com wrote:
> > From: Andy Adamson <andros at localhost.localdomain>
> > 
> > Now that the pnfs_layout_type spin lock and refcount are separate, hold a
> > reference to the pnfs_layout_type across the life of the lseg.
> > This guarantees that the pnfs_layout_type will not be reaped during pnfs I/O.
> > 
> > We also do not need to take an extra reference in get_lock_alloc_layout
> > and then check for the empty lseg list in _pnfs_return_layout.
> > 
> > Signed-off-by: Andy Adamson<andros at netapp.com>
> > ---
> >  fs/nfs/nfs4proc.c |    9 ---------
> >  fs/nfs/pnfs.c     |    7 +++++--
> >  2 files changed, 5 insertions(+), 11 deletions(-)
> > 
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 29454b4..cb47549 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -5352,15 +5352,6 @@ static void nfs4_pnfs_layoutreturn_release(void *calldata)
> >  	dprintk("--> %s nfsi->current_layout %p\n", __func__,
> >  					nfsi->current_layout);
> >  	pnfs_layout_release(lrp->lo);
> > -
> > -	/* FIX-ME this put balances extra kref_get in get_lock_alloc_layout,
> > -	* Proper refcounting should remove both list_empty check and this put.
> > -	*/
> > -	if (list_empty(&lrp->lo->segs)) {
> > -		dprintk("%s call: pnfs_layout_release LISTEMPTY refcount %d\n",
> > -			__func__, atomic_read(&lrp->lo->refcount.refcount));
> > -		pnfs_layout_release(lrp->lo);
> > -	}
> >  	kfree(calldata);
> >  	dprintk("<-- %s\n", __func__);
> >  }
> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > index e629984..e85b727 100644
> > --- a/fs/nfs/pnfs.c
> > +++ b/fs/nfs/pnfs.c
> > @@ -383,6 +383,7 @@ init_lseg(struct pnfs_layout_type *lo, struct pnfs_layout_segment *lseg)
> >  	INIT_LIST_HEAD(&lseg->fi_list);
> >  	kref_init(&lseg->kref);
> >  	lseg->layout = lo;
> > +	kref_get(&lseg->layout->refcount);
> >  }
> >  
> >  static void
> > @@ -390,9 +391,12 @@ destroy_lseg(struct kref *kref)
> >  {
> >  	struct pnfs_layout_segment *lseg =
> >  		container_of(kref, struct pnfs_layout_segment, kref);
> > +	struct pnfs_layout_type *lo = lseg->layout;
> >  
> > -	dprintk("--> %s\n", __func__);
> > +	dprintk("--> %s lo->refcount %d\n", __func__,
> > +		atomic_read(&lo->refcount.refcount));
> >  	PNFS_LD_IO_OPS(lseg->layout)->free_lseg(lseg);
> > +	put_current_layout(lo);
> >  }
> >  
> >  static inline void
> > @@ -803,7 +807,6 @@ get_lock_alloc_layout(struct inode *ino,
> >  			if (list_empty(&nfsi->lo_inodes))
> >  				list_add_tail(&nfsi->lo_inodes, &clp->cl_lo_inodes);
> >  			up_write(&clp->cl_sem);
> > -			kref_get(&lo->refcount);
> 
> I'm really nervous about this.

As well you should be, I got it wrong. It took me a while to see why -
the race to create/destroy the current_layout if the lo_lock is not
taken in the put routine.

> My intuition is that you may want to get the reference count
> here too, as well when referencing an lseg.

Yes, I agree.

> Though the original design left the latter out as I used
> the list of lsegs effectively as a reference "count"
> on the layout structure, as it was never destroyed until
> the list was emptied.

True, but the list could be emptied with a LAYOUTRETURN while the lseg
was being used for I/O, which is why I want a reference across the life
of the lseg. This is all tied into the FIXME comment in
fs/nfs/callback_proc.c concerning responding to a CB_LAYOUTRECALL:

/* FIXME: need barrier here:
   pause I/O to data servers
   pause layoutgets
   drain all outstanding writes to storage devices
   wait for any outstanding layoutreturns and layoutgets mentioned in
   cb_sequence.
   then return layouts, resume after layoutreturns complete
 */

Waiting for I/O to drain prior to a LAYOUTRETURN (in response to a
recall or not) would also solve the problem.

> 
> I'm OK with incrementing the reference count also for lsegs
> but this should be equivalent to the list method.
> I.e. when the layout is destroyed the list MUST be empty.

I agree.

What would you think of using an rcu list (as is used in the delegation
code) instead of the spin_lock and empty list check? It allows readers
to run in parallel both with each other and with updaters and (I think)
would remove the need for taking the cl_sem to update the nfs_client
cl_lo_inodes list.

Anyway, I think the first two patches are useful (caveat your comments)

0001-pnfs-fix-commit-lseg-reference-counting.patch
0002-pnfs-Remove-check-for-RECALL_FILE-from-layout-retur.patch

I don't think these last four improve the situation, please don't apply.

0003-pnfs-use-kref-for-pnfs_layout_type-reference-counti.patch
0004-pnfs-remove-spin_unlock-from-put_unlock_current_lay.patch
0005-pnfs-don-t-call-put_lseg-with-lo_lock-held.patch
0006-pnfs-reference-pnfs_layout_type-across-lseg-life.patch

Thanks for the review.

-->Andy
> 
> Benny
> 
> >  		} else
> >  			lo = ERR_PTR(-ENOMEM);
> >  


More information about the pNFS mailing list