[pnfs] [PATCH 4/4] nfs41: use a sorted free list for sessions slots

Benny Halevy bhalevy at panasas.com
Mon Mar 17 05:07:32 EDT 2008


On Mar. 17, 2008, 1:27 +0200, Brent Welch <welch at panasas.com> wrote:
> Hmm - instead of lists or bitmaps, I would suggest just keeping
> track of the lowest free slot, and highest used slot. (I don't
> think you need the highest free slot, but you might.)  These are
> cheap to maintain, and they give what you need to convey to the
> server, right?  Each time you allocate the lowest free slot you
> scan up to the next free slot, which is O(n).  Freeing a slot you
> maintain this with O(1).  Maintaining the highest used slot
> requires a scan as well whenever you free the highest slot.

Right.

> 
> Before you implement something more complicated, I'd like to
> see evidence that you are burning cycles in this code.

I don't see a big difference in complexity between maintaining
the per-slot state in a bit per slot structure which wastes memory
and dcache vs. keeping it in an efficient bitmap.  The algorithm is
essentially the same...

Benny

> 
>>>> Benny Halevy said:
>  > 
>  > On Mar. 16, 2008, 21:50 +0200, Dean Hildebrand <seattleplus at gmail.com> wrote
>      :
>  > > 
>  > > Benny Halevy wrote:
>  > >> On Mar. 15, 2008, 0:08 +0200, Dean Hildebrand <seattleplus at gmail.com> wro
>      te:
>  > >>   
>  > >>> Hi Benny,
>  > >>>
>  > >>> Could you add some comments to the code to explain your new algorithm 
>  > >>> and logic?  Especially explaining the purpose of each loop and why we 
>  > >>> need to track the highest slot id.
>  > >>>     
>  > >> Sure.  no problem.
>  > >> By the way, I thought about this more over the weekend, and managing the
>  > >> free slots can be made even easier with a bitmap.  Up to 64 slots can be
>  > >> covered trivially and efficiently with a single unsigned long integer and
>  > >> then the only remaining data we need to keep per-slot (for now) is its se
>      qid.
>  > >>   
>  > > Would that still require searching through the 64 slots (bits)?  We 
>  > > should be able to do it in O(1) if we use a list and keep track of the 
>  > > pointer to the slot for each request, non?
>  > 
>  > I'm not sure what use case you have in mind.
>  > The one I'm thinking of is keeping the highest slot id in use.
>  > when a slot becomes free you need to track which used now
>  > has the highest slot ID.  This is done to provide feedback to the server
>  > so it can adjust and possibly rebalance the resources allocated for each cli
>      ent.
>  > If you want to do that in O(1) with a list keeping all used slots, the list
>  > needs to be ordered, which requires O(n) for each insert :-(
>  > 
>  > With the proposed bitmap, scanning can cost O(n) but the actual complexity c
>      onstant
>  > is very low comparing to scanning a list when you take into account memory
>  > access penalties and caching.
>  > 
>  > Note 1: Some architectures (x86 in particular :) provide for cpu instruction
>      s
>  > that find the first or last bit set in a word in O(1)
>  > 
>  > Note 2: This does not have to be done in real time and can be updated period
>      ically.
>  > 
>  > Benny
>  > 
>  > > 
>  > > Dean
>  > >>   
>  > >>> Dean
>  > >>>
>  > >>> Benny Halevy wrote:
>  > >>>     
>  > >>>> Also,
>  > >>>> * Maintain highest_slotid correctly.
>  > >>>> * sleep for a free slot only if failed to allocate one.
>  > >>>>
>  > >>>> Signed-off-by: Benny Halevy <bhalevy at panasas.com>
>  > >>>> ---
>  > >>>>  fs/nfs/nfs4proc.c            |  104 +++++++++++++++++++++++++++-------
>      --------
>  > >>>>  include/linux/nfs4_session.h |    9 +--
>  > >>>>  2 files changed, 70 insertions(+), 43 deletions(-)
>  > >>>>
>  > >>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>  > >>>> index 0203afe..d0e9c5a 100644
>  > >>>> --- a/fs/nfs/nfs4proc.c
>  > >>>> +++ b/fs/nfs/nfs4proc.c
>  > >>>> @@ -210,6 +210,43 @@ static void renew_lease(const struct nfs_server *s
>      erver, unsigned long timestamp
>  > >>>>  }
>  > >>>>  
>  > >>>>  #if defined(CONFIG_NFS_V4_1)
>  > >>>> +
>  > >>>> +static inline void __free_slot_sorted(struct nfs4_slot_table *tbl,
>  > >>>> +				      struct nfs4_slot *slot)
>  > >>>> +{
>  > >>>> +	int slot_nr = slot->slot_nr;
>  > >>>> +	struct nfs4_slot *sp;
>  > >>>> +
>  > >>>> +	list_for_each_entry (sp, &tbl->free_slots, entry)
>  > >>>> +		if (slot_nr < sp->slot_nr) {
>  > >>>> +			list_add_tail(&slot->entry, &sp->entry);
>  > >>>> +			return;
>  > >>>> +		}
>  > >>>> +	list_add_tail(&slot->entry, &tbl->free_slots);
>  > >>>> +}
>  > >>>> +
>  > >>>> +static inline void __nfs41_free_slot(struct nfs4_slot_table *tbl,
>  > >>>> +				     struct nfs4_slot *slot)
>  > >>>> +{
>  > >>>> +	int slot_nr;
>  > >>>> +	struct nfs4_slot *sp;
>  > >>>> +
>  > >>>> +	spin_lock(&tbl->slot_tbl_lock);
>  > >>>> +	__free_slot_sorted(tbl, slot);
>  > >>>> +	slot_nr = slot->slot_nr;
>  > >>>> +	if (slot_nr == tbl->highest_slotid) {
>  > >>>> +		slot_nr--;
>  > >>>> +		list_for_each_entry_reverse(sp, &tbl->free_slots, entry
>      ) {
>  > >>>> +			if (sp->slot_nr != slot_nr)
>  > >>>> +				break;
>  > >>>> +			slot_nr--;
>  > >>>> +		}
>  > >>>> +		tbl->highest_slotid = slot_nr;
>  > >>>> +	}
>  > >>>> +	spin_unlock(&tbl->slot_tbl_lock);
>  > >>>> +	rpc_wake_up_next(&tbl->slot_tbl_waitq);
>  > >>>> +}
>  > >>>> +
>  > >>>>  /* For pNFS filelayout data servers:
>  > >>>>   * the nfs_client is NULL - to signal no lease update.
>  > >>>>   * session is the data server.
>  > >>>> @@ -235,7 +272,7 @@ static int nfs41_sequence_done(struct nfs_client *c
>      lp,
>  > >>>>  	slot = res->sr_slot;
>  > >>>>  	if (slot == NULL) {
>  > >>>>  		dprintk("%s: no slot: status %d\n", __func__, status);
>  > >>>> -		goto out;	/* session recovery probably failed */
>  > >>>> +		goto ret;	/* session recovery probably failed */
>  > >>>>  	}
>  > >>>>  
>  > >>>>  	switch (status) {
>  > >>>> @@ -267,12 +304,7 @@ static int nfs41_sequence_done(struct nfs_client *
>      clp,
>  > >>>>  		spin_unlock(&clp->cl_lock);
>  > >>>>  	}
>  > >>>>  no_update:
>  > >>>> -	/* Clear the 'busy' bit on the slot that was used */
>  > >>>> -	smp_mb__before_clear_bit();
>  > >>>> -	clear_bit(NFS4_SLOT_BUSY, &slot->flags);
>  > >>>> -	smp_mb__after_clear_bit();
>  > >>>> -out:
>  > >>>> -	rpc_wake_up_next(&tbl->slot_tbl_waitq);
>  > >>>> +	__nfs41_free_slot(tbl, slot);
>  > >>>>  ret:
>  > >>>>  	return status;
>  > >>>>  }
>  > >>>> @@ -299,31 +331,28 @@ static int nfs4_sequence_done(struct nfs_server *
>      server,
>  > >>>>  	return ret;
>  > >>>>  }
>  > >>>>  
>  > >>>> -static struct nfs4_slot *__nfs4_find_slot(struct nfs4_slot_table *tbl)
>  > >>>> +static inline struct nfs4_slot *nfs4_find_slot(struct nfs4_slot_table 
>      *tbl,
>  > >>>> +					       struct rpc_task *task,
>  > >>>> +					       struct nfs41_sequence_ar
>      gs *args)
>  > >>>>  {
>  > >>>> -	int i;
>  > >>>> +	int slot_nr;
>  > >>>>  	struct nfs4_slot *slot;
>  > >>>>  
>  > >>>> -	for (i = 0; i < tbl->max_slots; ++i) {
>  > >>>> -		slot = &tbl->slots[i];
>  > >>>> -		if (!test_bit(NFS4_SLOT_BUSY, &slot->flags)) {
>  > >>>> -			set_bit(NFS4_SLOT_BUSY, &slot->flags);
>  > >>>> -			return slot;
>  > >>>> -		}
>  > >>>> -	}
>  > >>>> -
>  > >>>> -	return NULL;
>  > >>>> -}
>  > >>>> -
>  > >>>> -static struct nfs4_slot *nfs4_find_slot(struct nfs4_slot_table *tbl,
>  > >>>> -					struct rpc_task *task)
>  > >>>> -{
>  > >>>> -	struct nfs4_slot *slot;
>  > >>>> -
>  > >>>> -	rpc_sleep_on(&tbl->slot_tbl_waitq, task, NULL, NULL);
>  > >>>> -
>  > >>>>  	spin_lock(&tbl->slot_tbl_lock);
>  > >>>> -	slot = __nfs4_find_slot(tbl);
>  > >>>> +	if (likely(!list_empty(&tbl->free_slots))) {
>  > >>>> +		slot = list_first_entry(&tbl->free_slots, struct nfs4_s
>      lot,
>  > >>>> +					entry);
>  > >>>> +		list_del(&slot->entry);
>  > >>>> +		slot_nr = slot->slot_nr;
>  > >>>> +		if (tbl->highest_slotid < slot_nr)
>  > >>>> +			tbl->highest_slotid = slot_nr;
>  > >>>> +		args->sa_slotid = slot_nr;
>  > >>>> +		args->sa_seqid = slot->seq_nr;
>  > >>>> +		args->sa_max_slotid = tbl->highest_slotid;
>  > >>>> +	} else {
>  > >>>> +		slot = NULL;
>  > >>>> +		rpc_sleep_on(&tbl->slot_tbl_waitq, task, NULL, NULL);
>  > >>>> +	}
>  > >>>>  	spin_unlock(&tbl->slot_tbl_lock);
>  > >>>>  
>  > >>>>  	return slot;
>  > >>>> @@ -338,21 +367,21 @@ static int nfs41_setup_sequence(struct nfs4_sessi
>      on *session,
>  > >>>>  	struct nfs4_slot *slot;
>  > >>>>  	struct nfs4_slot_table *tbl;
>  > >>>>  
>  > >>>> +	dprintk("--> %s\n", __func__);
>  > >>>>  	tbl = &session->fore_channel.slot_table;
>  > >>>> -	slot = nfs4_find_slot(tbl, task);
>  > >>>> +	slot = nfs4_find_slot(tbl, task, args);
>  > >>>>  
>  > >>>> -	if (!slot)
>  > >>>> +	if (!slot) {
>  > >>>> +		dprintk("<-- %s: no free slots\n", __func__);
>  > >>>>  		return -EAGAIN;
>  > >>>> +	}
>  > >>>>  
>  > >>>>  	memcpy(args->sa_sessionid.data, session->sess_id,
>  > >>>>  	       NFS4_MAX_SESSIONID_LEN);
>  > >>>> -	args->sa_slotid = slot->slot_nr;
>  > >>>> -	args->sa_seqid = slot->seq_nr;
>  > >>>>  	args->sa_cache_this = cache_reply;
>  > >>>>  
>  > >>>> -	spin_lock(&tbl->slot_tbl_lock);
>  > >>>> -	args->sa_max_slotid = tbl->max_slots;
>  > >>>> -	spin_unlock(&tbl->slot_tbl_lock);
>  > >>>> +	dprintk("<-- %s slot=%p slotid=%d seqid=%d\n", __func__,
>  > >>>> +		slot, args->sa_slotid, slot->seq_nr);
>  > >>>>  
>  > >>>>  	res->sr_slot = slot;
>  > >>>>  	res->sr_renewal_time = jiffies;
>  > >>>> @@ -4541,11 +4570,11 @@ int nfs4_init_slot_table(struct nfs4_channel *c
>      hannel)
>  > >>>>  	dprintk("%s: slots=%p max_slots=%d\n", __func__, 
>  > >>>>  		tbl->slots, tbl->max_slots);
>  > >>>>  
>  > >>>> -	for (i = 0; i < channel->chan_attrs.max_reqs; ++i) {
>  > >>>> +	for (i = 0; i < channel->chan_attrs.max_reqs - 1; ++i) {
>  > >>>>  		slot = &tbl->slots[i];
>  > >>>>  		slot->slot_nr = i;
>  > >>>>  		slot->seq_nr = 1;
>  > >>>> -		slot->flags = 0;
>  > >>>> +		list_add_tail(&slot->entry, &tbl->free_slots);
>  > >>>>  	}
>  > >>>>  
>  > >>>>  out:
>  > >>>> @@ -4573,6 +4602,7 @@ static int nfs4_init_channel(struct nfs4_channel 
>      *channel)
>  > >>>>  	tbl = &channel->slot_table;
>  > >>>>  
>  > >>>>  	spin_lock_init(&tbl->slot_tbl_lock);
>  > >>>> +	INIT_LIST_HEAD(&tbl->free_slots);
>  > >>>>  	rpc_init_wait_queue(&tbl->slot_tbl_waitq, "Slot table");
>  > >>>>  
>  > >>>>  	return 0;
>  > >>>> diff --git a/include/linux/nfs4_session.h b/include/linux/nfs4_session.
>      h
>  > >>>> index 8ef8d6d..542d099 100644
>  > >>>> --- a/include/linux/nfs4_session.h
>  > >>>> +++ b/include/linux/nfs4_session.h
>  > >>>> @@ -7,11 +7,6 @@
>  > >>>>  #include <linux/smp_lock.h>
>  > >>>>  #include <linux/sunrpc/sched.h>
>  > >>>>  
>  > >>>> -/* The flags for the nfs4_slot struct */
>  > >>>> -#define NFS4_SLOT_BUSY		0X0	/* Slot in use */
>  > >>>> -#define NFS4_SLOT_RECLAIMED	0x1	/* Slot has been reclaimed by
>  > >>>> -					   the server */
>  > >>>> -
>  > >>>>  struct nfs4_channel_attrs {
>  > >>>>  	u32			headerpadsz;
>  > >>>>  	u32			max_rqst_sz;
>  > >>>> @@ -25,14 +20,16 @@ struct nfs4_channel_attrs {
>  > >>>>  struct nfs4_slot {
>  > >>>>  	u32		 	slot_nr;
>  > >>>>  	u32		 	seq_nr;
>  > >>>> -	unsigned long		flags;
>  > >>>> +	struct list_head 	entry;
>  > >>>>  };
>  > >>>>  
>  > >>>>  struct nfs4_slot_table {
>  > >>>>  	struct nfs4_slot 	*slots;
>  > >>>>  	spinlock_t		slot_tbl_lock;
>  > >>>> +	struct list_head 	free_slots;
>  > >>>>  	struct rpc_wait_queue	slot_tbl_waitq;
>  > >>>>  	int			max_slots;
>  > >>>> +	int			highest_slotid;
>  > >>>>  };
>  > >>>>  
>  > >>>>  struct nfs4_channel {
>  > >>>>   
>  > >>>>       
>  > >>   
>  > 
>  > _______________________________________________
>  > pNFS mailing list
>  > pNFS at linux-nfs.org
>  > http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
>  > 
>  > 
> 
> --
> Brent Welch
> Director, Software Architecture, Panasas Inc
> The Leader in Parallel Storage
> 
> www.panasas.com
> welch at panasas.com



More information about the pNFS mailing list