[pnfs] [PATCH 4/4] nfs41: use a sorted free list for sessions slots

Benny Halevy bhalevy at panasas.com
Mon Mar 17 13:55:56 EDT 2008


On Mar. 17, 2008, 19:39 +0200, Dean Hildebrand <seattleplus at gmail.com> wrote:
> 
> Benny Halevy wrote:
>> On Mar. 17, 2008, 1:27 +0200, Brent Welch <welch at panasas.com> wrote:
>>   
>>> Hmm - instead of lists or bitmaps, I would suggest just keeping
>>> track of the lowest free slot, and highest used slot. (I don't
>>> think you need the highest free slot, but you might.)  These are
>>> cheap to maintain, and they give what you need to convey to the
>>> server, right?  Each time you allocate the lowest free slot you
>>> scan up to the next free slot, which is O(n).  Freeing a slot you
>>> maintain this with O(1).  Maintaining the highest used slot
>>> requires a scan as well whenever you free the highest slot.
>>>     
>> Right.
>>
>>   
>>> Before you implement something more complicated, I'd like to
>>> see evidence that you are burning cycles in this code.
>>>     
>> I don't see a big difference in complexity between maintaining
>> the per-slot state in a bit per slot structure which wastes memory
>> and dcache vs. keeping it in an efficient bitmap.  The algorithm is
>> essentially the same...
>>   
> I like the fact that you are thinking about this, as this code is 
> critical for providing efficient I/O.  We had huge bursty I/O problems 
> with the old scheme in 2.6.18, so doing it right is important.
> 
> Hmmm, there seems to be multiple items to track.
> 1. Keeping track of the lowest slotid (i.e., the next available slot)
> 2. Keeping track of the highest slotid (which is an indication of # 
> slots used if #1 is followed)
> 3. Actually retrieving the required nfs4_slot slot.
> 
> #1 and #2 could theoretically use a bitmap, but we want many more slots 
> than 64.  Long fat links require lots of outstanding requests.  We need 
> to think about how we could manage 64, 128, 200+ slots. 
> 
> I like Brent's plan. 
> a) Keep an hashtable/array of the slots.
> b) Keep track of the lowest and highest slot ids in 2 variables
> c) Each time you alloc a slot scan up to the next free slot.  If the 
> (next free slot) > (highest used id), then (hightest used id) = (next 
> free slot)

well, just s/next free/allocated/ in the above...

> d) Each time to dealloc a slot, if it's id > (lowest free id) and id < 
> (highest used id) then do nothing.  If its id < (lowest free id) then 
> (lowest free id) = id.  If its id == (highest used id), then you need to 
> scan down to find the next highest used id.

right, (that's what the freelist implementation below is doing BTW)

Benny

> 
> Dean
>> Benny
>>
>>   
>>>>>> Benny Halevy said:
>>>>>>           
>>>  > 
>>>  > On Mar. 16, 2008, 21:50 +0200, Dean Hildebrand <seattleplus at gmail.com> wrote
>>>      :
>>>  > > 
>>>  > > Benny Halevy wrote:
>>>  > >> On Mar. 15, 2008, 0:08 +0200, Dean Hildebrand <seattleplus at gmail.com> wro
>>>      te:
>>>  > >>   
>>>  > >>> Hi Benny,
>>>  > >>>
>>>  > >>> Could you add some comments to the code to explain your new algorithm 
>>>  > >>> and logic?  Especially explaining the purpose of each loop and why we 
>>>  > >>> need to track the highest slot id.
>>>  > >>>     
>>>  > >> Sure.  no problem.
>>>  > >> By the way, I thought about this more over the weekend, and managing the
>>>  > >> free slots can be made even easier with a bitmap.  Up to 64 slots can be
>>>  > >> covered trivially and efficiently with a single unsigned long integer and
>>>  > >> then the only remaining data we need to keep per-slot (for now) is its se
>>>      qid.
>>>  > >>   
>>>  > > Would that still require searching through the 64 slots (bits)?  We 
>>>  > > should be able to do it in O(1) if we use a list and keep track of the 
>>>  > > pointer to the slot for each request, non?
>>>  > 
>>>  > I'm not sure what use case you have in mind.
>>>  > The one I'm thinking of is keeping the highest slot id in use.
>>>  > when a slot becomes free you need to track which used now
>>>  > has the highest slot ID.  This is done to provide feedback to the server
>>>  > so it can adjust and possibly rebalance the resources allocated for each cli
>>>      ent.
>>>  > If you want to do that in O(1) with a list keeping all used slots, the list
>>>  > needs to be ordered, which requires O(n) for each insert :-(
>>>  > 
>>>  > With the proposed bitmap, scanning can cost O(n) but the actual complexity c
>>>      onstant
>>>  > is very low comparing to scanning a list when you take into account memory
>>>  > access penalties and caching.
>>>  > 
>>>  > Note 1: Some architectures (x86 in particular :) provide for cpu instruction
>>>      s
>>>  > that find the first or last bit set in a word in O(1)
>>>  > 
>>>  > Note 2: This does not have to be done in real time and can be updated period
>>>      ically.
>>>  > 
>>>  > Benny
>>>  > 
>>>  > > 
>>>  > > Dean
>>>  > >>   
>>>  > >>> Dean
>>>  > >>>
>>>  > >>> Benny Halevy wrote:
>>>  > >>>     
>>>  > >>>> Also,
>>>  > >>>> * Maintain highest_slotid correctly.
>>>  > >>>> * sleep for a free slot only if failed to allocate one.
>>>  > >>>>
>>>  > >>>> Signed-off-by: Benny Halevy <bhalevy at panasas.com>
>>>  > >>>> ---
>>>  > >>>>  fs/nfs/nfs4proc.c            |  104 +++++++++++++++++++++++++++-------
>>>      --------
>>>  > >>>>  include/linux/nfs4_session.h |    9 +--
>>>  > >>>>  2 files changed, 70 insertions(+), 43 deletions(-)
>>>  > >>>>
>>>  > >>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>  > >>>> index 0203afe..d0e9c5a 100644
>>>  > >>>> --- a/fs/nfs/nfs4proc.c
>>>  > >>>> +++ b/fs/nfs/nfs4proc.c
>>>  > >>>> @@ -210,6 +210,43 @@ static void renew_lease(const struct nfs_server *s
>>>      erver, unsigned long timestamp
>>>  > >>>>  }
>>>  > >>>>  
>>>  > >>>>  #if defined(CONFIG_NFS_V4_1)
>>>  > >>>> +
>>>  > >>>> +static inline void __free_slot_sorted(struct nfs4_slot_table *tbl,
>>>  > >>>> +				      struct nfs4_slot *slot)
>>>  > >>>> +{
>>>  > >>>> +	int slot_nr = slot->slot_nr;
>>>  > >>>> +	struct nfs4_slot *sp;
>>>  > >>>> +
>>>  > >>>> +	list_for_each_entry (sp, &tbl->free_slots, entry)
>>>  > >>>> +		if (slot_nr < sp->slot_nr) {
>>>  > >>>> +			list_add_tail(&slot->entry, &sp->entry);
>>>  > >>>> +			return;
>>>  > >>>> +		}
>>>  > >>>> +	list_add_tail(&slot->entry, &tbl->free_slots);
>>>  > >>>> +}
>>>  > >>>> +
>>>  > >>>> +static inline void __nfs41_free_slot(struct nfs4_slot_table *tbl,
>>>  > >>>> +				     struct nfs4_slot *slot)
>>>  > >>>> +{
>>>  > >>>> +	int slot_nr;
>>>  > >>>> +	struct nfs4_slot *sp;
>>>  > >>>> +
>>>  > >>>> +	spin_lock(&tbl->slot_tbl_lock);
>>>  > >>>> +	__free_slot_sorted(tbl, slot);
>>>  > >>>> +	slot_nr = slot->slot_nr;
>>>  > >>>> +	if (slot_nr == tbl->highest_slotid) {
>>>  > >>>> +		slot_nr--;
>>>  > >>>> +		list_for_each_entry_reverse(sp, &tbl->free_slots, entry
>>>      ) {
>>>  > >>>> +			if (sp->slot_nr != slot_nr)
>>>  > >>>> +				break;
>>>  > >>>> +			slot_nr--;
>>>  > >>>> +		}
>>>  > >>>> +		tbl->highest_slotid = slot_nr;
>>>  > >>>> +	}
>>>  > >>>> +	spin_unlock(&tbl->slot_tbl_lock);
>>>  > >>>> +	rpc_wake_up_next(&tbl->slot_tbl_waitq);
>>>  > >>>> +}
>>>  > >>>> +
>>>  > >>>>  /* For pNFS filelayout data servers:
>>>  > >>>>   * the nfs_client is NULL - to signal no lease update.
>>>  > >>>>   * session is the data server.
>>>  > >>>> @@ -235,7 +272,7 @@ static int nfs41_sequence_done(struct nfs_client *c
>>>      lp,
>>>  > >>>>  	slot = res->sr_slot;
>>>  > >>>>  	if (slot == NULL) {
>>>  > >>>>  		dprintk("%s: no slot: status %d\n", __func__, status);
>>>  > >>>> -		goto out;	/* session recovery probably failed */
>>>  > >>>> +		goto ret;	/* session recovery probably failed */
>>>  > >>>>  	}
>>>  > >>>>  
>>>  > >>>>  	switch (status) {
>>>  > >>>> @@ -267,12 +304,7 @@ static int nfs41_sequence_done(struct nfs_client *
>>>      clp,
>>>  > >>>>  		spin_unlock(&clp->cl_lock);
>>>  > >>>>  	}
>>>  > >>>>  no_update:
>>>  > >>>> -	/* Clear the 'busy' bit on the slot that was used */
>>>  > >>>> -	smp_mb__before_clear_bit();
>>>  > >>>> -	clear_bit(NFS4_SLOT_BUSY, &slot->flags);
>>>  > >>>> -	smp_mb__after_clear_bit();
>>>  > >>>> -out:
>>>  > >>>> -	rpc_wake_up_next(&tbl->slot_tbl_waitq);
>>>  > >>>> +	__nfs41_free_slot(tbl, slot);
>>>  > >>>>  ret:
>>>  > >>>>  	return status;
>>>  > >>>>  }
>>>  > >>>> @@ -299,31 +331,28 @@ static int nfs4_sequence_done(struct nfs_server *
>>>      server,
>>>  > >>>>  	return ret;
>>>  > >>>>  }
>>>  > >>>>  
>>>  > >>>> -static struct nfs4_slot *__nfs4_find_slot(struct nfs4_slot_table *tbl)
>>>  > >>>> +static inline struct nfs4_slot *nfs4_find_slot(struct nfs4_slot_table 
>>>      *tbl,
>>>  > >>>> +					       struct rpc_task *task,
>>>  > >>>> +					       struct nfs41_sequence_ar
>>>      gs *args)
>>>  > >>>>  {
>>>  > >>>> -	int i;
>>>  > >>>> +	int slot_nr;
>>>  > >>>>  	struct nfs4_slot *slot;
>>>  > >>>>  
>>>  > >>>> -	for (i = 0; i < tbl->max_slots; ++i) {
>>>  > >>>> -		slot = &tbl->slots[i];
>>>  > >>>> -		if (!test_bit(NFS4_SLOT_BUSY, &slot->flags)) {
>>>  > >>>> -			set_bit(NFS4_SLOT_BUSY, &slot->flags);
>>>  > >>>> -			return slot;
>>>  > >>>> -		}
>>>  > >>>> -	}
>>>  > >>>> -
>>>  > >>>> -	return NULL;
>>>  > >>>> -}
>>>  > >>>> -
>>>  > >>>> -static struct nfs4_slot *nfs4_find_slot(struct nfs4_slot_table *tbl,
>>>  > >>>> -					struct rpc_task *task)
>>>  > >>>> -{
>>>  > >>>> -	struct nfs4_slot *slot;
>>>  > >>>> -
>>>  > >>>> -	rpc_sleep_on(&tbl->slot_tbl_waitq, task, NULL, NULL);
>>>  > >>>> -
>>>  > >>>>  	spin_lock(&tbl->slot_tbl_lock);
>>>  > >>>> -	slot = __nfs4_find_slot(tbl);
>>>  > >>>> +	if (likely(!list_empty(&tbl->free_slots))) {
>>>  > >>>> +		slot = list_first_entry(&tbl->free_slots, struct nfs4_s
>>>      lot,
>>>  > >>>> +					entry);
>>>  > >>>> +		list_del(&slot->entry);
>>>  > >>>> +		slot_nr = slot->slot_nr;
>>>  > >>>> +		if (tbl->highest_slotid < slot_nr)
>>>  > >>>> +			tbl->highest_slotid = slot_nr;
>>>  > >>>> +		args->sa_slotid = slot_nr;
>>>  > >>>> +		args->sa_seqid = slot->seq_nr;
>>>  > >>>> +		args->sa_max_slotid = tbl->highest_slotid;
>>>  > >>>> +	} else {
>>>  > >>>> +		slot = NULL;
>>>  > >>>> +		rpc_sleep_on(&tbl->slot_tbl_waitq, task, NULL, NULL);
>>>  > >>>> +	}
>>>  > >>>>  	spin_unlock(&tbl->slot_tbl_lock);
>>>  > >>>>  
>>>  > >>>>  	return slot;
>>>  > >>>> @@ -338,21 +367,21 @@ static int nfs41_setup_sequence(struct nfs4_sessi
>>>      on *session,
>>>  > >>>>  	struct nfs4_slot *slot;
>>>  > >>>>  	struct nfs4_slot_table *tbl;
>>>  > >>>>  
>>>  > >>>> +	dprintk("--> %s\n", __func__);
>>>  > >>>>  	tbl = &session->fore_channel.slot_table;
>>>  > >>>> -	slot = nfs4_find_slot(tbl, task);
>>>  > >>>> +	slot = nfs4_find_slot(tbl, task, args);
>>>  > >>>>  
>>>  > >>>> -	if (!slot)
>>>  > >>>> +	if (!slot) {
>>>  > >>>> +		dprintk("<-- %s: no free slots\n", __func__);
>>>  > >>>>  		return -EAGAIN;
>>>  > >>>> +	}
>>>  > >>>>  
>>>  > >>>>  	memcpy(args->sa_sessionid.data, session->sess_id,
>>>  > >>>>  	       NFS4_MAX_SESSIONID_LEN);
>>>  > >>>> -	args->sa_slotid = slot->slot_nr;
>>>  > >>>> -	args->sa_seqid = slot->seq_nr;
>>>  > >>>>  	args->sa_cache_this = cache_reply;
>>>  > >>>>  
>>>  > >>>> -	spin_lock(&tbl->slot_tbl_lock);
>>>  > >>>> -	args->sa_max_slotid = tbl->max_slots;
>>>  > >>>> -	spin_unlock(&tbl->slot_tbl_lock);
>>>  > >>>> +	dprintk("<-- %s slot=%p slotid=%d seqid=%d\n", __func__,
>>>  > >>>> +		slot, args->sa_slotid, slot->seq_nr);
>>>  > >>>>  
>>>  > >>>>  	res->sr_slot = slot;
>>>  > >>>>  	res->sr_renewal_time = jiffies;
>>>  > >>>> @@ -4541,11 +4570,11 @@ int nfs4_init_slot_table(struct nfs4_channel *c
>>>      hannel)
>>>  > >>>>  	dprintk("%s: slots=%p max_slots=%d\n", __func__, 
>>>  > >>>>  		tbl->slots, tbl->max_slots);
>>>  > >>>>  
>>>  > >>>> -	for (i = 0; i < channel->chan_attrs.max_reqs; ++i) {
>>>  > >>>> +	for (i = 0; i < channel->chan_attrs.max_reqs - 1; ++i) {
>>>  > >>>>  		slot = &tbl->slots[i];
>>>  > >>>>  		slot->slot_nr = i;
>>>  > >>>>  		slot->seq_nr = 1;
>>>  > >>>> -		slot->flags = 0;
>>>  > >>>> +		list_add_tail(&slot->entry, &tbl->free_slots);
>>>  > >>>>  	}
>>>  > >>>>  
>>>  > >>>>  out:
>>>  > >>>> @@ -4573,6 +4602,7 @@ static int nfs4_init_channel(struct nfs4_channel 
>>>      *channel)
>>>  > >>>>  	tbl = &channel->slot_table;
>>>  > >>>>  
>>>  > >>>>  	spin_lock_init(&tbl->slot_tbl_lock);
>>>  > >>>> +	INIT_LIST_HEAD(&tbl->free_slots);
>>>  > >>>>  	rpc_init_wait_queue(&tbl->slot_tbl_waitq, "Slot table");
>>>  > >>>>  
>>>  > >>>>  	return 0;
>>>  > >>>> diff --git a/include/linux/nfs4_session.h b/include/linux/nfs4_session.
>>>      h
>>>  > >>>> index 8ef8d6d..542d099 100644
>>>  > >>>> --- a/include/linux/nfs4_session.h
>>>  > >>>> +++ b/include/linux/nfs4_session.h
>>>  > >>>> @@ -7,11 +7,6 @@
>>>  > >>>>  #include <linux/smp_lock.h>
>>>  > >>>>  #include <linux/sunrpc/sched.h>
>>>  > >>>>  
>>>  > >>>> -/* The flags for the nfs4_slot struct */
>>>  > >>>> -#define NFS4_SLOT_BUSY		0X0	/* Slot in use */
>>>  > >>>> -#define NFS4_SLOT_RECLAIMED	0x1	/* Slot has been reclaimed by
>>>  > >>>> -					   the server */
>>>  > >>>> -
>>>  > >>>>  struct nfs4_channel_attrs {
>>>  > >>>>  	u32			headerpadsz;
>>>  > >>>>  	u32			max_rqst_sz;
>>>  > >>>> @@ -25,14 +20,16 @@ struct nfs4_channel_attrs {
>>>  > >>>>  struct nfs4_slot {
>>>  > >>>>  	u32		 	slot_nr;
>>>  > >>>>  	u32		 	seq_nr;
>>>  > >>>> -	unsigned long		flags;
>>>  > >>>> +	struct list_head 	entry;
>>>  > >>>>  };
>>>  > >>>>  
>>>  > >>>>  struct nfs4_slot_table {
>>>  > >>>>  	struct nfs4_slot 	*slots;
>>>  > >>>>  	spinlock_t		slot_tbl_lock;
>>>  > >>>> +	struct list_head 	free_slots;
>>>  > >>>>  	struct rpc_wait_queue	slot_tbl_waitq;
>>>  > >>>>  	int			max_slots;
>>>  > >>>> +	int			highest_slotid;
>>>  > >>>>  };
>>>  > >>>>  
>>>  > >>>>  struct nfs4_channel {
>>>  > >>>>   
>>>  > >>>>       
>>>  > >>   
>>>  > 
>>>  > _______________________________________________
>>>  > pNFS mailing list
>>>  > pNFS at linux-nfs.org
>>>  > http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
>>>  > 
>>>  > 
>>>
>>> --
>>> Brent Welch
>>> Director, Software Architecture, Panasas Inc
>>> The Leader in Parallel Storage
>>>
>>> www.panasas.com
>>> welch at panasas.com
>>>     
>>   



More information about the pNFS mailing list