[pnfs] [PATCH 4/4] nfs41: use a sorted free list for sessions slots
Benny Halevy
bhalevy at panasas.com
Mon Mar 17 13:55:56 EDT 2008
On Mar. 17, 2008, 19:39 +0200, Dean Hildebrand <seattleplus at gmail.com> wrote:
>
> Benny Halevy wrote:
>> On Mar. 17, 2008, 1:27 +0200, Brent Welch <welch at panasas.com> wrote:
>>
>>> Hmm - instead of lists or bitmaps, I would suggest just keeping
>>> track of the lowest free slot, and highest used slot. (I don't
>>> think you need the highest free slot, but you might.) These are
>>> cheap to maintain, and they give what you need to convey to the
>>> server, right? Each time you allocate the lowest free slot you
>>> scan up to the next free slot, which is O(n). Freeing a slot you
>>> maintain this with O(1). Maintaining the highest used slot
>>> requires a scan as well whenever you free the highest slot.
>>>
>> Right.
>>
>>
>>> Before you implement something more complicated, I'd like to
>>> see evidence that you are burning cycles in this code.
>>>
>> I don't see a big difference in complexity between maintaining
>> the per-slot state in a bit per slot structure which wastes memory
>> and dcache vs. keeping it in an efficient bitmap. The algorithm is
>> essentially the same...
>>
> I like the fact that you are thinking about this, as this code is
> critical for providing efficient I/O. We had huge bursty I/O problems
> with the old scheme in 2.6.18, so doing it right is important.
>
> Hmmm, there seems to be multiple items to track.
> 1. Keeping track of the lowest slotid (i.e., the next available slot)
> 2. Keeping track of the highest slotid (which is an indication of #
> slots used if #1 is followed)
> 3. Actually retrieving the required nfs4_slot slot.
>
> #1 and #2 could theoretically use a bitmap, but we want many more slots
> than 64. Long fat links require lots of outstanding requests. We need
> to think about how we could manage 64, 128, 200+ slots.
>
> I like Brent's plan.
> a) Keep an hashtable/array of the slots.
> b) Keep track of the lowest and highest slot ids in 2 variables
> c) Each time you alloc a slot scan up to the next free slot. If the
> (next free slot) > (highest used id), then (hightest used id) = (next
> free slot)
well, just s/next free/allocated/ in the above...
> d) Each time to dealloc a slot, if it's id > (lowest free id) and id <
> (highest used id) then do nothing. If its id < (lowest free id) then
> (lowest free id) = id. If its id == (highest used id), then you need to
> scan down to find the next highest used id.
right, (that's what the freelist implementation below is doing BTW)
Benny
>
> Dean
>> Benny
>>
>>
>>>>>> Benny Halevy said:
>>>>>>
>>> >
>>> > On Mar. 16, 2008, 21:50 +0200, Dean Hildebrand <seattleplus at gmail.com> wrote
>>> :
>>> > >
>>> > > Benny Halevy wrote:
>>> > >> On Mar. 15, 2008, 0:08 +0200, Dean Hildebrand <seattleplus at gmail.com> wro
>>> te:
>>> > >>
>>> > >>> Hi Benny,
>>> > >>>
>>> > >>> Could you add some comments to the code to explain your new algorithm
>>> > >>> and logic? Especially explaining the purpose of each loop and why we
>>> > >>> need to track the highest slot id.
>>> > >>>
>>> > >> Sure. no problem.
>>> > >> By the way, I thought about this more over the weekend, and managing the
>>> > >> free slots can be made even easier with a bitmap. Up to 64 slots can be
>>> > >> covered trivially and efficiently with a single unsigned long integer and
>>> > >> then the only remaining data we need to keep per-slot (for now) is its se
>>> qid.
>>> > >>
>>> > > Would that still require searching through the 64 slots (bits)? We
>>> > > should be able to do it in O(1) if we use a list and keep track of the
>>> > > pointer to the slot for each request, non?
>>> >
>>> > I'm not sure what use case you have in mind.
>>> > The one I'm thinking of is keeping the highest slot id in use.
>>> > when a slot becomes free you need to track which used now
>>> > has the highest slot ID. This is done to provide feedback to the server
>>> > so it can adjust and possibly rebalance the resources allocated for each cli
>>> ent.
>>> > If you want to do that in O(1) with a list keeping all used slots, the list
>>> > needs to be ordered, which requires O(n) for each insert :-(
>>> >
>>> > With the proposed bitmap, scanning can cost O(n) but the actual complexity c
>>> onstant
>>> > is very low comparing to scanning a list when you take into account memory
>>> > access penalties and caching.
>>> >
>>> > Note 1: Some architectures (x86 in particular :) provide for cpu instruction
>>> s
>>> > that find the first or last bit set in a word in O(1)
>>> >
>>> > Note 2: This does not have to be done in real time and can be updated period
>>> ically.
>>> >
>>> > Benny
>>> >
>>> > >
>>> > > Dean
>>> > >>
>>> > >>> Dean
>>> > >>>
>>> > >>> Benny Halevy wrote:
>>> > >>>
>>> > >>>> Also,
>>> > >>>> * Maintain highest_slotid correctly.
>>> > >>>> * sleep for a free slot only if failed to allocate one.
>>> > >>>>
>>> > >>>> Signed-off-by: Benny Halevy <bhalevy at panasas.com>
>>> > >>>> ---
>>> > >>>> fs/nfs/nfs4proc.c | 104 +++++++++++++++++++++++++++-------
>>> --------
>>> > >>>> include/linux/nfs4_session.h | 9 +--
>>> > >>>> 2 files changed, 70 insertions(+), 43 deletions(-)
>>> > >>>>
>>> > >>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>> > >>>> index 0203afe..d0e9c5a 100644
>>> > >>>> --- a/fs/nfs/nfs4proc.c
>>> > >>>> +++ b/fs/nfs/nfs4proc.c
>>> > >>>> @@ -210,6 +210,43 @@ static void renew_lease(const struct nfs_server *s
>>> erver, unsigned long timestamp
>>> > >>>> }
>>> > >>>>
>>> > >>>> #if defined(CONFIG_NFS_V4_1)
>>> > >>>> +
>>> > >>>> +static inline void __free_slot_sorted(struct nfs4_slot_table *tbl,
>>> > >>>> + struct nfs4_slot *slot)
>>> > >>>> +{
>>> > >>>> + int slot_nr = slot->slot_nr;
>>> > >>>> + struct nfs4_slot *sp;
>>> > >>>> +
>>> > >>>> + list_for_each_entry (sp, &tbl->free_slots, entry)
>>> > >>>> + if (slot_nr < sp->slot_nr) {
>>> > >>>> + list_add_tail(&slot->entry, &sp->entry);
>>> > >>>> + return;
>>> > >>>> + }
>>> > >>>> + list_add_tail(&slot->entry, &tbl->free_slots);
>>> > >>>> +}
>>> > >>>> +
>>> > >>>> +static inline void __nfs41_free_slot(struct nfs4_slot_table *tbl,
>>> > >>>> + struct nfs4_slot *slot)
>>> > >>>> +{
>>> > >>>> + int slot_nr;
>>> > >>>> + struct nfs4_slot *sp;
>>> > >>>> +
>>> > >>>> + spin_lock(&tbl->slot_tbl_lock);
>>> > >>>> + __free_slot_sorted(tbl, slot);
>>> > >>>> + slot_nr = slot->slot_nr;
>>> > >>>> + if (slot_nr == tbl->highest_slotid) {
>>> > >>>> + slot_nr--;
>>> > >>>> + list_for_each_entry_reverse(sp, &tbl->free_slots, entry
>>> ) {
>>> > >>>> + if (sp->slot_nr != slot_nr)
>>> > >>>> + break;
>>> > >>>> + slot_nr--;
>>> > >>>> + }
>>> > >>>> + tbl->highest_slotid = slot_nr;
>>> > >>>> + }
>>> > >>>> + spin_unlock(&tbl->slot_tbl_lock);
>>> > >>>> + rpc_wake_up_next(&tbl->slot_tbl_waitq);
>>> > >>>> +}
>>> > >>>> +
>>> > >>>> /* For pNFS filelayout data servers:
>>> > >>>> * the nfs_client is NULL - to signal no lease update.
>>> > >>>> * session is the data server.
>>> > >>>> @@ -235,7 +272,7 @@ static int nfs41_sequence_done(struct nfs_client *c
>>> lp,
>>> > >>>> slot = res->sr_slot;
>>> > >>>> if (slot == NULL) {
>>> > >>>> dprintk("%s: no slot: status %d\n", __func__, status);
>>> > >>>> - goto out; /* session recovery probably failed */
>>> > >>>> + goto ret; /* session recovery probably failed */
>>> > >>>> }
>>> > >>>>
>>> > >>>> switch (status) {
>>> > >>>> @@ -267,12 +304,7 @@ static int nfs41_sequence_done(struct nfs_client *
>>> clp,
>>> > >>>> spin_unlock(&clp->cl_lock);
>>> > >>>> }
>>> > >>>> no_update:
>>> > >>>> - /* Clear the 'busy' bit on the slot that was used */
>>> > >>>> - smp_mb__before_clear_bit();
>>> > >>>> - clear_bit(NFS4_SLOT_BUSY, &slot->flags);
>>> > >>>> - smp_mb__after_clear_bit();
>>> > >>>> -out:
>>> > >>>> - rpc_wake_up_next(&tbl->slot_tbl_waitq);
>>> > >>>> + __nfs41_free_slot(tbl, slot);
>>> > >>>> ret:
>>> > >>>> return status;
>>> > >>>> }
>>> > >>>> @@ -299,31 +331,28 @@ static int nfs4_sequence_done(struct nfs_server *
>>> server,
>>> > >>>> return ret;
>>> > >>>> }
>>> > >>>>
>>> > >>>> -static struct nfs4_slot *__nfs4_find_slot(struct nfs4_slot_table *tbl)
>>> > >>>> +static inline struct nfs4_slot *nfs4_find_slot(struct nfs4_slot_table
>>> *tbl,
>>> > >>>> + struct rpc_task *task,
>>> > >>>> + struct nfs41_sequence_ar
>>> gs *args)
>>> > >>>> {
>>> > >>>> - int i;
>>> > >>>> + int slot_nr;
>>> > >>>> struct nfs4_slot *slot;
>>> > >>>>
>>> > >>>> - for (i = 0; i < tbl->max_slots; ++i) {
>>> > >>>> - slot = &tbl->slots[i];
>>> > >>>> - if (!test_bit(NFS4_SLOT_BUSY, &slot->flags)) {
>>> > >>>> - set_bit(NFS4_SLOT_BUSY, &slot->flags);
>>> > >>>> - return slot;
>>> > >>>> - }
>>> > >>>> - }
>>> > >>>> -
>>> > >>>> - return NULL;
>>> > >>>> -}
>>> > >>>> -
>>> > >>>> -static struct nfs4_slot *nfs4_find_slot(struct nfs4_slot_table *tbl,
>>> > >>>> - struct rpc_task *task)
>>> > >>>> -{
>>> > >>>> - struct nfs4_slot *slot;
>>> > >>>> -
>>> > >>>> - rpc_sleep_on(&tbl->slot_tbl_waitq, task, NULL, NULL);
>>> > >>>> -
>>> > >>>> spin_lock(&tbl->slot_tbl_lock);
>>> > >>>> - slot = __nfs4_find_slot(tbl);
>>> > >>>> + if (likely(!list_empty(&tbl->free_slots))) {
>>> > >>>> + slot = list_first_entry(&tbl->free_slots, struct nfs4_s
>>> lot,
>>> > >>>> + entry);
>>> > >>>> + list_del(&slot->entry);
>>> > >>>> + slot_nr = slot->slot_nr;
>>> > >>>> + if (tbl->highest_slotid < slot_nr)
>>> > >>>> + tbl->highest_slotid = slot_nr;
>>> > >>>> + args->sa_slotid = slot_nr;
>>> > >>>> + args->sa_seqid = slot->seq_nr;
>>> > >>>> + args->sa_max_slotid = tbl->highest_slotid;
>>> > >>>> + } else {
>>> > >>>> + slot = NULL;
>>> > >>>> + rpc_sleep_on(&tbl->slot_tbl_waitq, task, NULL, NULL);
>>> > >>>> + }
>>> > >>>> spin_unlock(&tbl->slot_tbl_lock);
>>> > >>>>
>>> > >>>> return slot;
>>> > >>>> @@ -338,21 +367,21 @@ static int nfs41_setup_sequence(struct nfs4_sessi
>>> on *session,
>>> > >>>> struct nfs4_slot *slot;
>>> > >>>> struct nfs4_slot_table *tbl;
>>> > >>>>
>>> > >>>> + dprintk("--> %s\n", __func__);
>>> > >>>> tbl = &session->fore_channel.slot_table;
>>> > >>>> - slot = nfs4_find_slot(tbl, task);
>>> > >>>> + slot = nfs4_find_slot(tbl, task, args);
>>> > >>>>
>>> > >>>> - if (!slot)
>>> > >>>> + if (!slot) {
>>> > >>>> + dprintk("<-- %s: no free slots\n", __func__);
>>> > >>>> return -EAGAIN;
>>> > >>>> + }
>>> > >>>>
>>> > >>>> memcpy(args->sa_sessionid.data, session->sess_id,
>>> > >>>> NFS4_MAX_SESSIONID_LEN);
>>> > >>>> - args->sa_slotid = slot->slot_nr;
>>> > >>>> - args->sa_seqid = slot->seq_nr;
>>> > >>>> args->sa_cache_this = cache_reply;
>>> > >>>>
>>> > >>>> - spin_lock(&tbl->slot_tbl_lock);
>>> > >>>> - args->sa_max_slotid = tbl->max_slots;
>>> > >>>> - spin_unlock(&tbl->slot_tbl_lock);
>>> > >>>> + dprintk("<-- %s slot=%p slotid=%d seqid=%d\n", __func__,
>>> > >>>> + slot, args->sa_slotid, slot->seq_nr);
>>> > >>>>
>>> > >>>> res->sr_slot = slot;
>>> > >>>> res->sr_renewal_time = jiffies;
>>> > >>>> @@ -4541,11 +4570,11 @@ int nfs4_init_slot_table(struct nfs4_channel *c
>>> hannel)
>>> > >>>> dprintk("%s: slots=%p max_slots=%d\n", __func__,
>>> > >>>> tbl->slots, tbl->max_slots);
>>> > >>>>
>>> > >>>> - for (i = 0; i < channel->chan_attrs.max_reqs; ++i) {
>>> > >>>> + for (i = 0; i < channel->chan_attrs.max_reqs - 1; ++i) {
>>> > >>>> slot = &tbl->slots[i];
>>> > >>>> slot->slot_nr = i;
>>> > >>>> slot->seq_nr = 1;
>>> > >>>> - slot->flags = 0;
>>> > >>>> + list_add_tail(&slot->entry, &tbl->free_slots);
>>> > >>>> }
>>> > >>>>
>>> > >>>> out:
>>> > >>>> @@ -4573,6 +4602,7 @@ static int nfs4_init_channel(struct nfs4_channel
>>> *channel)
>>> > >>>> tbl = &channel->slot_table;
>>> > >>>>
>>> > >>>> spin_lock_init(&tbl->slot_tbl_lock);
>>> > >>>> + INIT_LIST_HEAD(&tbl->free_slots);
>>> > >>>> rpc_init_wait_queue(&tbl->slot_tbl_waitq, "Slot table");
>>> > >>>>
>>> > >>>> return 0;
>>> > >>>> diff --git a/include/linux/nfs4_session.h b/include/linux/nfs4_session.
>>> h
>>> > >>>> index 8ef8d6d..542d099 100644
>>> > >>>> --- a/include/linux/nfs4_session.h
>>> > >>>> +++ b/include/linux/nfs4_session.h
>>> > >>>> @@ -7,11 +7,6 @@
>>> > >>>> #include <linux/smp_lock.h>
>>> > >>>> #include <linux/sunrpc/sched.h>
>>> > >>>>
>>> > >>>> -/* The flags for the nfs4_slot struct */
>>> > >>>> -#define NFS4_SLOT_BUSY 0X0 /* Slot in use */
>>> > >>>> -#define NFS4_SLOT_RECLAIMED 0x1 /* Slot has been reclaimed by
>>> > >>>> - the server */
>>> > >>>> -
>>> > >>>> struct nfs4_channel_attrs {
>>> > >>>> u32 headerpadsz;
>>> > >>>> u32 max_rqst_sz;
>>> > >>>> @@ -25,14 +20,16 @@ struct nfs4_channel_attrs {
>>> > >>>> struct nfs4_slot {
>>> > >>>> u32 slot_nr;
>>> > >>>> u32 seq_nr;
>>> > >>>> - unsigned long flags;
>>> > >>>> + struct list_head entry;
>>> > >>>> };
>>> > >>>>
>>> > >>>> struct nfs4_slot_table {
>>> > >>>> struct nfs4_slot *slots;
>>> > >>>> spinlock_t slot_tbl_lock;
>>> > >>>> + struct list_head free_slots;
>>> > >>>> struct rpc_wait_queue slot_tbl_waitq;
>>> > >>>> int max_slots;
>>> > >>>> + int highest_slotid;
>>> > >>>> };
>>> > >>>>
>>> > >>>> struct nfs4_channel {
>>> > >>>>
>>> > >>>>
>>> > >>
>>> >
>>> > _______________________________________________
>>> > pNFS mailing list
>>> > pNFS at linux-nfs.org
>>> > http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
>>> >
>>> >
>>>
>>> --
>>> Brent Welch
>>> Director, Software Architecture, Panasas Inc
>>> The Leader in Parallel Storage
>>>
>>> www.panasas.com
>>> welch at panasas.com
>>>
>>
More information about the pNFS
mailing list