[pnfs] [PATCH 1/2] Slot Table Implementation

Benny Halevy bhalevy at panasas.com
Mon Mar 19 18:32:33 EDT 2007


Iyer, Rahul wrote:

>Yes, this seems to be a good optimization. I can work on the
>optimizations once I have the callback path implemented. I'd rather have
>a  full featured prototype than an efficient but incomplete one. Does
>that sound fine? I've added an issue to the pNFS wiki.
>  
>
Sounds good.
Thank you,
Benny

>Thanks
>Regards
>Rahul
> 
>
>  
>
>>-----Original Message-----
>>From: Benny Halevy [mailto:bhalevy at panasas.com] 
>>Sent: Monday, March 19, 2007 10:48 AM
>>To: Iyer, Rahul
>>Cc: pnfs at linux-nfs.org
>>Subject: Re: [pnfs] [PATCH 1/2] Slot Table Implementation
>>
>>One more optimization that Brent suggests is keeping a single 
>>"lowest free slot" index per slot table which would help 
>>reduce the table scan cost.
>>
>>When you allocate that slot mark the index as "used".
>>When you free a slot that has a lower id, or if the index is 
>>marked as used, just update that index.
>>When you want to allocate a slot and the index is marked as 
>>used, just scan from there upward until you find the next 
>>free one and update the index as "used".
>>
>>Benny
>>
>>Benny Halevy wrote:
>>
>>    
>>
>>>Hmm, keeping the free list in order has the same O(n) complexity as 
>>>scanning for the first free slot.  We just need a fast way 
>>>      
>>>
>>to tell if 
>>    
>>
>>>there are any free slots (a list, or a counter) for the queue sleep 
>>>condition.  A single spin_lock for scanning the slot table 
>>>      
>>>
>>(or for the 
>>    
>>
>>>list) should more efficient than a test_and_set bit per slot (when 
>>>there's enough load) and so is a single queue rather than a 
>>>      
>>>
>>queue per 
>>    
>>
>>>slot.  The latter is crucial for getting to the first slot 
>>>      
>>>
>>that becomes 
>>    
>>
>>>available and not waiting on average half the response time in the 
>>>system.
>>>
>>>Benny
>>>
>>>Iyer, Rahul wrote:
>>>
>>> 
>>>
>>>      
>>>
>>>>Hi Benny,
>>>>My first idea was similar to this. That was the reason I had the 
>>>>unused slots and used slots lists. The reason I abandoned that was 
>>>>because it would be difficult to get the lowest numbered slot. The 
>>>>draft suggests that the client should use the lowest 
>>>>        
>>>>
>>numbered slot so 
>>    
>>
>>>>that the server can reclaim slots from the client if it 
>>>>        
>>>>
>>sees that they 
>>    
>>
>>>>are under-utilized (at least that's what I understand).It is not a 
>>>>hard requirement, but one that is recommended.
>>>>>From draft 10, Section 2.4.10.1, Paragraph 12
>>>>"The requester is required to use the lowest available slot when 
>>>>issuing a new request. This way, the replier may be able to retire 
>>>>slot entries faster."
>>>>
>>>>Further, this email exchange adds some more insight:
>>>>http://www1.ietf.org/mail-archive/web/nfsv4/current/msg03997.html
>>>>
>>>>The problem I foresaw with the lists is that keeping the 
>>>>        
>>>>
>>lists ordered 
>>    
>>
>>>>is almost as much work as having separate lists. As for the 
>>>>        
>>>>
>>atomic_t 
>>    
>>
>>>>variables, I'm not too hot about the atomic variables. 
>>>>        
>>>>
>>test_and_set is 
>>    
>>
>>>>a bus locking instruction and so isn't such a great idea... the 
>>>>spin_locks are "optimized for the fast path", which is getting the 
>>>>spin_locks, but they're too much of a big hammer maybe. 
>>>>        
>>>>
>>rw_locks might 
>>    
>>
>>>>be a better idea, but i'm not 100% convinced... I'm pretty 
>>>>        
>>>>
>>sure they 
>>    
>>
>>>>use some sort of bus locking instruction too. As for the current 
>>>>implementation, scaling shouldn't be too much of a problem as the 
>>>>number of slots is likely to be small (few tens) and the 
>>>>        
>>>>
>>worst case is 
>>    
>>
>>>>bounded. I agree it is linear time, but at least the worst case 
>>>>doesn't run away from us. Since the RPC slots is a tunable 
>>>>        
>>>>
>>parameter, 
>>    
>>
>>>>it would be interesting to see what an "optimal" rpc slot 
>>>>        
>>>>
>>count is...
>>    
>>
>>>>as in at what point does adding more slots make no difference to 
>>>>performance. Is there such a study?
>>>>
>>>>Another idea I can think of is to have a single lock that 
>>>>        
>>>>
>>protects the 
>>    
>>
>>>>whole slot table. We grab the lock and run down the slot table to 
>>>>select our slot (and set need_to_sleep if required), and 
>>>>        
>>>>
>>then release 
>>    
>>
>>>>the lock. Then, if need_to_sleep is set, we grab the lock, 
>>>>        
>>>>
>>increment 
>>    
>>
>>>>nr_waiters for the slot, release the lock and sleep.
>>>>Should this be a spin_lock or an rw_lock, I'm not sure... 
>>>>        
>>>>
>>spin_locks 
>>    
>>
>>>>will harm the concurrency because it would lock out 
>>>>        
>>>>
>>parallel reads. On 
>>    
>>
>>>>the other hand, rw_locks are biased towards readers or writers (not 
>>>>sure which for Linux), so that may not be a good thing.
>>>>
>>>>That being said, I'm not wedded to the current 
>>>>        
>>>>
>>implementation in any 
>>    
>>
>>>>way... comments and better algorithms are welcome. This was 
>>>>        
>>>>
>>just one 
>>    
>>
>>>>way that I thought made sense.
>>>>Thanks,
>>>>Regards
>>>>Rahul
>>>>
>>>>
>>>>-----Original Message-----
>>>>From: Benny Halevy [mailto:bhalevy at panasas.com]
>>>>Sent: Sat 3/17/2007 3:54 AM
>>>>To: Iyer, Rahul
>>>>Cc: pnfs at linux-nfs.org
>>>>Subject: Re: [pnfs] [PATCH 1/2] Slot Table Implementation
>>>>
>>>>Rahul, this is not urgent but I really think we should move to a 
>>>>single queue for several reason.  With your implementation when all 
>>>>slots are used you have to do max_slots test_and_set operations to
>>>>discover that.  This will just not scale.   I agree with your point
>>>>about semaphores and renegotiating the number of slots so how about 
>>>>this (sketch) instead:
>>>>
>>>>nfs41_proc_sequence_done()
>>>>{
>>>>       int need_wakeup = 0;
>>>>
>>>>       spin_lock(&res->channel->channel_lock);
>>>>       list_add(&res->slot->slot_list, 
>>>>        
>>>>
>>&res->channel->unused_slots);
>>    
>>
>>>>       spin_unlock(&res->channel->channel_lock);
>>>>
>>>>       /* Wake up the threads waiting on this session */
>>>>       rpc_wake_up(&res->channel->sl_waitq);
>>>>
>>>>       return status;
>>>>}
>>>>
>>>>struct nfs4_slot *nfs4_find_slot(struct nfs4_channel *channel) {
>>>>       struct nfs4_slot *slot;
>>>>      
>>>>       might_sleep();
>>>>
>>>>       for (;;) {
>>>>               int ret = 
>>>>        
>>>>
>>wait_event_interruptible(&channel->sl_waitq,
>>    
>>
>>>>                                                  
>>>>!list_empty(&channel->unused_slots));
>>>>               if (unlikely(ret) {
>>>>                       BUG_ON(ret != -ERESTARTSYS);
>>>>                       break;
>>>>               }
>>>>
>>>>               spin_lock(&channel->channel_lock);
>>>>               if (unlikely(list_empty(&channel->unused_slots))) {
>>>>                       spin_unlock(&channel->channel_lock);
>>>>                       continue;
>>>>               }
>>>>               slot = list_entry(channel->unused_slots.next,
>>>>                                 struct nfs4_slot,
>>>>                                 slot_list);
>>>>               list_del(&slot->slot_list);
>>>>               spin_unlock(&channel->channel_lock);
>>>>               return slot;
>>>>       }
>>>>
>>>>       return NULL;
>>>>}
>>>>
>>>>
>>>>Iyer, Rahul wrote:
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>>>Hi Benny,
>>>>>A comment inline...
>>>>>Regards
>>>>>Rahul
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Benny Halevy [mailto:bhalevy at panasas.com]
>>>>>>Sent: Friday, March 16, 2007 2:24 AM
>>>>>>To: Iyer, Rahul
>>>>>>Cc: pnfs at linux-nfs.org
>>>>>>Subject: Re: [pnfs] [PATCH 1/2] Slot Table Implementation
>>>>>>
>>>>>>Rahul, my comments below...
>>>>>>
>>>>>>Benny
>>>>>>
>>>>>>iyer at netapp.com wrote:
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>From: iyer <iyer at netapp.com>
>>>>>>>
>>>>>>>Added code to implement a slot table.
>>>>>>>
>>>>>>>Signed-off-by: iyer <iyer at netapp.com>
>>>>>>>---
>>>>>>>fs/nfs/nfs41_sessions.h        |   62 --------
>>>>>>>fs/nfs/nfs4_fs.h               |    2 +-
>>>>>>>fs/nfs/nfs4proc.c              |  326
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>+++++++++++++++++++++++++++++++++++-----
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>fs/nfs/nfs4xdr.c               |   24 ++--
>>>>>>>fs/nfs/super.c                 |    6 +-
>>>>>>>include/linux/nfs41_sessions.h |   72 +++++++++
>>>>>>>include/linux/nfs_xdr.h        |    4 +
>>>>>>>include/linux/nfsd/state.h     |    2 -
>>>>>>>8 files changed, 378 insertions(+), 120 deletions(-)  delete mode
>>>>>>>100644 fs/nfs/nfs41_sessions.h  create mode 100644 
>>>>>>>include/linux/nfs41_sessions.h
>>>>>>>
>>>>>>>diff --git a/fs/nfs/nfs41_sessions.h
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>b/fs/nfs/nfs41_sessions.h deleted
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>file mode 100644 index fc658c5..0000000
>>>>>>>--- a/fs/nfs/nfs41_sessions.h
>>>>>>>+++ /dev/null
>>>>>>>@@ -1,62 +0,0 @@
>>>>>>>-#ifndef __NFS4_1_SESSIONS_H__
>>>>>>>-#define __NFS4_1_SESSIONS_H__
>>>>>>>-
>>>>>>>-typedef unsigned char               sessionid_t[16];
>>>>>>>-typedef u32                 streamchannel_attrs;
>>>>>>>-typedef u32                 rdmachannel_attrs;
>>>>>>>-
>>>>>>>-struct nfs4_channel_attrs {
>>>>>>>-    unsigned long           max_rqst_sz;
>>>>>>>-    unsigned long           max_resp_sz;
>>>>>>>-    unsigned long           max_resp_sz_cached;
>>>>>>>-    unsigned long           max_ops;
>>>>>>>-    unsigned long           max_reqs;
>>>>>>>-    streamchannel_attrs     stream_attrs;
>>>>>>>-    rdmachannel_attrs       rdma_attrs;
>>>>>>>-};
>>>>>>>-
>>>>>>>-struct nfs4_channel {
>>>>>>>-    struct nfs4_channel_attrs       chan_attrs;
>>>>>>>-    unsigned long                   nr_conns;
>>>>>>>-    struct list_head                rpc_clients;
>>>>>>>-};
>>>>>>>-
>>>>>>>-struct nfs4_session {
>>>>>>>-    /* Session related params */
>>>>>>>-    sessionid_t             sess_id;
>>>>>>>-    u32                     seqid;  /* The seqid returned
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>by exchange_id */
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>-    u32                     persist;
>>>>>>>-    u32                     header_padding;
>>>>>>>-    u32                     hash_alg;
>>>>>>>-    u32                     ssv_len;
>>>>>>>-    u32                     use_for_back_chan;
>>>>>>>-    u32                     rdma_mode;
>>>>>>>-
>>>>>>>-    /* Slotid management */
>>>>>>>-    unsigned long           nr_slots_in_use;
>>>>>>>-    struct list_head        slots_in_use;
>>>>>>>-    struct list_head        unused_slots;
>>>>>>>-    struct rpc_wait_queue   slot_waitq;
>>>>>>>-
>>>>>>>-    /* The fore and back channel */
>>>>>>>-    struct nfs4_channel     fore_channel;
>>>>>>>-    struct nfs4_channel     back_channel;
>>>>>>>-
>>>>>>>-    unsigned int            expired;
>>>>>>>-    struct nfs4_client *    client;
>>>>>>>-    struct list_head        session_hashtbl;
>>>>>>>-    spinlock_t              session_lock;
>>>>>>>-    /* To prevent races between create_session and sequence */
>>>>>>>-    int                     mutating;
>>>>>>>-    struct semaphore        session_sem;
>>>>>>>-    atomic_t        ref_count;
>>>>>>>-};
>>>>>>>-
>>>>>>>-struct nfs4_slot {
>>>>>>>-    u32                     slot_nr;
>>>>>>>-    u32                     seq_nr;
>>>>>>>-    struct nfs4_session *   session;
>>>>>>>-    struct list_head        slot_list;
>>>>>>>-};
>>>>>>>-
>>>>>>>-#endif
>>>>>>>diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h index 
>>>>>>>195d757..e50657e 100644
>>>>>>>--- a/fs/nfs/nfs4_fs.h
>>>>>>>+++ b/fs/nfs/nfs4_fs.h
>>>>>>>@@ -9,7 +9,7 @@
>>>>>>>#ifndef __LINUX_FS_NFS_NFS4_FS_H
>>>>>>>#define __LINUX_FS_NFS_NFS4_FS_H
>>>>>>>
>>>>>>>-#include "nfs41_sessions.h"
>>>>>>>+#include <linux/nfs41_sessions.h>
>>>>>>>
>>>>>>>#ifdef CONFIG_NFS_V4
>>>>>>>#define NFSV4_MAX_MINORVERSION 1
>>>>>>>diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index
>>>>>>>167ff4a..28513a1 100644
>>>>>>>--- a/fs/nfs/nfs4proc.c
>>>>>>>+++ b/fs/nfs/nfs4proc.c
>>>>>>>@@ -49,6 +49,7 @@
>>>>>>>#include <linux/namei.h>
>>>>>>>#include <linux/mount.h>
>>>>>>>#include <linux/module.h>
>>>>>>>+#include <linux/bitops.h>
>>>>>>>
>>>>>>>#include "nfs4_fs.h"
>>>>>>>#include "delegation.h"
>>>>>>>@@ -221,8 +222,10 @@ static int
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>nfs41_proc_sequence_done(struct nfs4_session *session, struct 
>>>>>>nfs41_s
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>   unsigned long timestamp;
>>>>>>>   struct nfs4_client *clp;
>>>>>>>
>>>>>>>-    if (!session || !(clp = session->client))
>>>>>>>-            return 0;
>>>>>>>+    if (!session || !(clp = session->client)) {
>>>>>>>+            printk(KERN_EMERG "%s is NULL!!!\n",
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>(!session)?"session":"clp");
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+            BUG();
>>>>>>>+    }
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>For performance reasons the defensive code should only be active 
>>>>>>with kernel debugging.  The conditional statement above is always 
>>>>>>executed needlessly and is not as clean as it could be because of 
>>>>>>the assignment embedded in the boolean expression.  
>>>>>>            
>>>>>>
>>Therefore, this 
>>    
>>
>>>>>>better be coded like this:
>>>>>>
>>>>>>    BUG_ON(!session);
>>>>>>    clp = session->client;
>>>>>>    BUG_ON(!clp);
>>>>>>
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>   if (!status) {
>>>>>>>           timestamp = jiffies;
>>>>>>>@@ -233,17 +236,131 @@ static int
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>nfs41_proc_sequence_done(struct nfs4_session *session, struct 
>>>>>>nfs41_s
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>           spin_unlock(&clp->cl_lock);
>>>>>>>   }
>>>>>>>
>>>>>>>+    /* Clear the 'busy' bit on the slot that was used */
>>>>>>>+    smp_mb__before_clear_bit();
>>>>>>>+    clear_bit(NFS4_SLOT_BUSY, &res->slot->flags);
>>>>>>>+    smp_mb__after_clear_bit();
>>>>>>>+
>>>>>>>+    printk(KERN_EMERG "waking up waiters on slot %d\n",
>>>>>>>+res->slot->slot_nr);
>>>>>>>+
>>>>>>>+    /* Wake up the threads waiting on this slot */
>>>>>>>+    wake_up_bit(&res->slot->flags, NFS4_SLOT_BUSY);
>>>>>>>+
>>>>>>>   return status;
>>>>>>>}
>>>>>>>
>>>>>>>+static int nfs4_wait_bit_interruptible(void *word) {
>>>>>>>+    if (signal_pending(current))
>>>>>>>+            return -ERESTARTSYS;
>>>>>>>+    schedule();
>>>>>>>+    return 0;
>>>>>>>+}
>>>>>>>+
>>>>>>>+/* Find the lowest numbered slot or sleep on the least
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>loaded slot */
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+struct nfs4_slot *nfs4_find_slot(struct nfs4_channel *channel) {
>>>>>>>+    struct nfs4_slot_table *tbl;
>>>>>>>+    struct nfs4_slot *slot;
>>>>>>>+    struct nfs4_slot *target_slot;
>>>>>>>+    u32 max_slots;
>>>>>>>+    u32 min_waiters;
>>>>>>>+    int i;
>>>>>>>+    int need_to_sleep;
>>>>>>>+   
>>>>>>>+    might_sleep();
>>>>>>>+
>>>>>>>+    tbl = &channel->slot_table;
>>>>>>>+    min_waiters = tbl->slots[0].nr_waiters;
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>atomic_read()
>>>>>>
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+    target_slot = &tbl->slots[0];
>>>>>>>+
>>>>>>>+    spin_lock(&tbl->slot_tbl_lock);
>>>>>>>+    /* Make a local copy of max slots so we don't need to
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>hold it through
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+     * out.
>>>>>>>+     * XXX Will need to revalidate this if slots are reclaimed
>>>>>>>+     */
>>>>>>>+    max_slots = tbl->max_slots;
>>>>>>>+    spin_unlock(&tbl->slot_tbl_lock);
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>Using a spinlock here is unreasonably expensive when you 
>>>>>>            
>>>>>>
>>could use 
>>    
>>
>>>>>>atomic counters since the lock overhead is not amortized over 
>>>>>>multiple accesses to the structure.
>>>>>>max_slots and nr_waiters should be defined as atomic_t and atomic 
>>>>>>operations be used to access them; Alternatively use one spinlock 
>>>>>>for the whole channel around the outer loop
>>>>>>
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+
>>>>>>>+    do {
>>>>>>>+            need_to_sleep = 1;
>>>>>>>+            for (i = 0; i <  max_slots; ++i) {
>>>>>>>+                    slot = &tbl->slots[i];
>>>>>>>+                    if (!test_and_set_bit(NFS4_SLOT_BUSY,
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>&slot->flags)){
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+                            /* We found an empty slot */
>>>>>>>+                            target_slot = slot;
>>>>>>>+                            need_to_sleep = 0;
>>>>>>>+                            break;
>>>>>>>+                    }
>>>>>>>+
>>>>>>>+                    spin_lock(&slot->slot_lock);
>>>>>>>+                    if (min_waiters > slot->nr_waiters) {
>>>>>>>+                            min_waiters = slot->nr_waiters;
>>>>>>>+                            target_slot = slot;
>>>>>>>+                    }
>>>>>>>+                    spin_unlock(&slot->slot_lock);
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>why lock? use atomic_read(&slot->nr_waiters) also, if several 
>>>>>>threads go through this code in parallel they will pick the same 
>>>>>>target_slot so you want to increment.
>>>>>>That's another reason to use a bigger lock.
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>Using atomic_t is a bit hard when you have to do compare 
>>>>>          
>>>>>
>>and set kind 
>>    
>>
>>>>>of operations...
>>>>>Like for instance:
>>>>>
>>>>>If (min_waiters > slot->nr_waiters) {
>>>>>     min_waiters = slot->nr_waiters;
>>>>>     target_slot = slot;
>>>>>}
>>>>>
>>>>>Doing the above atomically is hard with atomic variables 
>>>>>          
>>>>>
>>only because 
>>    
>>
>>>>>the whole sequence needs to be atomic. However, I can change the 
>>>>>spin_lock used to rwlock_t. This should better solve the 
>>>>>          
>>>>>
>>problem. Right?
>>    
>>
>>>>>
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>>>>+                    printk(KERN_EMERG "slot %d has busy bit
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>%s and nr_waiters is %u\n"
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+                                    , slot->slot_nr,
>>>>>>>+                                    
>>>>>>>              
>>>>>>>
>>(need_to_sleep)?"set":"unset",
>>    
>>
>>>>>>>+                                    slot->nr_waiters);
>>>>>>>+            }
>>>>>>>+   
>>>>>>>+            /* Check whether we need to sleep. If so, sleep
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>on the BUSY bit
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+             * Increment the nr_waiters before sleeping and
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>decrement it
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+             * when woken up
>>>>>>>+             */
>>>>>>>+            if (need_to_sleep) {
>>>>>>>+                    printk(KERN_EMERG "sleeping on slot %d;
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>seq_nr: %d\n",
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+                                    target_slot->slot_nr,
>>>>>>>+                                    target_slot->seq_nr);
>>>>>>>+                    spin_lock(&target_slot->slot_lock);
>>>>>>>+                    ++target_slot->nr_waiters;
>>>>>>>+                    spin_unlock(&target_slot->slot_lock);
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>atomic_inc()
>>>>>>
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+   
>>>>>>>+                    /* XXX: We need to check the return value of
>>>>>>>+                     * eait_on_bit so that we can check if it's
>>>>>>>+                     * interrupted.
>>>>>>>+                     */
>>>>>>>+                    wait_on_bit(&target_slot->flags,
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>NFS4_SLOT_BUSY,
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+                                    nfs4_wait_bit_interruptible,
>>>>>>>+                                    TASK_INTERRUPTIBLE);
>>>>>>>+
>>>>>>>+                    spin_lock(&target_slot->slot_lock);
>>>>>>>+                    --target_slot->nr_waiters;
>>>>>>>+                    spin_unlock(&target_slot->slot_lock);
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>atomic_dec()
>>>>>>
>>>>>>But wait a second, why do you want a queue per slot to begin with?
>>>>>>It seems like you can just go with a semaphore initialized to the 
>>>>>>number of slots, do down_interruptible before scanning for a free 
>>>>>>slot and up in nfs41_proc_sequence_done. Take a spin lock to scan 
>>>>>>the slot table (last index can be saved and rotated for 
>>>>>>            
>>>>>>
>>uniformity) 
>>    
>>
>>>>>>and if not free slot is found do a
>>>>>>BUG() since the semaphore should prevent that from happening;
>>>>>>
>>>>>>
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+            }
>>>>>>>+            else
>>>>>>>+                    break;
>>>>>>>+    }while (test_and_set_bit(NFS4_SLOT_BUSY, 
>>>>>>>+ &target_slot->flags));
>>>>>>>+
>>>>>>>+    printk(KERN_EMERG "slot id: %u\nseqid: %u\n 
>>>>>>>              
>>>>>>>
>>max_slots: %u\n",
>>    
>>
>>>>>>>+                    target_slot->slot_nr, target_slot->seq_nr,
>>>>>>>+                            max_slots);
>>>>>>>+   
>>>>>>>+    return target_slot;
>>>>>>>+}
>>>>>>>+   
>>>>>>>static int _nfs41_proc_setup_sequence(struct nfs4_session
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>*session,
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>struct nfs41_sequence_args *args, struct 
>>>>>>>              
>>>>>>>
>>nfs41_sequence_res *res)  {
>>    
>>
>>>>>>>   u32 *ptr;
>>>>>>>-
>>>>>>>+    struct nfs4_slot *slot;
>>>>>>>+   
>>>>>>>   ptr = (u32 *)session->sess_id;
>>>>>>>   dprintk("%s: %u:%u:%u:%u\n", __FUNCTION__, ptr[0],
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>ptr[1], ptr[2],
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>ptr[3]);
>>>>>>>
>>>>>>>-    memcpy(args->sessionid, (unsigned char
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>*)session->sess_id, NFS4_MAX_SESSIONID_LEN);
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+    memcpy(args->sessionid, (unsigned char *)session->sess_id,
>>>>>>>+                                    NFS4_MAX_SESSIONID_LEN);
>>>>>>>+
>>>>>>>+       slot = nfs4_find_slot(&session->fore_channel);
>>>>>>>+
>>>>>>>+       /* XXX: Do we always increment this? Are there any
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>errors for which we
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+    * don't increment the sequence number?
>>>>>>>+    */
>>>>>>>+       args->seqid = slot->seq_nr++;
>>>>>>>+       args->slotid = slot->slot_nr;
>>>>>>>+       args->maxslots = 
>>>>>>>+ session->fore_channel.slot_table.max_slots;
>>>>>>>+     
>>>>>>>+       res->slot = slot;
>>>>>>>
>>>>>>>   return 0;
>>>>>>>}
>>>>>>>@@ -1685,11 +1802,6 @@ static int nfs4_proc_get_root(struct
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>nfs_server *server, struct nfs_fh *fhandle,
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>   };
>>>>>>>   int status;
>>>>>>>
>>>>>>>-    if (server->rpc_ops->setup_sequence && (status =
>>>>>>>-       
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>server->rpc_ops->setup_sequence(server->nfs4_state->cl_session,
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>-        &seqargs, &seqres)))
>>>>>>>-            return status;
>>>>>>>-
>>>>>>>   /*
>>>>>>>    * Now we do a separate LOOKUP for each component of
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>the mount path.
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>    * The LOOKUPs are done separately so that we can
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>conveniently @@
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>-1714,9 +1826,21 @@ static int nfs4_proc_get_root(struct 
>>>>>>>              
>>>>>>>
>>nfs_server 
>>    
>>
>>>>>>>*server, struct nfs_fh *fhandle,
>>>>>>>
>>>>>>>           do {
>>>>>>>                   nfs_fattr_init(fattr);
>>>>>>>+                    if (server->rpc_ops->setup_sequence &&
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>(status =
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+                            server->rpc_ops->setup_sequence(
>>>>>>>+                            server->nfs4_state->cl_session,
>>>>>>>+                            &seqargs, &seqres)))
>>>>>>>+                            return status;
>>>>>>>+                   
>>>>>>>                   status = nfs4_handle_exception(server,
>>>>>>>                                  
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>rpc_call_sync(server->client, &msg, 0),
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>                                   &exception);
>>>>>>>+
>>>>>>>+                    if (server->rpc_ops->sequence_done)
>>>>>>>+                            server->rpc_ops->sequence_done(
>>>>>>>+                                           
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>server->nfs4_state->cl_session,
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+                                            &seqres, status);
>>>>>>>+
>>>>>>>           } while (exception.retry);
>>>>>>>           if (status == 0)
>>>>>>>                   continue;
>>>>>>>@@ -1731,10 +1855,6 @@ static int nfs4_proc_get_root(struct
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>nfs_server *server, struct nfs_fh *fhandle,
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>   if (status == 0)
>>>>>>>           status = nfs4_do_fsinfo(server, fhandle, info);
>>>>>>>out:
>>>>>>>-    if (server->rpc_ops->sequence_done)
>>>>>>>-           
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>server->rpc_ops->sequence_done(server->nfs4_state->cl_session,
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>-                                            &seqres, status);
>>>>>>>-
>>>>>>>   return nfs4_map_errors(status); }
>>>>>>>
>>>>>>>@@ -3268,6 +3388,7 @@ int nfs4_proc_async_sequence(struct
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>nfs4_client *clp, struct rpc_cred *cred)
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>   ret = rpc_call_async(clp->cl_rpcclient, &msg, RPC_TASK_SOFT,
>>>>>>>   &nfs4_sequence_ops, (void *)jiffies);
>>>>>>>
>>>>>>>+        nfs41_proc_sequence_done(clp->cl_session, res, ret);
>>>>>>>out:
>>>>>>>   return ret;
>>>>>>>out_free:
>>>>>>>@@ -3581,14 +3702,6 @@ nfs4_async_handle_error(struct
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>rpc_task *task, const struct nfs_server *server)
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>   return 0;
>>>>>>>}
>>>>>>>
>>>>>>>-static int nfs4_wait_bit_interruptible(void *word) -{
>>>>>>>-    if (signal_pending(current))
>>>>>>>-            return -ERESTARTSYS;
>>>>>>>-    schedule();
>>>>>>>-    return 0;
>>>>>>>-}
>>>>>>>-
>>>>>>>static int nfs4_wait_clnt_recover(struct rpc_clnt *clnt, struct 
>>>>>>>nfs4_client *clp)  {
>>>>>>>   sigset_t oldset;
>>>>>>>@@ -3823,6 +3936,111 @@ int nfs4_proc_get_lease_time(struct
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>nfs4_client *clp, struct nfs_fsinfo *fsinfo)
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>   return status;
>>>>>>>}
>>>>>>>
>>>>>>>+/* Initialize a slot table */
>>>>>>>+int nfs4_init_slot_table(struct nfs4_channel *channel) {
>>>>>>>+    int i;
>>>>>>>+    struct nfs4_slot_table *tbl;
>>>>>>>+    struct nfs4_slot *slot;
>>>>>>>+   
>>>>>>>+    tbl = &channel->slot_table;
>>>>>>>+    tbl->max_slots = channel->chan_attrs.max_reqs;
>>>>>>>+
>>>>>>>+    tbl->slots = kzalloc(tbl->max_slots * sizeof(struct
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>nfs4_slot), GFP_ATOMIC);
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+    if (!tbl->slots)
>>>>>>>+            return -ENOMEM;
>>>>>>>+
>>>>>>>+    spin_lock_init(&tbl->slot_tbl_lock);
>>>>>>>+
>>>>>>>+    for (i = 0; i < tbl->max_slots; ++i) {
>>>>>>>+            slot = &tbl->slots[i];
>>>>>>>+
>>>>>>>+            slot->slot_nr = i;
>>>>>>>+            slot->seq_nr = 1;
>>>>>>>+            slot->flags = 0;
>>>>>>>+            slot->nr_waiters = 0;
>>>>>>>+            spin_lock_init(&slot->slot_lock);
>>>>>>>+    }
>>>>>>>+
>>>>>>>+    return 0;
>>>>>>>+}
>>>>>>>+
>>>>>>>+/* Destroy the slot table */
>>>>>>>+void nfs4_destroy_slot_table(struct nfs4_channel *channel) {
>>>>>>>+    int i;
>>>>>>>+    struct nfs4_slot *slot;
>>>>>>>+    struct nfs4_slot_table *tbl;
>>>>>>>+
>>>>>>>+    tbl = &channel->slot_table;
>>>>>>>+   
>>>>>>>+    for (i = 0; i < tbl->max_slots;++i) {
>>>>>>>+            slot = &tbl->slots[i];
>>>>>>>+
>>>>>>>+            if (slot->nr_waiters)
>>>>>>>+                    BUG();
>>>>>>>+    }
>>>>>>>+
>>>>>>>+    kfree(channel->slot_table.slots);
>>>>>>>+    channel->slot_table.slots = NULL;
>>>>>>>+
>>>>>>>+    return;
>>>>>>>+}
>>>>>>>+
>>>>>>>+/* dump the channel attributes */
>>>>>>>+void nfs4_dump_channel_attrs(struct nfs4_channel_attrs *attrs) {
>>>>>>>+    printk(KERN_INFO "max_rqst_sz: %u\n", attrs->max_rqst_sz);
>>>>>>>+    printk(KERN_INFO "max_resp_sz: %u\n", attrs->max_resp_sz);
>>>>>>>+    printk(KERN_INFO "max_resp_sz_cached: %u\n",
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>attrs->max_resp_sz_cached);
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+    printk(KERN_INFO "max_ops: %u\n", attrs->max_ops);
>>>>>>>+    printk(KERN_INFO "max_reqs: %u\n", attrs->max_reqs); }
>>>>>>>+
>>>>>>>+/* Initialize the values to be used by the client in
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>CREATE_SESSION
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+*/ void nfs4_init_channel_attrs(struct nfs4_client *clp,
>>>>>>>+                            struct nfs4_channel_attrs *fc_attrs,
>>>>>>>+                            struct nfs4_channel_attrs 
>>>>>>>              
>>>>>>>
>>*bc_attrs) {
>>    
>>
>>>>>>>+    /* XXX: We need to have good values here... 32K is a
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>wild guess */
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+    fc_attrs->max_rqst_sz = bc_attrs->max_rqst_sz = 32768;
>>>>>>>+    fc_attrs->max_resp_sz = bc_attrs->max_resp_sz = 32768;
>>>>>>>+    fc_attrs->max_resp_sz_cached =
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>bc_attrs->max_resp_sz_cached = 32768;
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+    fc_attrs->max_ops = bc_attrs->max_ops = 0xFFFFFFFF;
>>>>>>>+    fc_attrs->max_reqs = bc_attrs->max_reqs =
>>>>>>>+                            
>>>>>>>              
>>>>>>>
>>clp->cl_rpcclient->cl_xprt->max_reqs;
>>    
>>
>>>>>>>+    fc_attrs->stream_attrs = bc_attrs->stream_attrs = 0;
>>>>>>>+    fc_attrs->rdma_attrs = bc_attrs->rdma_attrs = 0;
>>>>>>>+
>>>>>>>+}
>>>>>>>+
>>>>>>>+/* Check the values returned by the server for
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>CREATE_SESSION. Since
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+we made
>>>>>>>+ * our needs known, if the server gives us more than we 
>>>>>>>              
>>>>>>>
>>need, we 
>>    
>>
>>>>>>>+don't bother
>>>>>>>+ * with it.
>>>>>>>+ */
>>>>>>>+void nfs4_adjust_channel_attrs(struct nfs4_channel_attrs
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>*req_attrs,
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+                            struct nfs4_channel_attrs 
>>>>>>>              
>>>>>>>
>>*resp_attrs) {
>>    
>>
>>>>>>>+    if (req_attrs->max_rqst_sz < resp_attrs->max_rqst_sz)
>>>>>>>+            resp_attrs->max_rqst_sz = req_attrs->max_rqst_sz;
>>>>>>>+
>>>>>>>+    if (req_attrs->max_resp_sz < resp_attrs->max_resp_sz)
>>>>>>>+            resp_attrs->max_resp_sz = req_attrs->max_resp_sz;
>>>>>>>+
>>>>>>>+    if (req_attrs->max_resp_sz_cached < resp_attrs->max_resp_sz)
>>>>>>>+            resp_attrs->max_resp_sz = 
>>>>>>>+ req_attrs->max_resp_sz_cached;
>>>>>>>+
>>>>>>>+    if (req_attrs->max_ops < resp_attrs->max_ops)
>>>>>>>+            resp_attrs->max_ops = req_attrs->max_ops;
>>>>>>>+   
>>>>>>>+    if (req_attrs->max_reqs < resp_attrs->max_reqs)
>>>>>>>+            resp_attrs->max_reqs = req_attrs->max_reqs;
>>>>>>>+
>>>>>>>+    /* XXX: We ignore the stream channel attributes... we
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>have no idea what
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+     * to do with them anyways!
>>>>>>>+     */
>>>>>>>+}
>>>>>>>+
>>>>>>>int _nfs4_proc_create_session(struct nfs4_client *clp, struct 
>>>>>>>nfs4_session *session, struct rpc_clnt *clnt)  {
>>>>>>>   struct nfs41_create_session_args args = { @@ -3845,7
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>+4063,19 @@ int
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>_nfs4_proc_create_session(struct nfs4_client *clp, struct
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>nfs4_session *sess
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>   };
>>>>>>>   int status;
>>>>>>>
>>>>>>>-    status = rpc_call_sync(clnt, &msg, 0);
>>>>>>>+    nfs4_init_channel_attrs(clp, &args.fc_attrs, 
>>>>>>>              
>>>>>>>
>>&args.bc_attrs);
>>    
>>
>>>>>>>+   
>>>>>>>+        status = rpc_call_sync(clnt, &msg, 0);
>>>>>>>+
>>>>>>>+    /* Set the negotiated values in the session's
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>channel_attrs struct
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+*/
>>>>>>>+
>>>>>>>+    if (!status) {
>>>>>>>+            nfs4_adjust_channel_attrs(&args.fc_attrs,
>>>>>>>+                                   
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>&session->fore_channel.chan_attrs);
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+            nfs4_adjust_channel_attrs(&args.bc_attrs,
>>>>>>>+                                   
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>&session->back_channel.chan_attrs);
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+    }
>>>>>>>+   
>>>>>>>   return status;
>>>>>>>}
>>>>>>>EXPORT_SYMBOL(_nfs4_proc_create_session);
>>>>>>>@@ -3853,29 +4083,28 @@ EXPORT_SYMBOL(_nfs4_proc_create_session);
>>>>>>>int nfs4_proc_create_session(struct nfs4_client *clp)  {
>>>>>>>   int status;
>>>>>>>-    unsigned long now;
>>>>>>>   struct nfs4_session *session;
>>>>>>>-    struct nfs_fsinfo fsinfo;
>>>>>>>   u32 *ptr;
>>>>>>>
>>>>>>>-    now = jiffies;
>>>>>>>-
>>>>>>>   status = _nfs4_proc_create_session(clp,
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>clp->cl_session, clp->cl_rpcclient);
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>   if (status)
>>>>>>>           return status;
>>>>>>>
>>>>>>>-    status = nfs4_proc_get_lease_time(clp, &fsinfo);
>>>>>>>-    if (status == 0) {
>>>>>>>-            /* Update lease time and schedule renewal */
>>>>>>>-            spin_lock(&clp->cl_lock);
>>>>>>>-            clp->cl_lease_time = fsinfo.lease_time * HZ;
>>>>>>>-            clp->cl_last_renewal = now;
>>>>>>>-            clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
>>>>>>>-            spin_unlock(&clp->cl_lock);
>>>>>>>+    session = clp->cl_session;
>>>>>>>
>>>>>>>-            nfs4_schedule_state_renewal(clp);
>>>>>>>+    /* Init the fore channel */
>>>>>>>+    status = nfs4_init_slot_table(&session->fore_channel);
>>>>>>>+    dprintk("fc init returned %d\n", status);
>>>>>>>+    if (status)
>>>>>>>+            return status;
>>>>>>>+
>>>>>>>+    /* Init the back channel */
>>>>>>>+    status = nfs4_init_slot_table(&session->back_channel);
>>>>>>>+    dprintk("bc init returned %d\n", status);
>>>>>>>+    if (status) {
>>>>>>>+            nfs4_destroy_slot_table(&session->fore_channel);
>>>>>>>+            return status;
>>>>>>>   }
>>>>>>>-    session = clp->cl_session;
>>>>>>>
>>>>>>>   ptr = (int *)session->sess_id;
>>>>>>>   dprintk("sessionid is: %d:%d:%d:%d\n", ptr[0], 
>>>>>>>              
>>>>>>>
>>ptr[1], ptr[2], 
>>    
>>
>>>>>>>ptr[3]); @@ -4769,11 +4998,8 @@ struct nfs4_session
>>>>>>>*nfs41_alloc_session(void)
>>>>>>>
>>>>>>>   session->expired = 1;
>>>>>>>
>>>>>>>-    INIT_LIST_HEAD(&session->slots_in_use);
>>>>>>>-    INIT_LIST_HEAD(&session->unused_slots);
>>>>>>>   INIT_LIST_HEAD(&session->session_hashtbl);
>>>>>>>
>>>>>>>-    //rpc_init_wait_queue(&session->slot_waitq, "Slot 
>>>>>>>              
>>>>>>>
>>waitqueue");
>>    
>>
>>>>>>>   spin_lock_init(&session->session_lock);
>>>>>>>
>>>>>>>   sema_init(&session->session_sem, 1); @@ -4797,6 
>>>>>>>              
>>>>>>>
>>+5023,8 @@ void 
>>    
>>
>>>>>>>nfs4_get_session(struct nfs4_session *session)  void 
>>>>>>>nfs4_put_session(struct nfs4_session **session)  {
>>>>>>>   if (atomic_dec_and_test(&((*session)->ref_count))) {
>>>>>>>+            
>>>>>>>              
>>>>>>>
>>nfs4_destroy_slot_table(&((*session)->fore_channel));
>>    
>>
>>>>>>>+            
>>>>>>>              
>>>>>>>
>>nfs4_destroy_slot_table(&((*session)->back_channel));
>>    
>>
>>>>>>>           nfs41_free_session(*session);
>>>>>>>           *session = NULL;
>>>>>>>   }
>>>>>>>@@ -4806,7 +5034,11 @@ int nfs41_proc_setup_session(struct
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>nfs4_client
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>*clp)  {
>>>>>>>   int status;
>>>>>>>
>>>>>>>-    dprintk("in %s!\n", __FUNCTION__);
>>>>>>>+        struct nfs_fsinfo fsinfo;
>>>>>>>+    unsigned long now;
>>>>>>>+   
>>>>>>>+    now = jiffies;
>>>>>>>+
>>>>>>>   if (!clp->cl_session) {
>>>>>>>           /* create the session struct to hold the
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>session parameters */
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>           clp->cl_session = nfs41_alloc_session(); @@
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>-4844,6 +5076,20 @@ int
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>nfs41_proc_setup_session(struct nfs4_client *clp)
>>>>>>>   clp->cl_session->expired = 0;
>>>>>>>   clp->cl_session->client = clp;
>>>>>>>
>>>>>>>+    status = nfs4_proc_get_lease_time(clp, &fsinfo);
>>>>>>>+
>>>>>>>+    if (status)
>>>>>>>+            goto out_free;
>>>>>>>+   
>>>>>>>+    /* Update lease time and schedule renewal */
>>>>>>>+    spin_lock(&clp->cl_lock);
>>>>>>>+    clp->cl_lease_time = fsinfo.lease_time * HZ;
>>>>>>>+    clp->cl_last_renewal = now;
>>>>>>>+    clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
>>>>>>>+    spin_unlock(&clp->cl_lock);
>>>>>>>+
>>>>>>>+    nfs4_schedule_state_renewal(clp);
>>>>>>>+   
>>>>>>>   clp->cl_session->mutating = 0;
>>>>>>>out:
>>>>>>>   up(&clp->cl_session->session_sem);
>>>>>>>diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index 
>>>>>>>5ef6d70..7da9a6a 100644
>>>>>>>--- a/fs/nfs/nfs4xdr.c
>>>>>>>+++ b/fs/nfs/nfs4xdr.c
>>>>>>>@@ -1429,20 +1429,22 @@ static int
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>encode_create_session(struct xdr_stream *xdr, struct 
>>>>>>nfs41_create_ses
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>   RESERVE_SPACE(2*28);                    /* 2 channel_attrs */
>>>>>>>
>>>>>>>   /* Fore Channel */
>>>>>>>-    WRITE32(32768);                         /* max req size */
>>>>>>>-    WRITE32(32768);                         /* max resp size */
>>>>>>>-    WRITE32(32768);                         /* Max resp
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>size cached */
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>-    WRITE32(RPC_DEF_SLOT_TABLE);            /* max operations */
>>>>>>>-    WRITE32(RPC_DEF_SLOT_TABLE);            /* max requests */
>>>>>>>+    WRITE32(args->fc_attrs.max_rqst_sz);    /* max req size */
>>>>>>>+    WRITE32(args->fc_attrs.max_resp_sz);    /* max resp size */
>>>>>>>+    WRITE32(args->fc_attrs.max_resp_sz_cached);     /* Max
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>resp sz cached */
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+    WRITE32(args->fc_attrs.max_ops);        /* max operations */
>>>>>>>+    WRITE32(args->fc_attrs.max_reqs);       /* max requests */
>>>>>>>+
>>>>>>>   WRITE32(0);                             /*
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>Streamchannel attrs */
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>   WRITE32(0);                             
>>>>>>>              
>>>>>>>
>>/*rdmachannel_attrs */
>>    
>>
>>>>>>>   /* Back Channel */
>>>>>>>-    WRITE32(32768);                         /* max req size */
>>>>>>>-    WRITE32(32768);                         /* max resp size */
>>>>>>>-    WRITE32(32768);                         /* Max resp
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>size cached */
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>-    WRITE32(RPC_DEF_SLOT_TABLE);            /* max operations */
>>>>>>>-    WRITE32(RPC_DEF_SLOT_TABLE);            /* max requests */
>>>>>>>+    WRITE32(args->bc_attrs.max_rqst_sz);    /* max req size */
>>>>>>>+    WRITE32(args->bc_attrs.max_resp_sz);    /* max resp size */
>>>>>>>+    WRITE32(args->bc_attrs.max_resp_sz_cached);     /* Max
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>resp sz cached */
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+    WRITE32(args->bc_attrs.max_ops);        /* max operations */
>>>>>>>+    WRITE32(args->bc_attrs.max_reqs);       /* max requests */
>>>>>>>+
>>>>>>>   WRITE32(0);                             /*
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>Streamchannel attrs */
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>   WRITE32(0);                             
>>>>>>>              
>>>>>>>
>>/*rdmachannel_attrs */
>>    
>>
>>>>>>>@@ -6404,7 +6406,7 @@ static int
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>nfs41_xdr_dec_get_lease_time(struct rpc_rqst *rqstp, uint32_t *p, 
>>>>>>str
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>           status = decode_putrootfh(&xdr);
>>>>>>>   if (!status)
>>>>>>>           status = decode_fsinfo(&xdr, res->fsinfo);
>>>>>>>-    if (!status)
>>>>>>>+    if (status)
>>>>>>>           status = -nfs_stat_to_errno(hdr.status);
>>>>>>>   return status;
>>>>>>>}
>>>>>>>diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 
>>>>>>>              
>>>>>>>
>>a3bc630..89eb564
>>    
>>
>>>>>>>100644
>>>>>>>--- a/fs/nfs/super.c
>>>>>>>+++ b/fs/nfs/super.c
>>>>>>>@@ -1260,15 +1260,13 @@ static int nfs4_fill_super(struct
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>super_block *sb, struct nfs4_mount_data *data,
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>               server->nfs4_state->cl_minorversion = i;
>>>>>>>
>>>>>>>               if (server->rpc_ops->setup_session) {
>>>>>>>-                        int status;
>>>>>>>-
>>>>>>>                       lock_kernel();
>>>>>>>                       down_write(&server->nfs4_state->cl_sem);
>>>>>>>-                        status = 
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>server->rpc_ops->setup_session(server->nfs4_state);
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+                        err =
>>>>>>>+ server->rpc_ops->setup_session(server->nfs4_state);
>>>>>>>                       up_write(&server->nfs4_state->cl_sem);
>>>>>>>                       unlock_kernel();
>>>>>>>
>>>>>>>-                        if (status) {
>>>>>>>+                        if (err) {
>>>>>>>                               printk(KERN_EMERG
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>"Couldn't mount using minorversion %d\n", i);
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>                              
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>rpc_shutdown_client(server->client);
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>                       }
>>>>>>>diff --git a/include/linux/nfs41_sessions.h 
>>>>>>>b/include/linux/nfs41_sessions.h new file mode 100644 index 
>>>>>>>0000000..02f45db
>>>>>>>--- /dev/null
>>>>>>>+++ b/include/linux/nfs41_sessions.h
>>>>>>>@@ -0,0 +1,72 @@
>>>>>>>+#ifndef __NFS4_1_SESSIONS_H__
>>>>>>>+#define __NFS4_1_SESSIONS_H__
>>>>>>>+
>>>>>>>+/* The flags for the nfs4_slot struct */
>>>>>>>+#define NFS4_SLOT_BUSY              0X0     /* Slot in use */
>>>>>>>+#define NFS4_SLOT_RECLAIMED 0x1     /* Slot has
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>been reclaimed by
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+                                       the server */
>>>>>>>+
>>>>>>>+typedef unsigned char               sessionid_t[16];
>>>>>>>+typedef u32                 streamchannel_attrs;
>>>>>>>+typedef u32                 rdmachannel_attrs;
>>>>>>>+
>>>>>>>+struct nfs4_channel_attrs {
>>>>>>>+    u32                     max_rqst_sz;
>>>>>>>+    u32                     max_resp_sz;
>>>>>>>+    u32                     max_resp_sz_cached;
>>>>>>>+    u32                     max_ops;
>>>>>>>+    u32                     max_reqs;
>>>>>>>+    streamchannel_attrs     stream_attrs;
>>>>>>>+    rdmachannel_attrs       rdma_attrs;
>>>>>>>+};
>>>>>>>+
>>>>>>>+struct nfs4_slot {
>>>>>>>+    u32                     slot_nr;
>>>>>>>+    u32                     seq_nr;
>>>>>>>+    unsigned long           flags;
>>>>>>>+    u32                     nr_waiters;
>>>>>>>+    spinlock_t              slot_lock;
>>>>>>>+};
>>>>>>>+
>>>>>>>+struct nfs4_slot_table {
>>>>>>>+    struct nfs4_slot        *slots;
>>>>>>>+    u32                     max_slots;
>>>>>>>+    spinlock_t              slot_tbl_lock;
>>>>>>>+};
>>>>>>>+
>>>>>>>+struct nfs4_channel {
>>>>>>>+    struct nfs4_channel_attrs       chan_attrs;
>>>>>>>+    struct rpc_clnt                 *rpc_client;
>>>>>>>+    struct nfs4_slot_table          slot_table;
>>>>>>>+};
>>>>>>>+
>>>>>>>+struct nfs4_session {
>>>>>>>+    /* Session related params */
>>>>>>>+    sessionid_t                     sess_id;
>>>>>>>+    u32                             seqid;  /* The seqid
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>returned by
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+                                               exchange_id */
>>>>>>>+    u32                             persist;
>>>>>>>+    u32                             header_padding;
>>>>>>>+    u32                             hash_alg;
>>>>>>>+    u32                             ssv_len;
>>>>>>>+    u32                             use_for_back_chan;
>>>>>>>+    u32                             rdma_mode;
>>>>>>>+
>>>>>>>+    /* The fore and back channel */
>>>>>>>+    struct nfs4_channel             fore_channel;
>>>>>>>+    struct nfs4_channel             back_channel;
>>>>>>>+
>>>>>>>+    unsigned int                    expired;
>>>>>>>+    struct nfs4_client *            client;
>>>>>>>+    struct list_head                session_hashtbl;
>>>>>>>+    spinlock_t                      session_lock;
>>>>>>>+    /* To prevent races between create_session and sequence */
>>>>>>>+    int                             mutating;
>>>>>>>+    struct semaphore                session_sem;
>>>>>>>+    atomic_t                        ref_count;
>>>>>>>+};
>>>>>>>+
>>>>>>>+
>>>>>>>+#endif
>>>>>>>+
>>>>>>>+
>>>>>>>diff --git a/include/linux/nfs_xdr.h
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>b/include/linux/nfs_xdr.h index
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>4648707..52b9e38 100644
>>>>>>>--- a/include/linux/nfs_xdr.h
>>>>>>>+++ b/include/linux/nfs_xdr.h
>>>>>>>@@ -3,6 +3,7 @@
>>>>>>>
>>>>>>>#include <linux/sunrpc/xprt.h>
>>>>>>>#include <linux/nfsacl.h>
>>>>>>>+#include <linux/nfs41_sessions.h>
>>>>>>>
>>>>>>>/*
>>>>>>>* To change the maximum rsize and wsize supported by the
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>NFS client,
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>adjust @@ -778,6 +779,8 @@ struct nfs41_create_session_args {
>>>>>>>   uint32_t                        use_for_backchannel;
>>>>>>>   uint32_t                        use_for_rdma;
>>>>>>>   uint32_t                        cb_program;
>>>>>>>+    struct nfs4_channel_attrs       fc_attrs;       /* Fore
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>Channel */
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>+    struct nfs4_channel_attrs       bc_attrs;       /* Back
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>Channel */
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>};
>>>>>>>
>>>>>>>struct nfs41_create_session_res {
>>>>>>>@@ -805,6 +808,7 @@ struct nfs41_sequence_res {
>>>>>>>   u32                             target_maxslots;
>>>>>>>   u32                             status_flags;
>>>>>>>   struct nfs4_state_owner         *sp;
>>>>>>>+    struct nfs4_slot                *slot;
>>>>>>>};
>>>>>>>
>>>>>>>struct nfs4_get_lease_time_args {
>>>>>>>diff --git a/include/linux/nfsd/state.h
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>b/include/linux/nfsd/state.h
>>>>>> 
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>index 8f699b4..fde667e 100644
>>>>>>>--- a/include/linux/nfsd/state.h
>>>>>>>+++ b/include/linux/nfsd/state.h
>>>>>>>@@ -47,8 +47,6 @@ typedef struct {
>>>>>>>   u32             cl_id;
>>>>>>>} clientid_t;
>>>>>>>
>>>>>>>-typedef unsigned char sessionid_t[16];
>>>>>>>-
>>>>>>>typedef struct {
>>>>>>>   u32             so_boot;
>>>>>>>   u32             so_stateownerid;
>>>>>>>   
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>              
>>>>>>>
>>>_______________________________________________
>>>pNFS mailing list
>>>pNFS at linux-nfs.org
>>>http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
>>> 
>>>
>>>      
>>>



More information about the pNFS mailing list