[pnfs] [PATCH 1/2] Slot Table Implementation

Benny Halevy bhalevy at panasas.com
Sun Mar 18 13:27:06 EDT 2007


Hmm, keeping the free list in order has the same O(n) complexity as 
scanning for the
first free slot.  We just need a fast way to tell if there are any free 
slots (a list, or a counter)
for the queue sleep condition.  A single spin_lock for scanning the slot 
table (or for the list)
should more efficient than a test_and_set bit per slot (when there's 
enough load) and so is
a single queue rather than a queue per slot.  The latter is crucial for 
getting to the first slot
that becomes available and not waiting on average half the response time 
in the system.

Benny

Iyer, Rahul wrote:

> Hi Benny,
> My first idea was similar to this. That was the reason I had the 
> unused slots and used slots lists. The reason I abandoned that was 
> because it would be difficult to get the lowest numbered slot. The 
> draft suggests that the client should use the lowest numbered slot so 
> that the server can reclaim slots from the client if it sees that they 
> are under-utilized (at least that's what I understand).It is not a 
> hard requirement, but one that is recommended.
> >From draft 10, Section 2.4.10.1, Paragraph 12
> "The requester is required to use the lowest available slot when 
> issuing a new request. This way, the replier may be able to retire 
> slot entries faster."
>
> Further, this email exchange adds some more insight:
> http://www1.ietf.org/mail-archive/web/nfsv4/current/msg03997.html
>
> The problem I foresaw with the lists is that keeping the lists ordered 
> is almost as much work as having separate lists. As for the atomic_t 
> variables, I'm not too hot about the atomic variables. test_and_set is 
> a bus locking instruction and so isn't such a great idea... the 
> spin_locks are "optimized for the fast path", which is getting the 
> spin_locks, but they're too much of a big hammer maybe. rw_locks might 
> be a better idea, but i'm not 100% convinced... I'm pretty sure they 
> use some sort of bus locking instruction too. As for the current 
> implementation, scaling shouldn't be too much of a problem as the 
> number of slots is likely to be small (few tens) and the worst case is 
> bounded. I agree it is linear time, but at least the worst case 
> doesn't run away from us. Since the RPC slots is a tunable parameter, 
> it would be interesting to see what an "optimal" rpc slot count is... 
> as in at what point does adding more slots make no difference to 
> performance. Is there such a study?
>
> Another idea I can think of is to have a single lock that protects the 
> whole slot table. We grab the lock and run down the slot table to 
> select our slot (and set need_to_sleep if required), and then release 
> the lock. Then, if need_to_sleep is set, we grab the lock, increment 
> nr_waiters for the slot, release the lock and sleep.
> Should this be a spin_lock or an rw_lock, I'm not sure... spin_locks 
> will harm the concurrency because it would lock out parallel reads. On 
> the other hand, rw_locks are biased towards readers or writers (not 
> sure which for Linux), so that may not be a good thing.
>
> That being said, I'm not wedded to the current implementation in any 
> way... comments and better algorithms are welcome. This was just one 
> way that I thought made sense.
> Thanks,
> Regards
> Rahul
>
>
> -----Original Message-----
> From: Benny Halevy [mailto:bhalevy at panasas.com]
> Sent: Sat 3/17/2007 3:54 AM
> To: Iyer, Rahul
> Cc: pnfs at linux-nfs.org
> Subject: Re: [pnfs] [PATCH 1/2] Slot Table Implementation
>
> Rahul, this is not urgent but I really think we should move to a single
> queue for several reason.  With your implementation when all slots
> are used you have to do max_slots test_and_set operations to
> discover that.  This will just not scale.   I agree with your point
> about semaphores and renegotiating the number of slots so how
> about this (sketch) instead:
>
> nfs41_proc_sequence_done()
> {
>         int need_wakeup = 0;
>
>         spin_lock(&res->channel->channel_lock);
>         list_add(&res->slot->slot_list, &res->channel->unused_slots);
>         spin_unlock(&res->channel->channel_lock);
>
>         /* Wake up the threads waiting on this session */
>         rpc_wake_up(&res->channel->sl_waitq);
>
>         return status;
>  }
>
> struct nfs4_slot *nfs4_find_slot(struct nfs4_channel *channel)
> {
>         struct nfs4_slot *slot;
>        
>         might_sleep();
>
>         for (;;) {
>                 int ret = wait_event_interruptible(&channel->sl_waitq,
>                                                    
> !list_empty(&channel->unused_slots));
>                 if (unlikely(ret) {
>                         BUG_ON(ret != -ERESTARTSYS);
>                         break;
>                 }
>
>                 spin_lock(&channel->channel_lock);
>                 if (unlikely(list_empty(&channel->unused_slots))) {
>                         spin_unlock(&channel->channel_lock);
>                         continue;
>                 }
>                 slot = list_entry(channel->unused_slots.next,
>                                   struct nfs4_slot,
>                                   slot_list);
>                 list_del(&slot->slot_list);
>                 spin_unlock(&channel->channel_lock);
>                 return slot;
>         }
>
>         return NULL;
> }
>
>
> Iyer, Rahul wrote:
>
> >Hi Benny,
> >A comment inline...
> >Regards
> >Rahul
> >
> >
> > 
> >
> >>-----Original Message-----
> >>From: Benny Halevy [mailto:bhalevy at panasas.com]
> >>Sent: Friday, March 16, 2007 2:24 AM
> >>To: Iyer, Rahul
> >>Cc: pnfs at linux-nfs.org
> >>Subject: Re: [pnfs] [PATCH 1/2] Slot Table Implementation
> >>
> >>Rahul, my comments below...
> >>
> >>Benny
> >>
> >>iyer at netapp.com wrote:
> >>   
> >>
> >>>From: iyer <iyer at netapp.com>
> >>>
> >>>Added code to implement a slot table.
> >>>
> >>>Signed-off-by: iyer <iyer at netapp.com>
> >>>---
> >>> fs/nfs/nfs41_sessions.h        |   62 --------
> >>> fs/nfs/nfs4_fs.h               |    2 +-
> >>> fs/nfs/nfs4proc.c              |  326
> >>>     
> >>>
> >>+++++++++++++++++++++++++++++++++++-----
> >>   
> >>
> >>> fs/nfs/nfs4xdr.c               |   24 ++--
> >>> fs/nfs/super.c                 |    6 +-
> >>> include/linux/nfs41_sessions.h |   72 +++++++++
> >>> include/linux/nfs_xdr.h        |    4 +
> >>> include/linux/nfsd/state.h     |    2 -
> >>> 8 files changed, 378 insertions(+), 120 deletions(-)  delete mode
> >>>100644 fs/nfs/nfs41_sessions.h  create mode 100644
> >>>include/linux/nfs41_sessions.h
> >>>
> >>>diff --git a/fs/nfs/nfs41_sessions.h
> >>>     
> >>>
> >>b/fs/nfs/nfs41_sessions.h deleted
> >>   
> >>
> >>>file mode 100644 index fc658c5..0000000
> >>>--- a/fs/nfs/nfs41_sessions.h
> >>>+++ /dev/null
> >>>@@ -1,62 +0,0 @@
> >>>-#ifndef __NFS4_1_SESSIONS_H__
> >>>-#define __NFS4_1_SESSIONS_H__
> >>>-
> >>>-typedef unsigned char               sessionid_t[16];
> >>>-typedef u32                 streamchannel_attrs;
> >>>-typedef u32                 rdmachannel_attrs;
> >>>-
> >>>-struct nfs4_channel_attrs {
> >>>-    unsigned long           max_rqst_sz;
> >>>-    unsigned long           max_resp_sz;
> >>>-    unsigned long           max_resp_sz_cached;
> >>>-    unsigned long           max_ops;
> >>>-    unsigned long           max_reqs;
> >>>-    streamchannel_attrs     stream_attrs;
> >>>-    rdmachannel_attrs       rdma_attrs;
> >>>-};
> >>>-
> >>>-struct nfs4_channel {
> >>>-    struct nfs4_channel_attrs       chan_attrs;
> >>>-    unsigned long                   nr_conns;
> >>>-    struct list_head                rpc_clients;
> >>>-};
> >>>-
> >>>-struct nfs4_session {
> >>>-    /* Session related params */
> >>>-    sessionid_t             sess_id;
> >>>-    u32                     seqid;  /* The seqid returned
> >>>     
> >>>
> >>by exchange_id */
> >>   
> >>
> >>>-    u32                     persist;
> >>>-    u32                     header_padding;
> >>>-    u32                     hash_alg;
> >>>-    u32                     ssv_len;
> >>>-    u32                     use_for_back_chan;
> >>>-    u32                     rdma_mode;
> >>>-
> >>>-    /* Slotid management */
> >>>-    unsigned long           nr_slots_in_use;
> >>>-    struct list_head        slots_in_use;
> >>>-    struct list_head        unused_slots;
> >>>-    struct rpc_wait_queue   slot_waitq;
> >>>-
> >>>-    /* The fore and back channel */
> >>>-    struct nfs4_channel     fore_channel;
> >>>-    struct nfs4_channel     back_channel;
> >>>-
> >>>-    unsigned int            expired;
> >>>-    struct nfs4_client *    client;
> >>>-    struct list_head        session_hashtbl;
> >>>-    spinlock_t              session_lock;
> >>>-    /* To prevent races between create_session and sequence */
> >>>-    int                     mutating;
> >>>-    struct semaphore        session_sem;
> >>>-    atomic_t        ref_count;
> >>>-};
> >>>-
> >>>-struct nfs4_slot {
> >>>-    u32                     slot_nr;
> >>>-    u32                     seq_nr;
> >>>-    struct nfs4_session *   session;
> >>>-    struct list_head        slot_list;
> >>>-};
> >>>-
> >>>-#endif
> >>>diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h index
> >>>195d757..e50657e 100644
> >>>--- a/fs/nfs/nfs4_fs.h
> >>>+++ b/fs/nfs/nfs4_fs.h
> >>>@@ -9,7 +9,7 @@
> >>> #ifndef __LINUX_FS_NFS_NFS4_FS_H
> >>> #define __LINUX_FS_NFS_NFS4_FS_H
> >>>
> >>>-#include "nfs41_sessions.h"
> >>>+#include <linux/nfs41_sessions.h>
> >>>
> >>> #ifdef CONFIG_NFS_V4
> >>> #define NFSV4_MAX_MINORVERSION 1
> >>>diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index
> >>>167ff4a..28513a1 100644
> >>>--- a/fs/nfs/nfs4proc.c
> >>>+++ b/fs/nfs/nfs4proc.c
> >>>@@ -49,6 +49,7 @@
> >>> #include <linux/namei.h>
> >>> #include <linux/mount.h>
> >>> #include <linux/module.h>
> >>>+#include <linux/bitops.h>
> >>>
> >>> #include "nfs4_fs.h"
> >>> #include "delegation.h"
> >>>@@ -221,8 +222,10 @@ static int
> >>>     
> >>>
> >>nfs41_proc_sequence_done(struct nfs4_session *session, struct nfs41_s
> >>   
> >>
> >>>     unsigned long timestamp;
> >>>     struct nfs4_client *clp;
> >>>
> >>>-    if (!session || !(clp = session->client))
> >>>-            return 0;
> >>>+    if (!session || !(clp = session->client)) {
> >>>+            printk(KERN_EMERG "%s is NULL!!!\n",
> >>>     
> >>>
> >>(!session)?"session":"clp");
> >>   
> >>
> >>>+            BUG();
> >>>+    }
> >>>     
> >>>
> >>For performance reasons the defensive code should only be
> >>active with kernel debugging.  The conditional statement
> >>above is always executed needlessly and is not as clean as it
> >>could be because of the assignment embedded in the boolean
> >>expression.  Therefore, this better be coded like this:
> >>
> >>      BUG_ON(!session);
> >>      clp = session->client;
> >>      BUG_ON(!clp);
> >>
> >>   
> >>
> >>>
> >>>     if (!status) {
> >>>             timestamp = jiffies;
> >>>@@ -233,17 +236,131 @@ static int
> >>>     
> >>>
> >>nfs41_proc_sequence_done(struct nfs4_session *session, struct nfs41_s
> >>   
> >>
> >>>             spin_unlock(&clp->cl_lock);
> >>>     }
> >>>
> >>>+    /* Clear the 'busy' bit on the slot that was used */
> >>>+    smp_mb__before_clear_bit();
> >>>+    clear_bit(NFS4_SLOT_BUSY, &res->slot->flags);
> >>>+    smp_mb__after_clear_bit();
> >>>+
> >>>+    printk(KERN_EMERG "waking up waiters on slot %d\n",
> >>>+res->slot->slot_nr);
> >>>+
> >>>+    /* Wake up the threads waiting on this slot */
> >>>+    wake_up_bit(&res->slot->flags, NFS4_SLOT_BUSY);
> >>>+
> >>>     return status;
> >>> }
> >>>
> >>>+static int nfs4_wait_bit_interruptible(void *word) {
> >>>+    if (signal_pending(current))
> >>>+            return -ERESTARTSYS;
> >>>+    schedule();
> >>>+    return 0;
> >>>+}
> >>>+
> >>>+/* Find the lowest numbered slot or sleep on the least
> >>>     
> >>>
> >>loaded slot */
> >>   
> >>
> >>>+struct nfs4_slot *nfs4_find_slot(struct nfs4_channel *channel) {
> >>>+    struct nfs4_slot_table *tbl;
> >>>+    struct nfs4_slot *slot;
> >>>+    struct nfs4_slot *target_slot;
> >>>+    u32 max_slots;
> >>>+    u32 min_waiters;
> >>>+    int i;
> >>>+    int need_to_sleep;
> >>>+   
> >>>+    might_sleep();
> >>>+
> >>>+    tbl = &channel->slot_table;
> >>>+    min_waiters = tbl->slots[0].nr_waiters;
> >>>     
> >>>
> >>atomic_read()
> >>
> >>   
> >>
> >>>+    target_slot = &tbl->slots[0];
> >>>+
> >>>+    spin_lock(&tbl->slot_tbl_lock);
> >>>+    /* Make a local copy of max slots so we don't need to
> >>>     
> >>>
> >>hold it through
> >>   
> >>
> >>>+     * out.
> >>>+     * XXX Will need to revalidate this if slots are reclaimed
> >>>+     */
> >>>+    max_slots = tbl->max_slots;
> >>>+    spin_unlock(&tbl->slot_tbl_lock);
> >>>     
> >>>
> >>Using a spinlock here is unreasonably expensive when you
> >>could use atomic counters since the lock overhead is not
> >>amortized over multiple accesses to the structure.
> >>max_slots and nr_waiters should be defined as atomic_t and
> >>atomic operations be used to access them; Alternatively use
> >>one spinlock for the whole channel around the outer loop
> >>
> >>   
> >>
> >>>+
> >>>+    do {
> >>>+            need_to_sleep = 1;
> >>>+            for (i = 0; i <  max_slots; ++i) {
> >>>+                    slot = &tbl->slots[i];
> >>>+                    if (!test_and_set_bit(NFS4_SLOT_BUSY,
> >>>     
> >>>
> >>&slot->flags)){
> >>   
> >>
> >>>+                            /* We found an empty slot */
> >>>+                            target_slot = slot;
> >>>+                            need_to_sleep = 0;
> >>>+                            break;
> >>>+                    }
> >>>+
> >>>+                    spin_lock(&slot->slot_lock);
> >>>+                    if (min_waiters > slot->nr_waiters) {
> >>>+                            min_waiters = slot->nr_waiters;
> >>>+                            target_slot = slot;
> >>>+                    }
> >>>+                    spin_unlock(&slot->slot_lock);
> >>>     
> >>>
> >>why lock? use atomic_read(&slot->nr_waiters) also, if several
> >>threads go through this code in parallel they will pick the
> >>same target_slot so you want to increment.
> >>That's another reason to use a bigger lock.
> >>   
> >>
> >
> >Using atomic_t is a bit hard when you have to do compare and set kind of
> >operations...
> >Like for instance:
> >
> >If (min_waiters > slot->nr_waiters) {
> >       min_waiters = slot->nr_waiters;
> >       target_slot = slot;
> >}
> >
> >Doing the above atomically is hard with atomic variables only because
> >the whole sequence needs to be atomic. However, I can change the
> >spin_lock used to rwlock_t. This should better solve the problem. Right?
> >
> > 
> >
> >>>+                    printk(KERN_EMERG "slot %d has busy bit
> >>>     
> >>>
> >>%s and nr_waiters is %u\n"
> >>   
> >>
> >>>+                                    , slot->slot_nr,
> >>>+                                    (need_to_sleep)?"set":"unset",
> >>>+                                    slot->nr_waiters);
> >>>+            }
> >>>+   
> >>>+            /* Check whether we need to sleep. If so, sleep
> >>>     
> >>>
> >>on the BUSY bit
> >>   
> >>
> >>>+             * Increment the nr_waiters before sleeping and
> >>>     
> >>>
> >>decrement it
> >>   
> >>
> >>>+             * when woken up
> >>>+             */
> >>>+            if (need_to_sleep) {
> >>>+                    printk(KERN_EMERG "sleeping on slot %d;
> >>>     
> >>>
> >>seq_nr: %d\n",
> >>   
> >>
> >>>+                                    target_slot->slot_nr,
> >>>+                                    target_slot->seq_nr);
> >>>+                    spin_lock(&target_slot->slot_lock);
> >>>+                    ++target_slot->nr_waiters;
> >>>+                    spin_unlock(&target_slot->slot_lock);
> >>>     
> >>>
> >>atomic_inc()
> >>
> >>   
> >>
> >>>+   
> >>>+                    /* XXX: We need to check the return value of
> >>>+                     * eait_on_bit so that we can check if it's
> >>>+                     * interrupted.
> >>>+                     */
> >>>+                    wait_on_bit(&target_slot->flags,
> >>>     
> >>>
> >>NFS4_SLOT_BUSY,
> >>   
> >>
> >>>+                                    nfs4_wait_bit_interruptible,
> >>>+                                    TASK_INTERRUPTIBLE);
> >>>+
> >>>+                    spin_lock(&target_slot->slot_lock);
> >>>+                    --target_slot->nr_waiters;
> >>>+                    spin_unlock(&target_slot->slot_lock);
> >>>     
> >>>
> >>atomic_dec()
> >>
> >>But wait a second, why do you want a queue per slot to begin with?
> >>It seems like you can just go with a semaphore initialized to
> >>the number of slots, do down_interruptible before scanning
> >>for a free slot and up in nfs41_proc_sequence_done. Take a
> >>spin lock to scan the slot table (last index can be saved and
> >>rotated for uniformity) and if not free slot is found do a
> >>BUG() since the semaphore should prevent that from happening;
> >>
> >>
> >>   
> >>
> >>>+            }
> >>>+            else
> >>>+                    break;
> >>>+    }while (test_and_set_bit(NFS4_SLOT_BUSY, &target_slot->flags));
> >>>+
> >>>+    printk(KERN_EMERG "slot id: %u\nseqid: %u\n max_slots: %u\n",
> >>>+                    target_slot->slot_nr, target_slot->seq_nr,
> >>>+                            max_slots);
> >>>+   
> >>>+    return target_slot;
> >>>+}
> >>>+   
> >>> static int _nfs41_proc_setup_sequence(struct nfs4_session
> >>>     
> >>>
> >>*session,
> >>   
> >>
> >>>struct nfs41_sequence_args *args, struct nfs41_sequence_res *res)  {
> >>>     u32 *ptr;
> >>>-
> >>>+    struct nfs4_slot *slot;
> >>>+   
> >>>     ptr = (u32 *)session->sess_id;
> >>>     dprintk("%s: %u:%u:%u:%u\n", __FUNCTION__, ptr[0],
> >>>     
> >>>
> >>ptr[1], ptr[2],
> >>   
> >>
> >>>ptr[3]);
> >>>
> >>>-    memcpy(args->sessionid, (unsigned char
> >>>     
> >>>
> >>*)session->sess_id, NFS4_MAX_SESSIONID_LEN);
> >>   
> >>
> >>>+    memcpy(args->sessionid, (unsigned char *)session->sess_id,
> >>>+                                    NFS4_MAX_SESSIONID_LEN);
> >>>+
> >>>+       slot = nfs4_find_slot(&session->fore_channel);
> >>>+
> >>>+       /* XXX: Do we always increment this? Are there any
> >>>     
> >>>
> >>errors for which we
> >>   
> >>
> >>>+    * don't increment the sequence number?
> >>>+    */
> >>>+       args->seqid = slot->seq_nr++;
> >>>+       args->slotid = slot->slot_nr;
> >>>+       args->maxslots = session->fore_channel.slot_table.max_slots;
> >>>+     
> >>>+       res->slot = slot;
> >>>
> >>>     return 0;
> >>> }
> >>>@@ -1685,11 +1802,6 @@ static int nfs4_proc_get_root(struct
> >>>     
> >>>
> >>nfs_server *server, struct nfs_fh *fhandle,
> >>   
> >>
> >>>     };
> >>>     int status;
> >>>
> >>>-    if (server->rpc_ops->setup_sequence && (status =
> >>>-       
> >>>     
> >>>
> >>server->rpc_ops->setup_sequence(server->nfs4_state->cl_session,
> >>   
> >>
> >>>-        &seqargs, &seqres)))
> >>>-            return status;
> >>>-
> >>>     /*
> >>>      * Now we do a separate LOOKUP for each component of
> >>>     
> >>>
> >>the mount path.
> >>   
> >>
> >>>      * The LOOKUPs are done separately so that we can
> >>>     
> >>>
> >>conveniently @@
> >>   
> >>
> >>>-1714,9 +1826,21 @@ static int nfs4_proc_get_root(struct nfs_server
> >>>*server, struct nfs_fh *fhandle,
> >>>
> >>>             do {
> >>>                     nfs_fattr_init(fattr);
> >>>+                    if (server->rpc_ops->setup_sequence &&
> >>>     
> >>>
> >>(status =
> >>   
> >>
> >>>+                            server->rpc_ops->setup_sequence(
> >>>+                            server->nfs4_state->cl_session,
> >>>+                            &seqargs, &seqres)))
> >>>+                            return status;
> >>>+                   
> >>>                     status = nfs4_handle_exception(server,
> >>>                                    
> >>>     
> >>>
> >>rpc_call_sync(server->client, &msg, 0),
> >>   
> >>
> >>>                                     &exception);
> >>>+
> >>>+                    if (server->rpc_ops->sequence_done)
> >>>+                            server->rpc_ops->sequence_done(
> >>>+                                           
> >>>     
> >>>
> >>server->nfs4_state->cl_session,
> >>   
> >>
> >>>+                                            &seqres, status);
> >>>+
> >>>             } while (exception.retry);
> >>>             if (status == 0)
> >>>                     continue;
> >>>@@ -1731,10 +1855,6 @@ static int nfs4_proc_get_root(struct
> >>>     
> >>>
> >>nfs_server *server, struct nfs_fh *fhandle,
> >>   
> >>
> >>>     if (status == 0)
> >>>             status = nfs4_do_fsinfo(server, fhandle, info);
> >>> out:
> >>>-    if (server->rpc_ops->sequence_done)
> >>>-           
> >>>     
> >>>
> >>server->rpc_ops->sequence_done(server->nfs4_state->cl_session,
> >>   
> >>
> >>>-                                            &seqres, status);
> >>>-
> >>>     return nfs4_map_errors(status);
> >>> }
> >>>
> >>>@@ -3268,6 +3388,7 @@ int nfs4_proc_async_sequence(struct
> >>>     
> >>>
> >>nfs4_client *clp, struct rpc_cred *cred)
> >>   
> >>
> >>>     ret = rpc_call_async(clp->cl_rpcclient, &msg, RPC_TASK_SOFT,
> >>>     &nfs4_sequence_ops, (void *)jiffies);
> >>>
> >>>+        nfs41_proc_sequence_done(clp->cl_session, res, ret);
> >>> out:
> >>>     return ret;
> >>> out_free:
> >>>@@ -3581,14 +3702,6 @@ nfs4_async_handle_error(struct
> >>>     
> >>>
> >>rpc_task *task, const struct nfs_server *server)
> >>   
> >>
> >>>     return 0;
> >>> }
> >>>
> >>>-static int nfs4_wait_bit_interruptible(void *word) -{
> >>>-    if (signal_pending(current))
> >>>-            return -ERESTARTSYS;
> >>>-    schedule();
> >>>-    return 0;
> >>>-}
> >>>-
> >>> static int nfs4_wait_clnt_recover(struct rpc_clnt *clnt, struct
> >>>nfs4_client *clp)  {
> >>>     sigset_t oldset;
> >>>@@ -3823,6 +3936,111 @@ int nfs4_proc_get_lease_time(struct
> >>>     
> >>>
> >>nfs4_client *clp, struct nfs_fsinfo *fsinfo)
> >>   
> >>
> >>>     return status;
> >>> }
> >>>
> >>>+/* Initialize a slot table */
> >>>+int nfs4_init_slot_table(struct nfs4_channel *channel) {
> >>>+    int i;
> >>>+    struct nfs4_slot_table *tbl;
> >>>+    struct nfs4_slot *slot;
> >>>+   
> >>>+    tbl = &channel->slot_table;
> >>>+    tbl->max_slots = channel->chan_attrs.max_reqs;
> >>>+
> >>>+    tbl->slots = kzalloc(tbl->max_slots * sizeof(struct
> >>>     
> >>>
> >>nfs4_slot), GFP_ATOMIC);
> >>   
> >>
> >>>+    if (!tbl->slots)
> >>>+            return -ENOMEM;
> >>>+
> >>>+    spin_lock_init(&tbl->slot_tbl_lock);
> >>>+
> >>>+    for (i = 0; i < tbl->max_slots; ++i) {
> >>>+            slot = &tbl->slots[i];
> >>>+
> >>>+            slot->slot_nr = i;
> >>>+            slot->seq_nr = 1;
> >>>+            slot->flags = 0;
> >>>+            slot->nr_waiters = 0;
> >>>+            spin_lock_init(&slot->slot_lock);
> >>>+    }
> >>>+
> >>>+    return 0;
> >>>+}
> >>>+
> >>>+/* Destroy the slot table */
> >>>+void nfs4_destroy_slot_table(struct nfs4_channel *channel) {
> >>>+    int i;
> >>>+    struct nfs4_slot *slot;
> >>>+    struct nfs4_slot_table *tbl;
> >>>+
> >>>+    tbl = &channel->slot_table;
> >>>+   
> >>>+    for (i = 0; i < tbl->max_slots;++i) {
> >>>+            slot = &tbl->slots[i];
> >>>+
> >>>+            if (slot->nr_waiters)
> >>>+                    BUG();
> >>>+    }
> >>>+
> >>>+    kfree(channel->slot_table.slots);
> >>>+    channel->slot_table.slots = NULL;
> >>>+
> >>>+    return;
> >>>+}
> >>>+
> >>>+/* dump the channel attributes */
> >>>+void nfs4_dump_channel_attrs(struct nfs4_channel_attrs *attrs) {
> >>>+    printk(KERN_INFO "max_rqst_sz: %u\n", attrs->max_rqst_sz);
> >>>+    printk(KERN_INFO "max_resp_sz: %u\n", attrs->max_resp_sz);
> >>>+    printk(KERN_INFO "max_resp_sz_cached: %u\n",
> >>>     
> >>>
> >>attrs->max_resp_sz_cached);
> >>   
> >>
> >>>+    printk(KERN_INFO "max_ops: %u\n", attrs->max_ops);
> >>>+    printk(KERN_INFO "max_reqs: %u\n", attrs->max_reqs); }
> >>>+
> >>>+/* Initialize the values to be used by the client in
> >>>     
> >>>
> >>CREATE_SESSION
> >>   
> >>
> >>>+*/ void nfs4_init_channel_attrs(struct nfs4_client *clp,
> >>>+                            struct nfs4_channel_attrs *fc_attrs,
> >>>+                            struct nfs4_channel_attrs *bc_attrs) {
> >>>+    /* XXX: We need to have good values here... 32K is a
> >>>     
> >>>
> >>wild guess */
> >>   
> >>
> >>>+    fc_attrs->max_rqst_sz = bc_attrs->max_rqst_sz = 32768;
> >>>+    fc_attrs->max_resp_sz = bc_attrs->max_resp_sz = 32768;
> >>>+    fc_attrs->max_resp_sz_cached =
> >>>     
> >>>
> >>bc_attrs->max_resp_sz_cached = 32768;
> >>   
> >>
> >>>+    fc_attrs->max_ops = bc_attrs->max_ops = 0xFFFFFFFF;
> >>>+    fc_attrs->max_reqs = bc_attrs->max_reqs =
> >>>+                            clp->cl_rpcclient->cl_xprt->max_reqs;
> >>>+    fc_attrs->stream_attrs = bc_attrs->stream_attrs = 0;
> >>>+    fc_attrs->rdma_attrs = bc_attrs->rdma_attrs = 0;
> >>>+
> >>>+}
> >>>+
> >>>+/* Check the values returned by the server for
> >>>     
> >>>
> >>CREATE_SESSION. Since
> >>   
> >>
> >>>+we made
> >>>+ * our needs known, if the server gives us more than we need, we
> >>>+don't bother
> >>>+ * with it.
> >>>+ */
> >>>+void nfs4_adjust_channel_attrs(struct nfs4_channel_attrs
> >>>     
> >>>
> >>*req_attrs,
> >>   
> >>
> >>>+                            struct nfs4_channel_attrs *resp_attrs) {
> >>>+    if (req_attrs->max_rqst_sz < resp_attrs->max_rqst_sz)
> >>>+            resp_attrs->max_rqst_sz = req_attrs->max_rqst_sz;
> >>>+
> >>>+    if (req_attrs->max_resp_sz < resp_attrs->max_resp_sz)
> >>>+            resp_attrs->max_resp_sz = req_attrs->max_resp_sz;
> >>>+
> >>>+    if (req_attrs->max_resp_sz_cached < resp_attrs->max_resp_sz)
> >>>+            resp_attrs->max_resp_sz = req_attrs->max_resp_sz_cached;
> >>>+
> >>>+    if (req_attrs->max_ops < resp_attrs->max_ops)
> >>>+            resp_attrs->max_ops = req_attrs->max_ops;
> >>>+   
> >>>+    if (req_attrs->max_reqs < resp_attrs->max_reqs)
> >>>+            resp_attrs->max_reqs = req_attrs->max_reqs;
> >>>+
> >>>+    /* XXX: We ignore the stream channel attributes... we
> >>>     
> >>>
> >>have no idea what
> >>   
> >>
> >>>+     * to do with them anyways!
> >>>+     */
> >>>+}
> >>>+
> >>> int _nfs4_proc_create_session(struct nfs4_client *clp, struct
> >>>nfs4_session *session, struct rpc_clnt *clnt)  {
> >>>     struct nfs41_create_session_args args = { @@ -3845,7
> >>>     
> >>>
> >>+4063,19 @@ int
> >>   
> >>
> >>>_nfs4_proc_create_session(struct nfs4_client *clp, struct
> >>>     
> >>>
> >>nfs4_session *sess
> >>   
> >>
> >>>     };
> >>>     int status;
> >>>
> >>>-    status = rpc_call_sync(clnt, &msg, 0);
> >>>+    nfs4_init_channel_attrs(clp, &args.fc_attrs, &args.bc_attrs);
> >>>+   
> >>>+        status = rpc_call_sync(clnt, &msg, 0);
> >>>+
> >>>+    /* Set the negotiated values in the session's
> >>>     
> >>>
> >>channel_attrs struct
> >>   
> >>
> >>>+*/
> >>>+
> >>>+    if (!status) {
> >>>+            nfs4_adjust_channel_attrs(&args.fc_attrs,
> >>>+                                   
> >>>     
> >>>
> >>&session->fore_channel.chan_attrs);
> >>   
> >>
> >>>+            nfs4_adjust_channel_attrs(&args.bc_attrs,
> >>>+                                   
> >>>     
> >>>
> >>&session->back_channel.chan_attrs);
> >>   
> >>
> >>>+    }
> >>>+   
> >>>     return status;
> >>> }
> >>> EXPORT_SYMBOL(_nfs4_proc_create_session);
> >>>@@ -3853,29 +4083,28 @@ EXPORT_SYMBOL(_nfs4_proc_create_session);
> >>> int nfs4_proc_create_session(struct nfs4_client *clp)  {
> >>>     int status;
> >>>-    unsigned long now;
> >>>     struct nfs4_session *session;
> >>>-    struct nfs_fsinfo fsinfo;
> >>>     u32 *ptr;
> >>>
> >>>-    now = jiffies;
> >>>-
> >>>     status = _nfs4_proc_create_session(clp,
> >>>     
> >>>
> >>clp->cl_session, clp->cl_rpcclient);
> >>   
> >>
> >>>     if (status)
> >>>             return status;
> >>>
> >>>-    status = nfs4_proc_get_lease_time(clp, &fsinfo);
> >>>-    if (status == 0) {
> >>>-            /* Update lease time and schedule renewal */
> >>>-            spin_lock(&clp->cl_lock);
> >>>-            clp->cl_lease_time = fsinfo.lease_time * HZ;
> >>>-            clp->cl_last_renewal = now;
> >>>-            clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
> >>>-            spin_unlock(&clp->cl_lock);
> >>>+    session = clp->cl_session;
> >>>
> >>>-            nfs4_schedule_state_renewal(clp);
> >>>+    /* Init the fore channel */
> >>>+    status = nfs4_init_slot_table(&session->fore_channel);
> >>>+    dprintk("fc init returned %d\n", status);
> >>>+    if (status)
> >>>+            return status;
> >>>+
> >>>+    /* Init the back channel */
> >>>+    status = nfs4_init_slot_table(&session->back_channel);
> >>>+    dprintk("bc init returned %d\n", status);
> >>>+    if (status) {
> >>>+            nfs4_destroy_slot_table(&session->fore_channel);
> >>>+            return status;
> >>>     }
> >>>-    session = clp->cl_session;
> >>>
> >>>     ptr = (int *)session->sess_id;
> >>>     dprintk("sessionid is: %d:%d:%d:%d\n", ptr[0], ptr[1], ptr[2],
> >>>ptr[3]); @@ -4769,11 +4998,8 @@ struct nfs4_session
> >>>*nfs41_alloc_session(void)
> >>>
> >>>     session->expired = 1;
> >>>
> >>>-    INIT_LIST_HEAD(&session->slots_in_use);
> >>>-    INIT_LIST_HEAD(&session->unused_slots);
> >>>     INIT_LIST_HEAD(&session->session_hashtbl);
> >>>
> >>>-    //rpc_init_wait_queue(&session->slot_waitq, "Slot waitqueue");
> >>>     spin_lock_init(&session->session_lock);
> >>>
> >>>     sema_init(&session->session_sem, 1); @@ -4797,6 +5023,8 @@ void
> >>>nfs4_get_session(struct nfs4_session *session)  void
> >>>nfs4_put_session(struct nfs4_session **session)  {
> >>>     if (atomic_dec_and_test(&((*session)->ref_count))) {
> >>>+            nfs4_destroy_slot_table(&((*session)->fore_channel));
> >>>+            nfs4_destroy_slot_table(&((*session)->back_channel));
> >>>             nfs41_free_session(*session);
> >>>             *session = NULL;
> >>>     }
> >>>@@ -4806,7 +5034,11 @@ int nfs41_proc_setup_session(struct
> >>>     
> >>>
> >>nfs4_client
> >>   
> >>
> >>>*clp)  {
> >>>     int status;
> >>>
> >>>-    dprintk("in %s!\n", __FUNCTION__);
> >>>+        struct nfs_fsinfo fsinfo;
> >>>+    unsigned long now;
> >>>+   
> >>>+    now = jiffies;
> >>>+
> >>>     if (!clp->cl_session) {
> >>>             /* create the session struct to hold the
> >>>     
> >>>
> >>session parameters */
> >>   
> >>
> >>>             clp->cl_session = nfs41_alloc_session(); @@
> >>>     
> >>>
> >>-4844,6 +5076,20 @@ int
> >>   
> >>
> >>>nfs41_proc_setup_session(struct nfs4_client *clp)
> >>>     clp->cl_session->expired = 0;
> >>>     clp->cl_session->client = clp;
> >>>
> >>>+    status = nfs4_proc_get_lease_time(clp, &fsinfo);
> >>>+
> >>>+    if (status)
> >>>+            goto out_free;
> >>>+   
> >>>+    /* Update lease time and schedule renewal */
> >>>+    spin_lock(&clp->cl_lock);
> >>>+    clp->cl_lease_time = fsinfo.lease_time * HZ;
> >>>+    clp->cl_last_renewal = now;
> >>>+    clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
> >>>+    spin_unlock(&clp->cl_lock);
> >>>+
> >>>+    nfs4_schedule_state_renewal(clp);
> >>>+   
> >>>     clp->cl_session->mutating = 0;
> >>> out:
> >>>     up(&clp->cl_session->session_sem);
> >>>diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index
> >>>5ef6d70..7da9a6a 100644
> >>>--- a/fs/nfs/nfs4xdr.c
> >>>+++ b/fs/nfs/nfs4xdr.c
> >>>@@ -1429,20 +1429,22 @@ static int
> >>>     
> >>>
> >>encode_create_session(struct xdr_stream *xdr, struct nfs41_create_ses
> >>   
> >>
> >>>     RESERVE_SPACE(2*28);                    /* 2 channel_attrs */
> >>>
> >>>     /* Fore Channel */
> >>>-    WRITE32(32768);                         /* max req size */
> >>>-    WRITE32(32768);                         /* max resp size */
> >>>-    WRITE32(32768);                         /* Max resp
> >>>     
> >>>
> >>size cached */
> >>   
> >>
> >>>-    WRITE32(RPC_DEF_SLOT_TABLE);            /* max operations */
> >>>-    WRITE32(RPC_DEF_SLOT_TABLE);            /* max requests */
> >>>+    WRITE32(args->fc_attrs.max_rqst_sz);    /* max req size */
> >>>+    WRITE32(args->fc_attrs.max_resp_sz);    /* max resp size */
> >>>+    WRITE32(args->fc_attrs.max_resp_sz_cached);     /* Max
> >>>     
> >>>
> >>resp sz cached */
> >>   
> >>
> >>>+    WRITE32(args->fc_attrs.max_ops);        /* max operations */
> >>>+    WRITE32(args->fc_attrs.max_reqs);       /* max requests */
> >>>+
> >>>     WRITE32(0);                             /*
> >>>     
> >>>
> >>Streamchannel attrs */
> >>   
> >>
> >>>     WRITE32(0);                             /*rdmachannel_attrs */
> >>>
> >>>     /* Back Channel */
> >>>-    WRITE32(32768);                         /* max req size */
> >>>-    WRITE32(32768);                         /* max resp size */
> >>>-    WRITE32(32768);                         /* Max resp
> >>>     
> >>>
> >>size cached */
> >>   
> >>
> >>>-    WRITE32(RPC_DEF_SLOT_TABLE);            /* max operations */
> >>>-    WRITE32(RPC_DEF_SLOT_TABLE);            /* max requests */
> >>>+    WRITE32(args->bc_attrs.max_rqst_sz);    /* max req size */
> >>>+    WRITE32(args->bc_attrs.max_resp_sz);    /* max resp size */
> >>>+    WRITE32(args->bc_attrs.max_resp_sz_cached);     /* Max
> >>>     
> >>>
> >>resp sz cached */
> >>   
> >>
> >>>+    WRITE32(args->bc_attrs.max_ops);        /* max operations */
> >>>+    WRITE32(args->bc_attrs.max_reqs);       /* max requests */
> >>>+
> >>>     WRITE32(0);                             /*
> >>>     
> >>>
> >>Streamchannel attrs */
> >>   
> >>
> >>>     WRITE32(0);                             /*rdmachannel_attrs */
> >>>
> >>>@@ -6404,7 +6406,7 @@ static int
> >>>     
> >>>
> >>nfs41_xdr_dec_get_lease_time(struct rpc_rqst *rqstp, uint32_t *p, str
> >>   
> >>
> >>>             status = decode_putrootfh(&xdr);
> >>>     if (!status)
> >>>             status = decode_fsinfo(&xdr, res->fsinfo);
> >>>-    if (!status)
> >>>+    if (status)
> >>>             status = -nfs_stat_to_errno(hdr.status);
> >>>     return status;
> >>> }
> >>>diff --git a/fs/nfs/super.c b/fs/nfs/super.c index a3bc630..89eb564
> >>>100644
> >>>--- a/fs/nfs/super.c
> >>>+++ b/fs/nfs/super.c
> >>>@@ -1260,15 +1260,13 @@ static int nfs4_fill_super(struct
> >>>     
> >>>
> >>super_block *sb, struct nfs4_mount_data *data,
> >>   
> >>
> >>>                 server->nfs4_state->cl_minorversion = i;
> >>>
> >>>                 if (server->rpc_ops->setup_session) {
> >>>-                        int status;
> >>>-
> >>>                         lock_kernel();
> >>>                         down_write(&server->nfs4_state->cl_sem);
> >>>-                        status = 
> >>>     
> >>>
> >>server->rpc_ops->setup_session(server->nfs4_state);
> >>   
> >>
> >>>+                        err = 
> >>>+ server->rpc_ops->setup_session(server->nfs4_state);
> >>>                         up_write(&server->nfs4_state->cl_sem);
> >>>                         unlock_kernel();
> >>>
> >>>-                        if (status) {
> >>>+                        if (err) {
> >>>                                 printk(KERN_EMERG
> >>>     
> >>>
> >>"Couldn't mount using minorversion %d\n", i);
> >>   
> >>
> >>>                                
> >>>     
> >>>
> >>rpc_shutdown_client(server->client);
> >>   
> >>
> >>>                         }
> >>>diff --git a/include/linux/nfs41_sessions.h
> >>>b/include/linux/nfs41_sessions.h new file mode 100644 index
> >>>0000000..02f45db
> >>>--- /dev/null
> >>>+++ b/include/linux/nfs41_sessions.h
> >>>@@ -0,0 +1,72 @@
> >>>+#ifndef __NFS4_1_SESSIONS_H__
> >>>+#define __NFS4_1_SESSIONS_H__
> >>>+
> >>>+/* The flags for the nfs4_slot struct */
> >>>+#define NFS4_SLOT_BUSY              0X0     /* Slot in use */
> >>>+#define NFS4_SLOT_RECLAIMED 0x1     /* Slot has
> >>>     
> >>>
> >>been reclaimed by
> >>   
> >>
> >>>+                                       the server */
> >>>+
> >>>+typedef unsigned char               sessionid_t[16];
> >>>+typedef u32                 streamchannel_attrs;
> >>>+typedef u32                 rdmachannel_attrs;
> >>>+
> >>>+struct nfs4_channel_attrs {
> >>>+    u32                     max_rqst_sz;
> >>>+    u32                     max_resp_sz;
> >>>+    u32                     max_resp_sz_cached;
> >>>+    u32                     max_ops;
> >>>+    u32                     max_reqs;
> >>>+    streamchannel_attrs     stream_attrs;
> >>>+    rdmachannel_attrs       rdma_attrs;
> >>>+};
> >>>+
> >>>+struct nfs4_slot {
> >>>+    u32                     slot_nr;
> >>>+    u32                     seq_nr;
> >>>+    unsigned long           flags;
> >>>+    u32                     nr_waiters;
> >>>+    spinlock_t              slot_lock;
> >>>+};
> >>>+
> >>>+struct nfs4_slot_table {
> >>>+    struct nfs4_slot        *slots;
> >>>+    u32                     max_slots;
> >>>+    spinlock_t              slot_tbl_lock;
> >>>+};
> >>>+
> >>>+struct nfs4_channel {
> >>>+    struct nfs4_channel_attrs       chan_attrs;
> >>>+    struct rpc_clnt                 *rpc_client;
> >>>+    struct nfs4_slot_table          slot_table;
> >>>+};
> >>>+
> >>>+struct nfs4_session {
> >>>+    /* Session related params */
> >>>+    sessionid_t                     sess_id;
> >>>+    u32                             seqid;  /* The seqid
> >>>     
> >>>
> >>returned by
> >>   
> >>
> >>>+                                               exchange_id */
> >>>+    u32                             persist;
> >>>+    u32                             header_padding;
> >>>+    u32                             hash_alg;
> >>>+    u32                             ssv_len;
> >>>+    u32                             use_for_back_chan;
> >>>+    u32                             rdma_mode;
> >>>+
> >>>+    /* The fore and back channel */
> >>>+    struct nfs4_channel             fore_channel;
> >>>+    struct nfs4_channel             back_channel;
> >>>+
> >>>+    unsigned int                    expired;
> >>>+    struct nfs4_client *            client;
> >>>+    struct list_head                session_hashtbl;
> >>>+    spinlock_t                      session_lock;
> >>>+    /* To prevent races between create_session and sequence */
> >>>+    int                             mutating;
> >>>+    struct semaphore                session_sem;
> >>>+    atomic_t                        ref_count;
> >>>+};
> >>>+
> >>>+
> >>>+#endif
> >>>+
> >>>+
> >>>diff --git a/include/linux/nfs_xdr.h
> >>>     
> >>>
> >>b/include/linux/nfs_xdr.h index
> >>   
> >>
> >>>4648707..52b9e38 100644
> >>>--- a/include/linux/nfs_xdr.h
> >>>+++ b/include/linux/nfs_xdr.h
> >>>@@ -3,6 +3,7 @@
> >>>
> >>> #include <linux/sunrpc/xprt.h>
> >>> #include <linux/nfsacl.h>
> >>>+#include <linux/nfs41_sessions.h>
> >>>
> >>> /*
> >>>  * To change the maximum rsize and wsize supported by the
> >>>     
> >>>
> >>NFS client,
> >>   
> >>
> >>>adjust @@ -778,6 +779,8 @@ struct nfs41_create_session_args {
> >>>     uint32_t                        use_for_backchannel;
> >>>     uint32_t                        use_for_rdma;
> >>>     uint32_t                        cb_program;
> >>>+    struct nfs4_channel_attrs       fc_attrs;       /* Fore
> >>>     
> >>>
> >>Channel */
> >>   
> >>
> >>>+    struct nfs4_channel_attrs       bc_attrs;       /* Back
> >>>     
> >>>
> >>Channel */
> >>   
> >>
> >>> };
> >>>
> >>> struct nfs41_create_session_res {
> >>>@@ -805,6 +808,7 @@ struct nfs41_sequence_res {
> >>>     u32                             target_maxslots;
> >>>     u32                             status_flags;
> >>>     struct nfs4_state_owner         *sp;
> >>>+    struct nfs4_slot                *slot;
> >>> };
> >>>
> >>> struct nfs4_get_lease_time_args {
> >>>diff --git a/include/linux/nfsd/state.h
> >>>     
> >>>
> >>b/include/linux/nfsd/state.h
> >>   
> >>
> >>>index 8f699b4..fde667e 100644
> >>>--- a/include/linux/nfsd/state.h
> >>>+++ b/include/linux/nfsd/state.h
> >>>@@ -47,8 +47,6 @@ typedef struct {
> >>>     u32             cl_id;
> >>> } clientid_t;
> >>>
> >>>-typedef unsigned char sessionid_t[16];
> >>>-
> >>> typedef struct {
> >>>     u32             so_boot;
> >>>     u32             so_stateownerid;
> >>>     
> >>>
>



More information about the pNFS mailing list