[pnfs] [PATCH 1/2] Slot Table Implementation
Iyer, Rahul
Rahul.Iyer at netapp.com
Sun Mar 18 03:24:32 EDT 2007
Hi Benny,
My first idea was similar to this. That was the reason I had the unused slots and used slots lists. The reason I abandoned that was because it would be difficult to get the lowest numbered slot. The draft suggests that the client should use the lowest numbered slot so that the server can reclaim slots from the client if it sees that they are under-utilized (at least that's what I understand).It is not a hard requirement, but one that is recommended.
>From draft 10, Section 2.4.10.1, Paragraph 12
"The requester is required to use the lowest available slot when issuing a new request. This way, the replier may be able to retire slot entries faster."
Further, this email exchange adds some more insight:
http://www1.ietf.org/mail-archive/web/nfsv4/current/msg03997.html
The problem I foresaw with the lists is that keeping the lists ordered is almost as much work as having separate lists. As for the atomic_t variables, I'm not too hot about the atomic variables. test_and_set is a bus locking instruction and so isn't such a great idea... the spin_locks are "optimized for the fast path", which is getting the spin_locks, but they're too much of a big hammer maybe. rw_locks might be a better idea, but i'm not 100% convinced... I'm pretty sure they use some sort of bus locking instruction too. As for the current implementation, scaling shouldn't be too much of a problem as the number of slots is likely to be small (few tens) and the worst case is bounded. I agree it is linear time, but at least the worst case doesn't run away from us. Since the RPC slots is a tunable parameter, it would be interesting to see what an "optimal" rpc slot count is... as in at what point does adding more slots make no difference to performance. Is there such a study?
Another idea I can think of is to have a single lock that protects the whole slot table. We grab the lock and run down the slot table to select our slot (and set need_to_sleep if required), and then release the lock. Then, if need_to_sleep is set, we grab the lock, increment nr_waiters for the slot, release the lock and sleep.
Should this be a spin_lock or an rw_lock, I'm not sure... spin_locks will harm the concurrency because it would lock out parallel reads. On the other hand, rw_locks are biased towards readers or writers (not sure which for Linux), so that may not be a good thing.
That being said, I'm not wedded to the current implementation in any way... comments and better algorithms are welcome. This was just one way that I thought made sense.
Thanks,
Regards
Rahul
-----Original Message-----
From: Benny Halevy [mailto:bhalevy at panasas.com]
Sent: Sat 3/17/2007 3:54 AM
To: Iyer, Rahul
Cc: pnfs at linux-nfs.org
Subject: Re: [pnfs] [PATCH 1/2] Slot Table Implementation
Rahul, this is not urgent but I really think we should move to a single
queue for several reason. With your implementation when all slots
are used you have to do max_slots test_and_set operations to
discover that. This will just not scale. I agree with your point
about semaphores and renegotiating the number of slots so how
about this (sketch) instead:
nfs41_proc_sequence_done()
{
int need_wakeup = 0;
spin_lock(&res->channel->channel_lock);
list_add(&res->slot->slot_list, &res->channel->unused_slots);
spin_unlock(&res->channel->channel_lock);
/* Wake up the threads waiting on this session */
rpc_wake_up(&res->channel->sl_waitq);
return status;
}
struct nfs4_slot *nfs4_find_slot(struct nfs4_channel *channel)
{
struct nfs4_slot *slot;
might_sleep();
for (;;) {
int ret = wait_event_interruptible(&channel->sl_waitq,
!list_empty(&channel->unused_slots));
if (unlikely(ret) {
BUG_ON(ret != -ERESTARTSYS);
break;
}
spin_lock(&channel->channel_lock);
if (unlikely(list_empty(&channel->unused_slots))) {
spin_unlock(&channel->channel_lock);
continue;
}
slot = list_entry(channel->unused_slots.next,
struct nfs4_slot,
slot_list);
list_del(&slot->slot_list);
spin_unlock(&channel->channel_lock);
return slot;
}
return NULL;
}
Iyer, Rahul wrote:
>Hi Benny,
>A comment inline...
>Regards
>Rahul
>
>
>
>
>>-----Original Message-----
>>From: Benny Halevy [mailto:bhalevy at panasas.com]
>>Sent: Friday, March 16, 2007 2:24 AM
>>To: Iyer, Rahul
>>Cc: pnfs at linux-nfs.org
>>Subject: Re: [pnfs] [PATCH 1/2] Slot Table Implementation
>>
>>Rahul, my comments below...
>>
>>Benny
>>
>>iyer at netapp.com wrote:
>>
>>
>>>From: iyer <iyer at netapp.com>
>>>
>>>Added code to implement a slot table.
>>>
>>>Signed-off-by: iyer <iyer at netapp.com>
>>>---
>>> fs/nfs/nfs41_sessions.h | 62 --------
>>> fs/nfs/nfs4_fs.h | 2 +-
>>> fs/nfs/nfs4proc.c | 326
>>>
>>>
>>+++++++++++++++++++++++++++++++++++-----
>>
>>
>>> fs/nfs/nfs4xdr.c | 24 ++--
>>> fs/nfs/super.c | 6 +-
>>> include/linux/nfs41_sessions.h | 72 +++++++++
>>> include/linux/nfs_xdr.h | 4 +
>>> include/linux/nfsd/state.h | 2 -
>>> 8 files changed, 378 insertions(+), 120 deletions(-) delete mode
>>>100644 fs/nfs/nfs41_sessions.h create mode 100644
>>>include/linux/nfs41_sessions.h
>>>
>>>diff --git a/fs/nfs/nfs41_sessions.h
>>>
>>>
>>b/fs/nfs/nfs41_sessions.h deleted
>>
>>
>>>file mode 100644 index fc658c5..0000000
>>>--- a/fs/nfs/nfs41_sessions.h
>>>+++ /dev/null
>>>@@ -1,62 +0,0 @@
>>>-#ifndef __NFS4_1_SESSIONS_H__
>>>-#define __NFS4_1_SESSIONS_H__
>>>-
>>>-typedef unsigned char sessionid_t[16];
>>>-typedef u32 streamchannel_attrs;
>>>-typedef u32 rdmachannel_attrs;
>>>-
>>>-struct nfs4_channel_attrs {
>>>- unsigned long max_rqst_sz;
>>>- unsigned long max_resp_sz;
>>>- unsigned long max_resp_sz_cached;
>>>- unsigned long max_ops;
>>>- unsigned long max_reqs;
>>>- streamchannel_attrs stream_attrs;
>>>- rdmachannel_attrs rdma_attrs;
>>>-};
>>>-
>>>-struct nfs4_channel {
>>>- struct nfs4_channel_attrs chan_attrs;
>>>- unsigned long nr_conns;
>>>- struct list_head rpc_clients;
>>>-};
>>>-
>>>-struct nfs4_session {
>>>- /* Session related params */
>>>- sessionid_t sess_id;
>>>- u32 seqid; /* The seqid returned
>>>
>>>
>>by exchange_id */
>>
>>
>>>- u32 persist;
>>>- u32 header_padding;
>>>- u32 hash_alg;
>>>- u32 ssv_len;
>>>- u32 use_for_back_chan;
>>>- u32 rdma_mode;
>>>-
>>>- /* Slotid management */
>>>- unsigned long nr_slots_in_use;
>>>- struct list_head slots_in_use;
>>>- struct list_head unused_slots;
>>>- struct rpc_wait_queue slot_waitq;
>>>-
>>>- /* The fore and back channel */
>>>- struct nfs4_channel fore_channel;
>>>- struct nfs4_channel back_channel;
>>>-
>>>- unsigned int expired;
>>>- struct nfs4_client * client;
>>>- struct list_head session_hashtbl;
>>>- spinlock_t session_lock;
>>>- /* To prevent races between create_session and sequence */
>>>- int mutating;
>>>- struct semaphore session_sem;
>>>- atomic_t ref_count;
>>>-};
>>>-
>>>-struct nfs4_slot {
>>>- u32 slot_nr;
>>>- u32 seq_nr;
>>>- struct nfs4_session * session;
>>>- struct list_head slot_list;
>>>-};
>>>-
>>>-#endif
>>>diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h index
>>>195d757..e50657e 100644
>>>--- a/fs/nfs/nfs4_fs.h
>>>+++ b/fs/nfs/nfs4_fs.h
>>>@@ -9,7 +9,7 @@
>>> #ifndef __LINUX_FS_NFS_NFS4_FS_H
>>> #define __LINUX_FS_NFS_NFS4_FS_H
>>>
>>>-#include "nfs41_sessions.h"
>>>+#include <linux/nfs41_sessions.h>
>>>
>>> #ifdef CONFIG_NFS_V4
>>> #define NFSV4_MAX_MINORVERSION 1
>>>diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index
>>>167ff4a..28513a1 100644
>>>--- a/fs/nfs/nfs4proc.c
>>>+++ b/fs/nfs/nfs4proc.c
>>>@@ -49,6 +49,7 @@
>>> #include <linux/namei.h>
>>> #include <linux/mount.h>
>>> #include <linux/module.h>
>>>+#include <linux/bitops.h>
>>>
>>> #include "nfs4_fs.h"
>>> #include "delegation.h"
>>>@@ -221,8 +222,10 @@ static int
>>>
>>>
>>nfs41_proc_sequence_done(struct nfs4_session *session, struct nfs41_s
>>
>>
>>> unsigned long timestamp;
>>> struct nfs4_client *clp;
>>>
>>>- if (!session || !(clp = session->client))
>>>- return 0;
>>>+ if (!session || !(clp = session->client)) {
>>>+ printk(KERN_EMERG "%s is NULL!!!\n",
>>>
>>>
>>(!session)?"session":"clp");
>>
>>
>>>+ BUG();
>>>+ }
>>>
>>>
>>For performance reasons the defensive code should only be
>>active with kernel debugging. The conditional statement
>>above is always executed needlessly and is not as clean as it
>>could be because of the assignment embedded in the boolean
>>expression. Therefore, this better be coded like this:
>>
>> BUG_ON(!session);
>> clp = session->client;
>> BUG_ON(!clp);
>>
>>
>>
>>>
>>> if (!status) {
>>> timestamp = jiffies;
>>>@@ -233,17 +236,131 @@ static int
>>>
>>>
>>nfs41_proc_sequence_done(struct nfs4_session *session, struct nfs41_s
>>
>>
>>> spin_unlock(&clp->cl_lock);
>>> }
>>>
>>>+ /* Clear the 'busy' bit on the slot that was used */
>>>+ smp_mb__before_clear_bit();
>>>+ clear_bit(NFS4_SLOT_BUSY, &res->slot->flags);
>>>+ smp_mb__after_clear_bit();
>>>+
>>>+ printk(KERN_EMERG "waking up waiters on slot %d\n",
>>>+res->slot->slot_nr);
>>>+
>>>+ /* Wake up the threads waiting on this slot */
>>>+ wake_up_bit(&res->slot->flags, NFS4_SLOT_BUSY);
>>>+
>>> return status;
>>> }
>>>
>>>+static int nfs4_wait_bit_interruptible(void *word) {
>>>+ if (signal_pending(current))
>>>+ return -ERESTARTSYS;
>>>+ schedule();
>>>+ return 0;
>>>+}
>>>+
>>>+/* Find the lowest numbered slot or sleep on the least
>>>
>>>
>>loaded slot */
>>
>>
>>>+struct nfs4_slot *nfs4_find_slot(struct nfs4_channel *channel) {
>>>+ struct nfs4_slot_table *tbl;
>>>+ struct nfs4_slot *slot;
>>>+ struct nfs4_slot *target_slot;
>>>+ u32 max_slots;
>>>+ u32 min_waiters;
>>>+ int i;
>>>+ int need_to_sleep;
>>>+
>>>+ might_sleep();
>>>+
>>>+ tbl = &channel->slot_table;
>>>+ min_waiters = tbl->slots[0].nr_waiters;
>>>
>>>
>>atomic_read()
>>
>>
>>
>>>+ target_slot = &tbl->slots[0];
>>>+
>>>+ spin_lock(&tbl->slot_tbl_lock);
>>>+ /* Make a local copy of max slots so we don't need to
>>>
>>>
>>hold it through
>>
>>
>>>+ * out.
>>>+ * XXX Will need to revalidate this if slots are reclaimed
>>>+ */
>>>+ max_slots = tbl->max_slots;
>>>+ spin_unlock(&tbl->slot_tbl_lock);
>>>
>>>
>>Using a spinlock here is unreasonably expensive when you
>>could use atomic counters since the lock overhead is not
>>amortized over multiple accesses to the structure.
>>max_slots and nr_waiters should be defined as atomic_t and
>>atomic operations be used to access them; Alternatively use
>>one spinlock for the whole channel around the outer loop
>>
>>
>>
>>>+
>>>+ do {
>>>+ need_to_sleep = 1;
>>>+ for (i = 0; i < max_slots; ++i) {
>>>+ slot = &tbl->slots[i];
>>>+ if (!test_and_set_bit(NFS4_SLOT_BUSY,
>>>
>>>
>>&slot->flags)){
>>
>>
>>>+ /* We found an empty slot */
>>>+ target_slot = slot;
>>>+ need_to_sleep = 0;
>>>+ break;
>>>+ }
>>>+
>>>+ spin_lock(&slot->slot_lock);
>>>+ if (min_waiters > slot->nr_waiters) {
>>>+ min_waiters = slot->nr_waiters;
>>>+ target_slot = slot;
>>>+ }
>>>+ spin_unlock(&slot->slot_lock);
>>>
>>>
>>why lock? use atomic_read(&slot->nr_waiters) also, if several
>>threads go through this code in parallel they will pick the
>>same target_slot so you want to increment.
>>That's another reason to use a bigger lock.
>>
>>
>
>Using atomic_t is a bit hard when you have to do compare and set kind of
>operations...
>Like for instance:
>
>If (min_waiters > slot->nr_waiters) {
> min_waiters = slot->nr_waiters;
> target_slot = slot;
>}
>
>Doing the above atomically is hard with atomic variables only because
>the whole sequence needs to be atomic. However, I can change the
>spin_lock used to rwlock_t. This should better solve the problem. Right?
>
>
>
>>>+ printk(KERN_EMERG "slot %d has busy bit
>>>
>>>
>>%s and nr_waiters is %u\n"
>>
>>
>>>+ , slot->slot_nr,
>>>+ (need_to_sleep)?"set":"unset",
>>>+ slot->nr_waiters);
>>>+ }
>>>+
>>>+ /* Check whether we need to sleep. If so, sleep
>>>
>>>
>>on the BUSY bit
>>
>>
>>>+ * Increment the nr_waiters before sleeping and
>>>
>>>
>>decrement it
>>
>>
>>>+ * when woken up
>>>+ */
>>>+ if (need_to_sleep) {
>>>+ printk(KERN_EMERG "sleeping on slot %d;
>>>
>>>
>>seq_nr: %d\n",
>>
>>
>>>+ target_slot->slot_nr,
>>>+ target_slot->seq_nr);
>>>+ spin_lock(&target_slot->slot_lock);
>>>+ ++target_slot->nr_waiters;
>>>+ spin_unlock(&target_slot->slot_lock);
>>>
>>>
>>atomic_inc()
>>
>>
>>
>>>+
>>>+ /* XXX: We need to check the return value of
>>>+ * eait_on_bit so that we can check if it's
>>>+ * interrupted.
>>>+ */
>>>+ wait_on_bit(&target_slot->flags,
>>>
>>>
>>NFS4_SLOT_BUSY,
>>
>>
>>>+ nfs4_wait_bit_interruptible,
>>>+ TASK_INTERRUPTIBLE);
>>>+
>>>+ spin_lock(&target_slot->slot_lock);
>>>+ --target_slot->nr_waiters;
>>>+ spin_unlock(&target_slot->slot_lock);
>>>
>>>
>>atomic_dec()
>>
>>But wait a second, why do you want a queue per slot to begin with?
>>It seems like you can just go with a semaphore initialized to
>>the number of slots, do down_interruptible before scanning
>>for a free slot and up in nfs41_proc_sequence_done. Take a
>>spin lock to scan the slot table (last index can be saved and
>>rotated for uniformity) and if not free slot is found do a
>>BUG() since the semaphore should prevent that from happening;
>>
>>
>>
>>
>>>+ }
>>>+ else
>>>+ break;
>>>+ }while (test_and_set_bit(NFS4_SLOT_BUSY, &target_slot->flags));
>>>+
>>>+ printk(KERN_EMERG "slot id: %u\nseqid: %u\n max_slots: %u\n",
>>>+ target_slot->slot_nr, target_slot->seq_nr,
>>>+ max_slots);
>>>+
>>>+ return target_slot;
>>>+}
>>>+
>>> static int _nfs41_proc_setup_sequence(struct nfs4_session
>>>
>>>
>>*session,
>>
>>
>>>struct nfs41_sequence_args *args, struct nfs41_sequence_res *res) {
>>> u32 *ptr;
>>>-
>>>+ struct nfs4_slot *slot;
>>>+
>>> ptr = (u32 *)session->sess_id;
>>> dprintk("%s: %u:%u:%u:%u\n", __FUNCTION__, ptr[0],
>>>
>>>
>>ptr[1], ptr[2],
>>
>>
>>>ptr[3]);
>>>
>>>- memcpy(args->sessionid, (unsigned char
>>>
>>>
>>*)session->sess_id, NFS4_MAX_SESSIONID_LEN);
>>
>>
>>>+ memcpy(args->sessionid, (unsigned char *)session->sess_id,
>>>+ NFS4_MAX_SESSIONID_LEN);
>>>+
>>>+ slot = nfs4_find_slot(&session->fore_channel);
>>>+
>>>+ /* XXX: Do we always increment this? Are there any
>>>
>>>
>>errors for which we
>>
>>
>>>+ * don't increment the sequence number?
>>>+ */
>>>+ args->seqid = slot->seq_nr++;
>>>+ args->slotid = slot->slot_nr;
>>>+ args->maxslots = session->fore_channel.slot_table.max_slots;
>>>+
>>>+ res->slot = slot;
>>>
>>> return 0;
>>> }
>>>@@ -1685,11 +1802,6 @@ static int nfs4_proc_get_root(struct
>>>
>>>
>>nfs_server *server, struct nfs_fh *fhandle,
>>
>>
>>> };
>>> int status;
>>>
>>>- if (server->rpc_ops->setup_sequence && (status =
>>>-
>>>
>>>
>>server->rpc_ops->setup_sequence(server->nfs4_state->cl_session,
>>
>>
>>>- &seqargs, &seqres)))
>>>- return status;
>>>-
>>> /*
>>> * Now we do a separate LOOKUP for each component of
>>>
>>>
>>the mount path.
>>
>>
>>> * The LOOKUPs are done separately so that we can
>>>
>>>
>>conveniently @@
>>
>>
>>>-1714,9 +1826,21 @@ static int nfs4_proc_get_root(struct nfs_server
>>>*server, struct nfs_fh *fhandle,
>>>
>>> do {
>>> nfs_fattr_init(fattr);
>>>+ if (server->rpc_ops->setup_sequence &&
>>>
>>>
>>(status =
>>
>>
>>>+ server->rpc_ops->setup_sequence(
>>>+ server->nfs4_state->cl_session,
>>>+ &seqargs, &seqres)))
>>>+ return status;
>>>+
>>> status = nfs4_handle_exception(server,
>>>
>>>
>>>
>>rpc_call_sync(server->client, &msg, 0),
>>
>>
>>> &exception);
>>>+
>>>+ if (server->rpc_ops->sequence_done)
>>>+ server->rpc_ops->sequence_done(
>>>+
>>>
>>>
>>server->nfs4_state->cl_session,
>>
>>
>>>+ &seqres, status);
>>>+
>>> } while (exception.retry);
>>> if (status == 0)
>>> continue;
>>>@@ -1731,10 +1855,6 @@ static int nfs4_proc_get_root(struct
>>>
>>>
>>nfs_server *server, struct nfs_fh *fhandle,
>>
>>
>>> if (status == 0)
>>> status = nfs4_do_fsinfo(server, fhandle, info);
>>> out:
>>>- if (server->rpc_ops->sequence_done)
>>>-
>>>
>>>
>>server->rpc_ops->sequence_done(server->nfs4_state->cl_session,
>>
>>
>>>- &seqres, status);
>>>-
>>> return nfs4_map_errors(status);
>>> }
>>>
>>>@@ -3268,6 +3388,7 @@ int nfs4_proc_async_sequence(struct
>>>
>>>
>>nfs4_client *clp, struct rpc_cred *cred)
>>
>>
>>> ret = rpc_call_async(clp->cl_rpcclient, &msg, RPC_TASK_SOFT,
>>> &nfs4_sequence_ops, (void *)jiffies);
>>>
>>>+ nfs41_proc_sequence_done(clp->cl_session, res, ret);
>>> out:
>>> return ret;
>>> out_free:
>>>@@ -3581,14 +3702,6 @@ nfs4_async_handle_error(struct
>>>
>>>
>>rpc_task *task, const struct nfs_server *server)
>>
>>
>>> return 0;
>>> }
>>>
>>>-static int nfs4_wait_bit_interruptible(void *word) -{
>>>- if (signal_pending(current))
>>>- return -ERESTARTSYS;
>>>- schedule();
>>>- return 0;
>>>-}
>>>-
>>> static int nfs4_wait_clnt_recover(struct rpc_clnt *clnt, struct
>>>nfs4_client *clp) {
>>> sigset_t oldset;
>>>@@ -3823,6 +3936,111 @@ int nfs4_proc_get_lease_time(struct
>>>
>>>
>>nfs4_client *clp, struct nfs_fsinfo *fsinfo)
>>
>>
>>> return status;
>>> }
>>>
>>>+/* Initialize a slot table */
>>>+int nfs4_init_slot_table(struct nfs4_channel *channel) {
>>>+ int i;
>>>+ struct nfs4_slot_table *tbl;
>>>+ struct nfs4_slot *slot;
>>>+
>>>+ tbl = &channel->slot_table;
>>>+ tbl->max_slots = channel->chan_attrs.max_reqs;
>>>+
>>>+ tbl->slots = kzalloc(tbl->max_slots * sizeof(struct
>>>
>>>
>>nfs4_slot), GFP_ATOMIC);
>>
>>
>>>+ if (!tbl->slots)
>>>+ return -ENOMEM;
>>>+
>>>+ spin_lock_init(&tbl->slot_tbl_lock);
>>>+
>>>+ for (i = 0; i < tbl->max_slots; ++i) {
>>>+ slot = &tbl->slots[i];
>>>+
>>>+ slot->slot_nr = i;
>>>+ slot->seq_nr = 1;
>>>+ slot->flags = 0;
>>>+ slot->nr_waiters = 0;
>>>+ spin_lock_init(&slot->slot_lock);
>>>+ }
>>>+
>>>+ return 0;
>>>+}
>>>+
>>>+/* Destroy the slot table */
>>>+void nfs4_destroy_slot_table(struct nfs4_channel *channel) {
>>>+ int i;
>>>+ struct nfs4_slot *slot;
>>>+ struct nfs4_slot_table *tbl;
>>>+
>>>+ tbl = &channel->slot_table;
>>>+
>>>+ for (i = 0; i < tbl->max_slots;++i) {
>>>+ slot = &tbl->slots[i];
>>>+
>>>+ if (slot->nr_waiters)
>>>+ BUG();
>>>+ }
>>>+
>>>+ kfree(channel->slot_table.slots);
>>>+ channel->slot_table.slots = NULL;
>>>+
>>>+ return;
>>>+}
>>>+
>>>+/* dump the channel attributes */
>>>+void nfs4_dump_channel_attrs(struct nfs4_channel_attrs *attrs) {
>>>+ printk(KERN_INFO "max_rqst_sz: %u\n", attrs->max_rqst_sz);
>>>+ printk(KERN_INFO "max_resp_sz: %u\n", attrs->max_resp_sz);
>>>+ printk(KERN_INFO "max_resp_sz_cached: %u\n",
>>>
>>>
>>attrs->max_resp_sz_cached);
>>
>>
>>>+ printk(KERN_INFO "max_ops: %u\n", attrs->max_ops);
>>>+ printk(KERN_INFO "max_reqs: %u\n", attrs->max_reqs); }
>>>+
>>>+/* Initialize the values to be used by the client in
>>>
>>>
>>CREATE_SESSION
>>
>>
>>>+*/ void nfs4_init_channel_attrs(struct nfs4_client *clp,
>>>+ struct nfs4_channel_attrs *fc_attrs,
>>>+ struct nfs4_channel_attrs *bc_attrs) {
>>>+ /* XXX: We need to have good values here... 32K is a
>>>
>>>
>>wild guess */
>>
>>
>>>+ fc_attrs->max_rqst_sz = bc_attrs->max_rqst_sz = 32768;
>>>+ fc_attrs->max_resp_sz = bc_attrs->max_resp_sz = 32768;
>>>+ fc_attrs->max_resp_sz_cached =
>>>
>>>
>>bc_attrs->max_resp_sz_cached = 32768;
>>
>>
>>>+ fc_attrs->max_ops = bc_attrs->max_ops = 0xFFFFFFFF;
>>>+ fc_attrs->max_reqs = bc_attrs->max_reqs =
>>>+ clp->cl_rpcclient->cl_xprt->max_reqs;
>>>+ fc_attrs->stream_attrs = bc_attrs->stream_attrs = 0;
>>>+ fc_attrs->rdma_attrs = bc_attrs->rdma_attrs = 0;
>>>+
>>>+}
>>>+
>>>+/* Check the values returned by the server for
>>>
>>>
>>CREATE_SESSION. Since
>>
>>
>>>+we made
>>>+ * our needs known, if the server gives us more than we need, we
>>>+don't bother
>>>+ * with it.
>>>+ */
>>>+void nfs4_adjust_channel_attrs(struct nfs4_channel_attrs
>>>
>>>
>>*req_attrs,
>>
>>
>>>+ struct nfs4_channel_attrs *resp_attrs) {
>>>+ if (req_attrs->max_rqst_sz < resp_attrs->max_rqst_sz)
>>>+ resp_attrs->max_rqst_sz = req_attrs->max_rqst_sz;
>>>+
>>>+ if (req_attrs->max_resp_sz < resp_attrs->max_resp_sz)
>>>+ resp_attrs->max_resp_sz = req_attrs->max_resp_sz;
>>>+
>>>+ if (req_attrs->max_resp_sz_cached < resp_attrs->max_resp_sz)
>>>+ resp_attrs->max_resp_sz = req_attrs->max_resp_sz_cached;
>>>+
>>>+ if (req_attrs->max_ops < resp_attrs->max_ops)
>>>+ resp_attrs->max_ops = req_attrs->max_ops;
>>>+
>>>+ if (req_attrs->max_reqs < resp_attrs->max_reqs)
>>>+ resp_attrs->max_reqs = req_attrs->max_reqs;
>>>+
>>>+ /* XXX: We ignore the stream channel attributes... we
>>>
>>>
>>have no idea what
>>
>>
>>>+ * to do with them anyways!
>>>+ */
>>>+}
>>>+
>>> int _nfs4_proc_create_session(struct nfs4_client *clp, struct
>>>nfs4_session *session, struct rpc_clnt *clnt) {
>>> struct nfs41_create_session_args args = { @@ -3845,7
>>>
>>>
>>+4063,19 @@ int
>>
>>
>>>_nfs4_proc_create_session(struct nfs4_client *clp, struct
>>>
>>>
>>nfs4_session *sess
>>
>>
>>> };
>>> int status;
>>>
>>>- status = rpc_call_sync(clnt, &msg, 0);
>>>+ nfs4_init_channel_attrs(clp, &args.fc_attrs, &args.bc_attrs);
>>>+
>>>+ status = rpc_call_sync(clnt, &msg, 0);
>>>+
>>>+ /* Set the negotiated values in the session's
>>>
>>>
>>channel_attrs struct
>>
>>
>>>+*/
>>>+
>>>+ if (!status) {
>>>+ nfs4_adjust_channel_attrs(&args.fc_attrs,
>>>+
>>>
>>>
>>&session->fore_channel.chan_attrs);
>>
>>
>>>+ nfs4_adjust_channel_attrs(&args.bc_attrs,
>>>+
>>>
>>>
>>&session->back_channel.chan_attrs);
>>
>>
>>>+ }
>>>+
>>> return status;
>>> }
>>> EXPORT_SYMBOL(_nfs4_proc_create_session);
>>>@@ -3853,29 +4083,28 @@ EXPORT_SYMBOL(_nfs4_proc_create_session);
>>> int nfs4_proc_create_session(struct nfs4_client *clp) {
>>> int status;
>>>- unsigned long now;
>>> struct nfs4_session *session;
>>>- struct nfs_fsinfo fsinfo;
>>> u32 *ptr;
>>>
>>>- now = jiffies;
>>>-
>>> status = _nfs4_proc_create_session(clp,
>>>
>>>
>>clp->cl_session, clp->cl_rpcclient);
>>
>>
>>> if (status)
>>> return status;
>>>
>>>- status = nfs4_proc_get_lease_time(clp, &fsinfo);
>>>- if (status == 0) {
>>>- /* Update lease time and schedule renewal */
>>>- spin_lock(&clp->cl_lock);
>>>- clp->cl_lease_time = fsinfo.lease_time * HZ;
>>>- clp->cl_last_renewal = now;
>>>- clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
>>>- spin_unlock(&clp->cl_lock);
>>>+ session = clp->cl_session;
>>>
>>>- nfs4_schedule_state_renewal(clp);
>>>+ /* Init the fore channel */
>>>+ status = nfs4_init_slot_table(&session->fore_channel);
>>>+ dprintk("fc init returned %d\n", status);
>>>+ if (status)
>>>+ return status;
>>>+
>>>+ /* Init the back channel */
>>>+ status = nfs4_init_slot_table(&session->back_channel);
>>>+ dprintk("bc init returned %d\n", status);
>>>+ if (status) {
>>>+ nfs4_destroy_slot_table(&session->fore_channel);
>>>+ return status;
>>> }
>>>- session = clp->cl_session;
>>>
>>> ptr = (int *)session->sess_id;
>>> dprintk("sessionid is: %d:%d:%d:%d\n", ptr[0], ptr[1], ptr[2],
>>>ptr[3]); @@ -4769,11 +4998,8 @@ struct nfs4_session
>>>*nfs41_alloc_session(void)
>>>
>>> session->expired = 1;
>>>
>>>- INIT_LIST_HEAD(&session->slots_in_use);
>>>- INIT_LIST_HEAD(&session->unused_slots);
>>> INIT_LIST_HEAD(&session->session_hashtbl);
>>>
>>>- //rpc_init_wait_queue(&session->slot_waitq, "Slot waitqueue");
>>> spin_lock_init(&session->session_lock);
>>>
>>> sema_init(&session->session_sem, 1); @@ -4797,6 +5023,8 @@ void
>>>nfs4_get_session(struct nfs4_session *session) void
>>>nfs4_put_session(struct nfs4_session **session) {
>>> if (atomic_dec_and_test(&((*session)->ref_count))) {
>>>+ nfs4_destroy_slot_table(&((*session)->fore_channel));
>>>+ nfs4_destroy_slot_table(&((*session)->back_channel));
>>> nfs41_free_session(*session);
>>> *session = NULL;
>>> }
>>>@@ -4806,7 +5034,11 @@ int nfs41_proc_setup_session(struct
>>>
>>>
>>nfs4_client
>>
>>
>>>*clp) {
>>> int status;
>>>
>>>- dprintk("in %s!\n", __FUNCTION__);
>>>+ struct nfs_fsinfo fsinfo;
>>>+ unsigned long now;
>>>+
>>>+ now = jiffies;
>>>+
>>> if (!clp->cl_session) {
>>> /* create the session struct to hold the
>>>
>>>
>>session parameters */
>>
>>
>>> clp->cl_session = nfs41_alloc_session(); @@
>>>
>>>
>>-4844,6 +5076,20 @@ int
>>
>>
>>>nfs41_proc_setup_session(struct nfs4_client *clp)
>>> clp->cl_session->expired = 0;
>>> clp->cl_session->client = clp;
>>>
>>>+ status = nfs4_proc_get_lease_time(clp, &fsinfo);
>>>+
>>>+ if (status)
>>>+ goto out_free;
>>>+
>>>+ /* Update lease time and schedule renewal */
>>>+ spin_lock(&clp->cl_lock);
>>>+ clp->cl_lease_time = fsinfo.lease_time * HZ;
>>>+ clp->cl_last_renewal = now;
>>>+ clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
>>>+ spin_unlock(&clp->cl_lock);
>>>+
>>>+ nfs4_schedule_state_renewal(clp);
>>>+
>>> clp->cl_session->mutating = 0;
>>> out:
>>> up(&clp->cl_session->session_sem);
>>>diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index
>>>5ef6d70..7da9a6a 100644
>>>--- a/fs/nfs/nfs4xdr.c
>>>+++ b/fs/nfs/nfs4xdr.c
>>>@@ -1429,20 +1429,22 @@ static int
>>>
>>>
>>encode_create_session(struct xdr_stream *xdr, struct nfs41_create_ses
>>
>>
>>> RESERVE_SPACE(2*28); /* 2 channel_attrs */
>>>
>>> /* Fore Channel */
>>>- WRITE32(32768); /* max req size */
>>>- WRITE32(32768); /* max resp size */
>>>- WRITE32(32768); /* Max resp
>>>
>>>
>>size cached */
>>
>>
>>>- WRITE32(RPC_DEF_SLOT_TABLE); /* max operations */
>>>- WRITE32(RPC_DEF_SLOT_TABLE); /* max requests */
>>>+ WRITE32(args->fc_attrs.max_rqst_sz); /* max req size */
>>>+ WRITE32(args->fc_attrs.max_resp_sz); /* max resp size */
>>>+ WRITE32(args->fc_attrs.max_resp_sz_cached); /* Max
>>>
>>>
>>resp sz cached */
>>
>>
>>>+ WRITE32(args->fc_attrs.max_ops); /* max operations */
>>>+ WRITE32(args->fc_attrs.max_reqs); /* max requests */
>>>+
>>> WRITE32(0); /*
>>>
>>>
>>Streamchannel attrs */
>>
>>
>>> WRITE32(0); /*rdmachannel_attrs */
>>>
>>> /* Back Channel */
>>>- WRITE32(32768); /* max req size */
>>>- WRITE32(32768); /* max resp size */
>>>- WRITE32(32768); /* Max resp
>>>
>>>
>>size cached */
>>
>>
>>>- WRITE32(RPC_DEF_SLOT_TABLE); /* max operations */
>>>- WRITE32(RPC_DEF_SLOT_TABLE); /* max requests */
>>>+ WRITE32(args->bc_attrs.max_rqst_sz); /* max req size */
>>>+ WRITE32(args->bc_attrs.max_resp_sz); /* max resp size */
>>>+ WRITE32(args->bc_attrs.max_resp_sz_cached); /* Max
>>>
>>>
>>resp sz cached */
>>
>>
>>>+ WRITE32(args->bc_attrs.max_ops); /* max operations */
>>>+ WRITE32(args->bc_attrs.max_reqs); /* max requests */
>>>+
>>> WRITE32(0); /*
>>>
>>>
>>Streamchannel attrs */
>>
>>
>>> WRITE32(0); /*rdmachannel_attrs */
>>>
>>>@@ -6404,7 +6406,7 @@ static int
>>>
>>>
>>nfs41_xdr_dec_get_lease_time(struct rpc_rqst *rqstp, uint32_t *p, str
>>
>>
>>> status = decode_putrootfh(&xdr);
>>> if (!status)
>>> status = decode_fsinfo(&xdr, res->fsinfo);
>>>- if (!status)
>>>+ if (status)
>>> status = -nfs_stat_to_errno(hdr.status);
>>> return status;
>>> }
>>>diff --git a/fs/nfs/super.c b/fs/nfs/super.c index a3bc630..89eb564
>>>100644
>>>--- a/fs/nfs/super.c
>>>+++ b/fs/nfs/super.c
>>>@@ -1260,15 +1260,13 @@ static int nfs4_fill_super(struct
>>>
>>>
>>super_block *sb, struct nfs4_mount_data *data,
>>
>>
>>> server->nfs4_state->cl_minorversion = i;
>>>
>>> if (server->rpc_ops->setup_session) {
>>>- int status;
>>>-
>>> lock_kernel();
>>> down_write(&server->nfs4_state->cl_sem);
>>>- status =
>>>
>>>
>>server->rpc_ops->setup_session(server->nfs4_state);
>>
>>
>>>+ err =
>>>+ server->rpc_ops->setup_session(server->nfs4_state);
>>> up_write(&server->nfs4_state->cl_sem);
>>> unlock_kernel();
>>>
>>>- if (status) {
>>>+ if (err) {
>>> printk(KERN_EMERG
>>>
>>>
>>"Couldn't mount using minorversion %d\n", i);
>>
>>
>>>
>>>
>>>
>>rpc_shutdown_client(server->client);
>>
>>
>>> }
>>>diff --git a/include/linux/nfs41_sessions.h
>>>b/include/linux/nfs41_sessions.h new file mode 100644 index
>>>0000000..02f45db
>>>--- /dev/null
>>>+++ b/include/linux/nfs41_sessions.h
>>>@@ -0,0 +1,72 @@
>>>+#ifndef __NFS4_1_SESSIONS_H__
>>>+#define __NFS4_1_SESSIONS_H__
>>>+
>>>+/* The flags for the nfs4_slot struct */
>>>+#define NFS4_SLOT_BUSY 0X0 /* Slot in use */
>>>+#define NFS4_SLOT_RECLAIMED 0x1 /* Slot has
>>>
>>>
>>been reclaimed by
>>
>>
>>>+ the server */
>>>+
>>>+typedef unsigned char sessionid_t[16];
>>>+typedef u32 streamchannel_attrs;
>>>+typedef u32 rdmachannel_attrs;
>>>+
>>>+struct nfs4_channel_attrs {
>>>+ u32 max_rqst_sz;
>>>+ u32 max_resp_sz;
>>>+ u32 max_resp_sz_cached;
>>>+ u32 max_ops;
>>>+ u32 max_reqs;
>>>+ streamchannel_attrs stream_attrs;
>>>+ rdmachannel_attrs rdma_attrs;
>>>+};
>>>+
>>>+struct nfs4_slot {
>>>+ u32 slot_nr;
>>>+ u32 seq_nr;
>>>+ unsigned long flags;
>>>+ u32 nr_waiters;
>>>+ spinlock_t slot_lock;
>>>+};
>>>+
>>>+struct nfs4_slot_table {
>>>+ struct nfs4_slot *slots;
>>>+ u32 max_slots;
>>>+ spinlock_t slot_tbl_lock;
>>>+};
>>>+
>>>+struct nfs4_channel {
>>>+ struct nfs4_channel_attrs chan_attrs;
>>>+ struct rpc_clnt *rpc_client;
>>>+ struct nfs4_slot_table slot_table;
>>>+};
>>>+
>>>+struct nfs4_session {
>>>+ /* Session related params */
>>>+ sessionid_t sess_id;
>>>+ u32 seqid; /* The seqid
>>>
>>>
>>returned by
>>
>>
>>>+ exchange_id */
>>>+ u32 persist;
>>>+ u32 header_padding;
>>>+ u32 hash_alg;
>>>+ u32 ssv_len;
>>>+ u32 use_for_back_chan;
>>>+ u32 rdma_mode;
>>>+
>>>+ /* The fore and back channel */
>>>+ struct nfs4_channel fore_channel;
>>>+ struct nfs4_channel back_channel;
>>>+
>>>+ unsigned int expired;
>>>+ struct nfs4_client * client;
>>>+ struct list_head session_hashtbl;
>>>+ spinlock_t session_lock;
>>>+ /* To prevent races between create_session and sequence */
>>>+ int mutating;
>>>+ struct semaphore session_sem;
>>>+ atomic_t ref_count;
>>>+};
>>>+
>>>+
>>>+#endif
>>>+
>>>+
>>>diff --git a/include/linux/nfs_xdr.h
>>>
>>>
>>b/include/linux/nfs_xdr.h index
>>
>>
>>>4648707..52b9e38 100644
>>>--- a/include/linux/nfs_xdr.h
>>>+++ b/include/linux/nfs_xdr.h
>>>@@ -3,6 +3,7 @@
>>>
>>> #include <linux/sunrpc/xprt.h>
>>> #include <linux/nfsacl.h>
>>>+#include <linux/nfs41_sessions.h>
>>>
>>> /*
>>> * To change the maximum rsize and wsize supported by the
>>>
>>>
>>NFS client,
>>
>>
>>>adjust @@ -778,6 +779,8 @@ struct nfs41_create_session_args {
>>> uint32_t use_for_backchannel;
>>> uint32_t use_for_rdma;
>>> uint32_t cb_program;
>>>+ struct nfs4_channel_attrs fc_attrs; /* Fore
>>>
>>>
>>Channel */
>>
>>
>>>+ struct nfs4_channel_attrs bc_attrs; /* Back
>>>
>>>
>>Channel */
>>
>>
>>> };
>>>
>>> struct nfs41_create_session_res {
>>>@@ -805,6 +808,7 @@ struct nfs41_sequence_res {
>>> u32 target_maxslots;
>>> u32 status_flags;
>>> struct nfs4_state_owner *sp;
>>>+ struct nfs4_slot *slot;
>>> };
>>>
>>> struct nfs4_get_lease_time_args {
>>>diff --git a/include/linux/nfsd/state.h
>>>
>>>
>>b/include/linux/nfsd/state.h
>>
>>
>>>index 8f699b4..fde667e 100644
>>>--- a/include/linux/nfsd/state.h
>>>+++ b/include/linux/nfsd/state.h
>>>@@ -47,8 +47,6 @@ typedef struct {
>>> u32 cl_id;
>>> } clientid_t;
>>>
>>>-typedef unsigned char sessionid_t[16];
>>>-
>>> typedef struct {
>>> u32 so_boot;
>>> u32 so_stateownerid;
>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://linux-nfs.org/pipermail/pnfs/attachments/20070318/f1488c5e/attachment-0001.htm
More information about the pNFS
mailing list