[pnfs] [PATCH 4/12] use kref for nfsd nfs4_client
William A. (Andy) Adamson
andros at citi.umich.edu
Fri May 11 11:43:47 EDT 2007
On 5/11/07, Benny Halevy <bhalevy at panasas.com> wrote:
>
> William A. (Andy) Adamson wrote:
> > On 5/9/07, Benny Halevy <bhalevy at panasas.com> wrote:
> >> J. Bruce Fields wrote:
> >>> On Tue, May 08, 2007 at 10:26:12PM +0300, Benny Halevy wrote:
> >>>
> >>>> So the first step can be justified in mainline as in
> >> nfsd4_probe_callback's
> >>>> out_rpciod error case atomic_dec(&clp->cl_count) won't free up the
> >> client
> >>>> structure if it was expired concurrently. expire_client shuts down
> the
> >>>> callback rpc path so that may wait for nfsd4_probe_callback to finish
> >>>> but I'm not sure. At any rate, calling put_nfs4_client() there is
> >>>> more robust.
> >>>>
> >>> Yes, the destruction of the rpc_client shouldn't succeed until any
> >>> outstanding tasks are done. But I agree that the reference counting
> >>> there is at best odd.
> >>>
> >>> Actually, the caller (setclientid_confirm) also works with this client
> >>> without a reference; if the client could be expired at any time, then
> >>> that would already be a problem for it. So I suppose it's just
> >>> depending on the state lock to prevent the client expiration. We're
> >>> taking the reference in nfsd4_probe_callback() because we know it may
> be
> >>> held after the state lock is dropped.
> >>>
> >> I think that the synchronization between expire_client and
> nfsd4_cb_recall
> >> is broken if the callback is done not under the state_lock, even if the
> >> caller of the callback has a reference on the client since the rpc_clnt
> in
> >> clp->cl_callback.cb_client can go away before the callback rpc starts.
> >
> >
> > good point. bump both the rpc_clnt->cl_count as well as the
> > nfs4_client->cl_count in nfsd_break_deleg_cb() prior to the kthread_run
> and
> > then call put_nfs4_client() and rpc_destroy_client() at the end of
> > nfs4_cb_recall().
> <snip>
>
> I'm afraid that rpc_destroy_client() is not the right api to call.
> Ideally, the callback process would take a ref on clnt->cl_users and call
> rpc_release_client when the call is done. This will synchronize correctly
> with rpc_shutdown_client *but* rpc_release_client is not exported by
> the sunrpc module. Since the synchronization with outstanding callbacks
> is supposed to happen anyway, shutting down the callback rpc client
> only when the client's refcount goes to zero should work as well.
perhaps exporting rpc_release_client is the way to go.
The attached patch is against mainline nfs-2.6, if you're good with it
> I can send it also to linux-nfs for review. It's passing connectathon
> tests
> plus going through expire_client via the nfs4 laundromat, but I haven't
> tested cb_recall.
You need to test handingout a delegation, and then have a local non-nfsd
process cause a break_lease.
e.g. turn on read delegations, open a file read-only from the NFSv4 client
(twice - the first open will not geta delegation due to no open_confirm),
then open the same file read-write from a non-nfsd local process which will
case a break_lease, and cause the a recall of the read delegation.
Benny
>
>
> From 0647b1f0e7cb99b6c0d5f65fd4ace4a359aae206 Mon Sep 17 00:00:00 2001
> From: Benny Halevy <bhalevy at ns1.bhalevy.com>
> Date: Fri, 11 May 2007 12:39:16 +0300
> Subject: [PATCH] nfs4_client ref count for cb_recall
>
> This patch changes the client reference counting technique
> to struct kref, moves shutting down of the callback rpc_client
> from expire_client time to free_nfs4_client time that's called
> when the nfs4_client refcount goes to zero, adds a refcount
> to the nfs4_client around nfs4_cb_recall to keep the rcp_client
> around throughout the recall (otherwise it could have been used
> after free), and finally it adds some BUG_ONs to assert that the
> state lock is taken when taking or releasing reference for nfs4_file,
> nfs4_client, and nfs4_delegation to ensure atomicity.
> ---
> fs/nfsd/nfs4callback.c | 11 +++++-
> fs/nfsd/nfs4state.c | 80
> ++++++++++++++++++++++++++++++++-----------
> include/linux/nfsd/state.h | 3 +-
> 3 files changed, 70 insertions(+), 24 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 32ffea0..ae60a7d 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -433,7 +433,7 @@ nfsd4_probe_callback(struct nfs4_client *clp)
> goto out_clnt;
>
> /* the task holds a reference to the nfs4_client struct */
> - atomic_inc(&clp->cl_count);
> + get_nfs4_client(clp);
>
> msg.rpc_cred = nfsd4_lookupcred(clp,0);
> if (IS_ERR(msg.rpc_cred))
> @@ -448,7 +448,9 @@ nfsd4_probe_callback(struct nfs4_client *clp)
> return;
>
> out_rpciod:
> - atomic_dec(&clp->cl_count);
> + nfs4_lock_state();
> + put_nfs4_client(clp);
> + nfs4_unlock_state();
> rpciod_down();
> out_clnt:
> rpc_shutdown_client(cb->cb_client);
> @@ -475,7 +477,9 @@ nfs4_cb_null(struct rpc_task *task, void *dummy)
> atomic_set(&cb->cb_set, 1);
> dprintk("NFSD: callback set to client %u.%u.%u.%u\n",
> NIPQUAD(addr));
> out:
> + nfs4_lock_state();
> put_nfs4_client(clp);
> + nfs4_unlock_state();
> }
>
> static const struct rpc_call_ops nfs4_cb_null_ops = {
> @@ -533,6 +537,9 @@ out:
> /* Success or failure, now we're either waiting for lease
> expiration
> * or deleg_return. */
> dprintk("NFSD: nfs4_cb_recall: dp %p dl_flock %p dl_count
> %d\n",dp, dp->dl_flock, atomic_read(&dp->dl_count));
> + nfs4_lock_state();
> + put_nfs4_client(dp->dl_client);
> nfs4_put_delegation(dp);
> + nfs4_unlock_state();
> return;
> }
nfs4_cb_recall can not call nfs4_lock_state() because nfs4_cb_recall can be
called by a non-NFSD local process on the NFSv4 server machine (e.g. a
Samba server) that causes a break_lease on a file with a delegation handed
out by the NFSv4 server. Such a process has no reference to the
client_mutex.
The same is true with the BUG_ON_UNLOCKED_STATE call you added to
nfs4_put_delegation which is called by nfs4_cb_recall.
The same is true with calling nfs4_state_lock in nfsd_break_deleg_cb().
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 3cc8ce4..2a2eb97 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -90,6 +90,8 @@ static struct kmem_cache *file_slab = NULL;
> static struct kmem_cache *stateid_slab = NULL;
> static struct kmem_cache *deleg_slab = NULL;
>
> +#define BUG_ON_UNLOCKED_STATE() BUG_ON(mutex_trylock(&client_mutex))
> +
> void
> nfs4_lock_state(void)
> {
> @@ -139,6 +141,7 @@ free_nfs4_file(struct kref *kref)
> static inline void
> put_nfs4_file(struct nfs4_file *fi)
> {
> + BUG_ON_UNLOCKED_STATE();
> kref_put(&fi->fi_ref, free_nfs4_file);
> }
>
> @@ -228,6 +231,7 @@ alloc_init_deleg(struct nfs4_client *clp, struct
> nfs4_stateid *stp, struct svc_f
> void
> nfs4_put_delegation(struct nfs4_delegation *dp)
> {
> + BUG_ON_UNLOCKED_STATE();
> if (atomic_dec_and_test(&dp->dl_count)) {
> dprintk("NFSD: freeing dp %p\n",dp);
> put_nfs4_file(dp->dl_file);
> @@ -353,20 +357,11 @@ alloc_client(struct xdr_netobj name)
> return clp;
> }
>
Can't add BUG_ON_UNLOCKED_STATE in nfs4_put_delegation - see above.
-static inline void
> -free_client(struct nfs4_client *clp)
> -{
> - if (clp->cl_cred.cr_group_info)
> - put_group_info(clp->cl_cred.cr_group_info);
> - kfree(clp->cl_name.data);
> - kfree(clp);
> -}
> -
> -void
> -put_nfs4_client(struct nfs4_client *clp)
> +struct nfs4_client *
> +get_nfs4_client(struct nfs4_client *clp)
> {
> - if (atomic_dec_and_test(&clp->cl_count))
> - free_client(clp);
> + kref_get(&clp->cl_ref);
> + return clp;
> }
>
> static void
> @@ -383,17 +378,43 @@ shutdown_callback_client(struct nfs4_client *clp)
> }
>
> static void
> +free_nfs4_client(struct kref *kref)
> +{
> + struct nfs4_client *clp = container_of(kref, struct nfs4_client,
> cl_ref);
> + BUG_ON(!list_empty(&clp->cl_idhash));
> + BUG_ON(!list_empty(&clp->cl_strhash));
> + BUG_ON(!list_empty(&clp->cl_lru));
> + BUG_ON(!list_empty(&clp->cl_delegations));
> + BUG_ON(!list_empty(&clp->cl_openowners));
> + shutdown_callback_client(clp);
> + if (clp->cl_cred.cr_group_info)
> + put_group_info(clp->cl_cred.cr_group_info);
> + kfree(clp->cl_name.data);
> + kfree(clp);
> +}
> +
> +void
> +put_nfs4_client(struct nfs4_client *clp)
> +{
> + BUG_ON_UNLOCKED_STATE();
> + if (clp)
> + kref_put(&clp->cl_ref, free_nfs4_client);
> +}
> +
> +static void
> expire_client(struct nfs4_client *clp)
> {
> struct nfs4_stateowner *sop;
> struct nfs4_delegation *dp;
> struct list_head reaplist;
>
> - dprintk("NFSD: expire_client cl_count %d\n",
> - atomic_read(&clp->cl_count));
> -
> - shutdown_callback_client(clp);
> + dprintk("NFSD: expire_client refcount %d\n",
> + atomic_read(&clp->cl_ref.refcount));
>
> + BUG_ON_UNLOCKED_STATE();
> + atomic_set(&clp->cl_callback.cb_set, 0);
> + if (clp->cl_callback.cb_client != NULL)
> + rpc_killall_tasks(clp->cl_callback.cb_client);
Since cb_client is only set to NULL in shutdown_callback_client which is
only called in free_nfs4_client, why the NULL check?
does rpc_killall_tasks() race with rpc_release_client() called in
rpc_put_task? I'm looking at the following comment in
clnt.c:rpc_shutdown_client.
/* Don't let rpc_release_client destroy us */
clnt->cl_oneshot = 0;
clnt->cl_dead = 0;
rpc_killall_tasks(clnt);
INIT_LIST_HEAD(&reaplist);
> spin_lock(&recall_lock);
> while (!list_empty(&clp->cl_delegations)) {
> @@ -409,9 +430,9 @@ expire_client(struct nfs4_client *clp)
> list_del_init(&dp->dl_recall_lru);
> unhash_delegation(dp);
> }
> - list_del(&clp->cl_idhash);
> - list_del(&clp->cl_strhash);
> - list_del(&clp->cl_lru);
> + list_del_init(&clp->cl_idhash);
> + list_del_init(&clp->cl_strhash);
> + list_del_init(&clp->cl_lru);
> while (!list_empty(&clp->cl_openowners)) {
> sop = list_entry(clp->cl_openowners.next, struct
> nfs4_stateowner, so_perclient);
> release_stateowner(sop);
> @@ -426,7 +447,7 @@ create_client(struct xdr_netobj name, char *recdir) {
> if (!(clp = alloc_client(name)))
> goto out;
> memcpy(clp->cl_recdir, recdir, HEXDIR_LEN);
> - atomic_set(&clp->cl_count, 1);
> + kref_init(&clp->cl_ref);
> atomic_set(&clp->cl_callback.cb_set, 0);
> INIT_LIST_HEAD(&clp->cl_idhash);
> INIT_LIST_HEAD(&clp->cl_strhash);
> @@ -1213,6 +1234,7 @@ find_openstateowner_str(unsigned int hashval, struct
> nfsd4_open *open)
> {
> struct nfs4_stateowner *so = NULL;
>
> + BUG_ON_UNLOCKED_STATE();
> list_for_each_entry(so, &ownerstr_hashtbl[hashval], so_strhash) {
> if (cmp_owner_str(so, &open->op_owner,
> &open->op_clientid))
> return so;
> @@ -1227,6 +1249,7 @@ find_file(struct inode *ino)
> unsigned int hashval = file_hashval(ino);
> struct nfs4_file *fp;
>
> + BUG_ON_UNLOCKED_STATE();
> list_for_each_entry(fp, &file_hashtbl[hashval], fi_hash) {
> if (fp->fi_inode == ino) {
> get_nfs4_file(fp);
> @@ -1342,12 +1365,24 @@ static
> void nfsd_break_deleg_cb(struct file_lock *fl)
> {
> struct nfs4_delegation *dp= (struct nfs4_delegation
> *)fl->fl_owner;
> + struct nfs4_client *clp;
> + struct rpc_clnt *clnt;
> struct task_struct *t;
>
> dprintk("NFSD nfsd_break_deleg_cb: dp %p fl %p\n",dp,fl);
> if (!dp)
> return;
>
> + nfs4_lock_state();
> + clp = dp->dl_client;
> + clnt = clp->cl_callback.cb_client;
> + if (!atomic_read(&clp->cl_callback.cb_set) || !clnt) {
> + nfs4_lock_state();
^^^^^^^^^
shouldn't this be nfs4_unlock_state?
+ return;
> + }
> + get_nfs4_client(clp);
> + nfs4_lock_state();
^^^^^^^^^^
this too?
+
> /* We're assuming the state code never drops its reference
> * without first removing the lease. Since we're in this lease
> * callback (and since the lease code is serialized by the kernel
> @@ -1372,7 +1407,10 @@ void nfsd_break_deleg_cb(struct file_lock *fl)
> printk(KERN_INFO "NFSD: Callback thread failed for "
> "for client (clientid %08x/%08x)\n",
> clp->cl_clientid.cl_boot, clp->cl_clientid.cl_id);
> + nfs4_lock_state();
> + put_nfs4_client(clp);
> nfs4_put_delegation(dp);
> + nfs4_unlock_state();
> }
> }
Can't call nfs4_lock_state in nfsd_break_deleg_cb. see above.
diff --git a/include/linux/nfsd/state.h b/include/linux/nfsd/state.h
> index ab5c236..84463a3 100644
> --- a/include/linux/nfsd/state.h
> +++ b/include/linux/nfsd/state.h
> @@ -116,6 +116,7 @@ struct nfs4_callback {
> * when we expire the nfs4_client
> */
> struct nfs4_client {
> + struct kref cl_ref;
> struct list_head cl_idhash; /* hash by cl_clientid.id
> */
> struct list_head cl_strhash; /* hash by cl_name */
> struct list_head cl_openowners;
> @@ -130,7 +131,6 @@ struct nfs4_client {
> clientid_t cl_clientid; /* generated by server */
> nfs4_verifier cl_confirm; /* generated by server */
> struct nfs4_callback cl_callback; /* callback info */
> - atomic_t cl_count; /* ref count */
> u32 cl_firststate; /* recovery dir creation
> */
> };
>
> @@ -279,6 +279,7 @@ extern void nfs4_lock_state(void);
> extern void nfs4_unlock_state(void);
> extern int nfs4_in_grace(void);
> extern __be32 nfs4_check_open_reclaim(clientid_t *clid);
> +extern struct nfs4_client *get_nfs4_client(struct nfs4_client *clp);
> extern void put_nfs4_client(struct nfs4_client *clp);
> extern void nfs4_free_stateowner(struct kref *kref);
> extern void nfsd4_probe_callback(struct nfs4_client *clp);
> --
> 1.5.1
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://linux-nfs.org/pipermail/pnfs/attachments/20070511/a3d1ec92/attachment-0001.htm
More information about the pNFS
mailing list