[pnfs] [PATCH 4/12] use kref for nfsd nfs4_client
Benny Halevy
bhalevy at panasas.com
Mon May 14 09:49:05 EDT 2007
Ok. Here is is with taking a reference on cl_users.
It's simpler but requires exporting rpc_release_client().
I tested getting and recalling a delegation as you recommended
and I also ran it through the connectathon tests, so as far as
I'm concerned I think it's ready to go.
Benny
William A. (Andy) Adamson wrote:
> On 5/11/07, Benny Halevy <bhalevy at panasas.com> wrote:
>> William A. (Andy) Adamson wrote:
>>> On 5/9/07, Benny Halevy <bhalevy at panasas.com> wrote:
>>>> J. Bruce Fields wrote:
>>>>> On Tue, May 08, 2007 at 10:26:12PM +0300, Benny Halevy wrote:
>>>>>
>>>>>> So the first step can be justified in mainline as in
>>>> nfsd4_probe_callback's
>>>>>> out_rpciod error case atomic_dec(&clp->cl_count) won't free up the
>>>> client
>>>>>> structure if it was expired concurrently. expire_client shuts down
>> the
>>>>>> callback rpc path so that may wait for nfsd4_probe_callback to finish
>>>>>> but I'm not sure. At any rate, calling put_nfs4_client() there is
>>>>>> more robust.
>>>>>>
>>>>> Yes, the destruction of the rpc_client shouldn't succeed until any
>>>>> outstanding tasks are done. But I agree that the reference counting
>>>>> there is at best odd.
>>>>>
>>>>> Actually, the caller (setclientid_confirm) also works with this client
>>>>> without a reference; if the client could be expired at any time, then
>>>>> that would already be a problem for it. So I suppose it's just
>>>>> depending on the state lock to prevent the client expiration. We're
>>>>> taking the reference in nfsd4_probe_callback() because we know it may
>> be
>>>>> held after the state lock is dropped.
>>>>>
>>>> I think that the synchronization between expire_client and
>> nfsd4_cb_recall
>>>> is broken if the callback is done not under the state_lock, even if the
>>>> caller of the callback has a reference on the client since the rpc_clnt
>> in
>>>> clp->cl_callback.cb_client can go away before the callback rpc starts.
>>>
>>> good point. bump both the rpc_clnt->cl_count as well as the
>>> nfs4_client->cl_count in nfsd_break_deleg_cb() prior to the kthread_run
>> and
>>> then call put_nfs4_client() and rpc_destroy_client() at the end of
>>> nfs4_cb_recall().
>> <snip>
>>
>> I'm afraid that rpc_destroy_client() is not the right api to call.
>> Ideally, the callback process would take a ref on clnt->cl_users and call
>> rpc_release_client when the call is done. This will synchronize correctly
>> with rpc_shutdown_client *but* rpc_release_client is not exported by
>> the sunrpc module. Since the synchronization with outstanding callbacks
>> is supposed to happen anyway, shutting down the callback rpc client
>> only when the client's refcount goes to zero should work as well.
>
>
> perhaps exporting rpc_release_client is the way to go.
>
> The attached patch is against mainline nfs-2.6, if you're good with it
>> I can send it also to linux-nfs for review. It's passing connectathon
>> tests
>> plus going through expire_client via the nfs4 laundromat, but I haven't
>> tested cb_recall.
>
>
> You need to test handingout a delegation, and then have a local non-nfsd
> process cause a break_lease.
>
> e.g. turn on read delegations, open a file read-only from the NFSv4 client
> (twice - the first open will not geta delegation due to no open_confirm),
> then open the same file read-write from a non-nfsd local process which will
> case a break_lease, and cause the a recall of the read delegation.
>
>
> Benny
>>
>> From 0647b1f0e7cb99b6c0d5f65fd4ace4a359aae206 Mon Sep 17 00:00:00 2001
>> From: Benny Halevy <bhalevy at ns1.bhalevy.com>
>> Date: Fri, 11 May 2007 12:39:16 +0300
>> Subject: [PATCH] nfs4_client ref count for cb_recall
>>
>> This patch changes the client reference counting technique
>> to struct kref, moves shutting down of the callback rpc_client
>> from expire_client time to free_nfs4_client time that's called
>> when the nfs4_client refcount goes to zero, adds a refcount
>> to the nfs4_client around nfs4_cb_recall to keep the rcp_client
>> around throughout the recall (otherwise it could have been used
>> after free), and finally it adds some BUG_ONs to assert that the
>> state lock is taken when taking or releasing reference for nfs4_file,
>> nfs4_client, and nfs4_delegation to ensure atomicity.
>> ---
>> fs/nfsd/nfs4callback.c | 11 +++++-
>> fs/nfsd/nfs4state.c | 80
>> ++++++++++++++++++++++++++++++++-----------
>> include/linux/nfsd/state.h | 3 +-
>> 3 files changed, 70 insertions(+), 24 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>> index 32ffea0..ae60a7d 100644
>> --- a/fs/nfsd/nfs4callback.c
>> +++ b/fs/nfsd/nfs4callback.c
>> @@ -433,7 +433,7 @@ nfsd4_probe_callback(struct nfs4_client *clp)
>> goto out_clnt;
>>
>> /* the task holds a reference to the nfs4_client struct */
>> - atomic_inc(&clp->cl_count);
>> + get_nfs4_client(clp);
>>
>> msg.rpc_cred = nfsd4_lookupcred(clp,0);
>> if (IS_ERR(msg.rpc_cred))
>> @@ -448,7 +448,9 @@ nfsd4_probe_callback(struct nfs4_client *clp)
>> return;
>>
>> out_rpciod:
>> - atomic_dec(&clp->cl_count);
>> + nfs4_lock_state();
>> + put_nfs4_client(clp);
>> + nfs4_unlock_state();
>> rpciod_down();
>> out_clnt:
>> rpc_shutdown_client(cb->cb_client);
>> @@ -475,7 +477,9 @@ nfs4_cb_null(struct rpc_task *task, void *dummy)
>> atomic_set(&cb->cb_set, 1);
>> dprintk("NFSD: callback set to client %u.%u.%u.%u\n",
>> NIPQUAD(addr));
>> out:
>> + nfs4_lock_state();
>> put_nfs4_client(clp);
>> + nfs4_unlock_state();
>> }
>>
>> static const struct rpc_call_ops nfs4_cb_null_ops = {
>> @@ -533,6 +537,9 @@ out:
>> /* Success or failure, now we're either waiting for lease
>> expiration
>> * or deleg_return. */
>> dprintk("NFSD: nfs4_cb_recall: dp %p dl_flock %p dl_count
>> %d\n",dp, dp->dl_flock, atomic_read(&dp->dl_count));
>> + nfs4_lock_state();
>> + put_nfs4_client(dp->dl_client);
>> nfs4_put_delegation(dp);
>> + nfs4_unlock_state();
>> return;
>> }
>
>
> nfs4_cb_recall can not call nfs4_lock_state() because nfs4_cb_recall can be
> called by a non-NFSD local process on the NFSv4 server machine (e.g. a
> Samba server) that causes a break_lease on a file with a delegation handed
> out by the NFSv4 server. Such a process has no reference to the
> client_mutex.
>
> The same is true with the BUG_ON_UNLOCKED_STATE call you added to
> nfs4_put_delegation which is called by nfs4_cb_recall.
>
> The same is true with calling nfs4_state_lock in nfsd_break_deleg_cb().
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 3cc8ce4..2a2eb97 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -90,6 +90,8 @@ static struct kmem_cache *file_slab = NULL;
>> static struct kmem_cache *stateid_slab = NULL;
>> static struct kmem_cache *deleg_slab = NULL;
>>
>> +#define BUG_ON_UNLOCKED_STATE() BUG_ON(mutex_trylock(&client_mutex))
>> +
>> void
>> nfs4_lock_state(void)
>> {
>> @@ -139,6 +141,7 @@ free_nfs4_file(struct kref *kref)
>> static inline void
>> put_nfs4_file(struct nfs4_file *fi)
>> {
>> + BUG_ON_UNLOCKED_STATE();
>> kref_put(&fi->fi_ref, free_nfs4_file);
>> }
>>
>> @@ -228,6 +231,7 @@ alloc_init_deleg(struct nfs4_client *clp, struct
>> nfs4_stateid *stp, struct svc_f
>> void
>> nfs4_put_delegation(struct nfs4_delegation *dp)
>> {
>> + BUG_ON_UNLOCKED_STATE();
>> if (atomic_dec_and_test(&dp->dl_count)) {
>> dprintk("NFSD: freeing dp %p\n",dp);
>> put_nfs4_file(dp->dl_file);
>> @@ -353,20 +357,11 @@ alloc_client(struct xdr_netobj name)
>> return clp;
>> }
>>
>
> Can't add BUG_ON_UNLOCKED_STATE in nfs4_put_delegation - see above.
>
> -static inline void
>> -free_client(struct nfs4_client *clp)
>> -{
>> - if (clp->cl_cred.cr_group_info)
>> - put_group_info(clp->cl_cred.cr_group_info);
>> - kfree(clp->cl_name.data);
>> - kfree(clp);
>> -}
>> -
>> -void
>> -put_nfs4_client(struct nfs4_client *clp)
>> +struct nfs4_client *
>> +get_nfs4_client(struct nfs4_client *clp)
>> {
>> - if (atomic_dec_and_test(&clp->cl_count))
>> - free_client(clp);
>> + kref_get(&clp->cl_ref);
>> + return clp;
>> }
>>
>> static void
>> @@ -383,17 +378,43 @@ shutdown_callback_client(struct nfs4_client *clp)
>> }
>>
>> static void
>> +free_nfs4_client(struct kref *kref)
>> +{
>> + struct nfs4_client *clp = container_of(kref, struct nfs4_client,
>> cl_ref);
>> + BUG_ON(!list_empty(&clp->cl_idhash));
>> + BUG_ON(!list_empty(&clp->cl_strhash));
>> + BUG_ON(!list_empty(&clp->cl_lru));
>> + BUG_ON(!list_empty(&clp->cl_delegations));
>> + BUG_ON(!list_empty(&clp->cl_openowners));
>> + shutdown_callback_client(clp);
>> + if (clp->cl_cred.cr_group_info)
>> + put_group_info(clp->cl_cred.cr_group_info);
>> + kfree(clp->cl_name.data);
>> + kfree(clp);
>> +}
>> +
>> +void
>> +put_nfs4_client(struct nfs4_client *clp)
>> +{
>> + BUG_ON_UNLOCKED_STATE();
>> + if (clp)
>> + kref_put(&clp->cl_ref, free_nfs4_client);
>> +}
>> +
>> +static void
>> expire_client(struct nfs4_client *clp)
>> {
>> struct nfs4_stateowner *sop;
>> struct nfs4_delegation *dp;
>> struct list_head reaplist;
>>
>> - dprintk("NFSD: expire_client cl_count %d\n",
>> - atomic_read(&clp->cl_count));
>> -
>> - shutdown_callback_client(clp);
>> + dprintk("NFSD: expire_client refcount %d\n",
>> + atomic_read(&clp->cl_ref.refcount));
>>
>> + BUG_ON_UNLOCKED_STATE();
>> + atomic_set(&clp->cl_callback.cb_set, 0);
>> + if (clp->cl_callback.cb_client != NULL)
>> + rpc_killall_tasks(clp->cl_callback.cb_client);
>
>
> Since cb_client is only set to NULL in shutdown_callback_client which is
> only called in free_nfs4_client, why the NULL check?
>
> does rpc_killall_tasks() race with rpc_release_client() called in
> rpc_put_task? I'm looking at the following comment in
> clnt.c:rpc_shutdown_client.
>
> /* Don't let rpc_release_client destroy us */
> clnt->cl_oneshot = 0;
> clnt->cl_dead = 0;
> rpc_killall_tasks(clnt);
>
>
> INIT_LIST_HEAD(&reaplist);
>> spin_lock(&recall_lock);
>> while (!list_empty(&clp->cl_delegations)) {
>> @@ -409,9 +430,9 @@ expire_client(struct nfs4_client *clp)
>> list_del_init(&dp->dl_recall_lru);
>> unhash_delegation(dp);
>> }
>> - list_del(&clp->cl_idhash);
>> - list_del(&clp->cl_strhash);
>> - list_del(&clp->cl_lru);
>> + list_del_init(&clp->cl_idhash);
>> + list_del_init(&clp->cl_strhash);
>> + list_del_init(&clp->cl_lru);
>> while (!list_empty(&clp->cl_openowners)) {
>> sop = list_entry(clp->cl_openowners.next, struct
>> nfs4_stateowner, so_perclient);
>> release_stateowner(sop);
>> @@ -426,7 +447,7 @@ create_client(struct xdr_netobj name, char *recdir) {
>> if (!(clp = alloc_client(name)))
>> goto out;
>> memcpy(clp->cl_recdir, recdir, HEXDIR_LEN);
>> - atomic_set(&clp->cl_count, 1);
>> + kref_init(&clp->cl_ref);
>> atomic_set(&clp->cl_callback.cb_set, 0);
>> INIT_LIST_HEAD(&clp->cl_idhash);
>> INIT_LIST_HEAD(&clp->cl_strhash);
>> @@ -1213,6 +1234,7 @@ find_openstateowner_str(unsigned int hashval, struct
>> nfsd4_open *open)
>> {
>> struct nfs4_stateowner *so = NULL;
>>
>> + BUG_ON_UNLOCKED_STATE();
>> list_for_each_entry(so, &ownerstr_hashtbl[hashval], so_strhash) {
>> if (cmp_owner_str(so, &open->op_owner,
>> &open->op_clientid))
>> return so;
>> @@ -1227,6 +1249,7 @@ find_file(struct inode *ino)
>> unsigned int hashval = file_hashval(ino);
>> struct nfs4_file *fp;
>>
>> + BUG_ON_UNLOCKED_STATE();
>> list_for_each_entry(fp, &file_hashtbl[hashval], fi_hash) {
>> if (fp->fi_inode == ino) {
>> get_nfs4_file(fp);
>> @@ -1342,12 +1365,24 @@ static
>> void nfsd_break_deleg_cb(struct file_lock *fl)
>> {
>> struct nfs4_delegation *dp= (struct nfs4_delegation
>> *)fl->fl_owner;
>> + struct nfs4_client *clp;
>> + struct rpc_clnt *clnt;
>> struct task_struct *t;
>>
>> dprintk("NFSD nfsd_break_deleg_cb: dp %p fl %p\n",dp,fl);
>> if (!dp)
>> return;
>>
>> + nfs4_lock_state();
>> + clp = dp->dl_client;
>> + clnt = clp->cl_callback.cb_client;
>> + if (!atomic_read(&clp->cl_callback.cb_set) || !clnt) {
>> + nfs4_lock_state();
>
> ^^^^^^^^^
> shouldn't this be nfs4_unlock_state?
>
> + return;
>> + }
>> + get_nfs4_client(clp);
>> + nfs4_lock_state();
>
> ^^^^^^^^^^
> this too?
>
> +
>> /* We're assuming the state code never drops its reference
>> * without first removing the lease. Since we're in this lease
>> * callback (and since the lease code is serialized by the kernel
>> @@ -1372,7 +1407,10 @@ void nfsd_break_deleg_cb(struct file_lock *fl)
>> printk(KERN_INFO "NFSD: Callback thread failed for "
>> "for client (clientid %08x/%08x)\n",
>> clp->cl_clientid.cl_boot, clp->cl_clientid.cl_id);
>> + nfs4_lock_state();
>> + put_nfs4_client(clp);
>> nfs4_put_delegation(dp);
>> + nfs4_unlock_state();
>> }
>> }
>
>
> Can't call nfs4_lock_state in nfsd_break_deleg_cb. see above.
>
> diff --git a/include/linux/nfsd/state.h b/include/linux/nfsd/state.h
>> index ab5c236..84463a3 100644
>> --- a/include/linux/nfsd/state.h
>> +++ b/include/linux/nfsd/state.h
>> @@ -116,6 +116,7 @@ struct nfs4_callback {
>> * when we expire the nfs4_client
>> */
>> struct nfs4_client {
>> + struct kref cl_ref;
>> struct list_head cl_idhash; /* hash by cl_clientid.id
>> */
>> struct list_head cl_strhash; /* hash by cl_name */
>> struct list_head cl_openowners;
>> @@ -130,7 +131,6 @@ struct nfs4_client {
>> clientid_t cl_clientid; /* generated by server */
>> nfs4_verifier cl_confirm; /* generated by server */
>> struct nfs4_callback cl_callback; /* callback info */
>> - atomic_t cl_count; /* ref count */
>> u32 cl_firststate; /* recovery dir creation
>> */
>> };
>>
>> @@ -279,6 +279,7 @@ extern void nfs4_lock_state(void);
>> extern void nfs4_unlock_state(void);
>> extern int nfs4_in_grace(void);
>> extern __be32 nfs4_check_open_reclaim(clientid_t *clid);
>> +extern struct nfs4_client *get_nfs4_client(struct nfs4_client *clp);
>> extern void put_nfs4_client(struct nfs4_client *clp);
>> extern void nfs4_free_stateowner(struct kref *kref);
>> extern void nfsd4_probe_callback(struct nfs4_client *clp);
>> --
>> 1.5.1
>>
>>
>>
>
--
Benny Halevy
Panasas Inc.
Accelerating Time to Results(TM) with Clustered Storage
www.panasas.com
bhalevy at panasas.com
Tel/Fax: +972-3-647-8340
Mobile: +972-54-802-8340
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-nfs4_client-ref-count-for-cb_recall.patch
Url: http://linux-nfs.org/pipermail/pnfs/attachments/20070514/a3bffdb6/attachment.diff
More information about the pNFS
mailing list