[pnfs] [PATCH 4/12] use kref for nfsd nfs4_client

Benny Halevy bhalevy at panasas.com
Mon May 14 09:49:05 EDT 2007


Ok. Here is is with taking a reference on cl_users.
It's simpler but requires exporting rpc_release_client().
I tested getting and recalling a delegation as you recommended
and I also ran it through the connectathon tests, so as far as
I'm concerned I think it's ready to go.

Benny

William A. (Andy) Adamson wrote:
> On 5/11/07, Benny Halevy <bhalevy at panasas.com> wrote:
>> William A. (Andy) Adamson wrote:
>>> On 5/9/07, Benny Halevy <bhalevy at panasas.com> wrote:
>>>> J. Bruce Fields wrote:
>>>>> On Tue, May 08, 2007 at 10:26:12PM +0300, Benny Halevy wrote:
>>>>>
>>>>>> So the first step can be justified in mainline as in
>>>> nfsd4_probe_callback's
>>>>>> out_rpciod error case atomic_dec(&clp->cl_count) won't free up the
>>>> client
>>>>>> structure if it was expired concurrently.  expire_client shuts down
>> the
>>>>>> callback rpc path so that may wait for nfsd4_probe_callback to finish
>>>>>> but I'm not sure.  At any rate, calling put_nfs4_client() there is
>>>>>> more robust.
>>>>>>
>>>>> Yes, the destruction of the rpc_client shouldn't succeed until any
>>>>> outstanding tasks are done.  But I agree that the reference counting
>>>>> there is at best odd.
>>>>>
>>>>> Actually, the caller (setclientid_confirm) also works with this client
>>>>> without a reference; if the client could be expired at any time, then
>>>>> that would already be a problem for it.  So I suppose it's just
>>>>> depending on the state lock to prevent the client expiration.  We're
>>>>> taking the reference in nfsd4_probe_callback() because we know it may
>> be
>>>>> held after the state lock is dropped.
>>>>>
>>>> I think that the synchronization between expire_client and
>> nfsd4_cb_recall
>>>> is broken if the callback is done not under the state_lock, even if the
>>>> caller of the callback has a reference on the client since the rpc_clnt
>> in
>>>> clp->cl_callback.cb_client can go away before the callback rpc starts.
>>>
>>> good point. bump both the rpc_clnt->cl_count as well as the
>>> nfs4_client->cl_count in nfsd_break_deleg_cb() prior to the kthread_run
>> and
>>> then call put_nfs4_client() and rpc_destroy_client() at the end of
>>> nfs4_cb_recall().
>> <snip>
>>
>> I'm afraid that rpc_destroy_client() is not the right api to call.
>> Ideally, the callback process would take a ref on clnt->cl_users and call
>> rpc_release_client when the call is done.  This will synchronize correctly
>> with rpc_shutdown_client *but* rpc_release_client is not exported by
>> the sunrpc module.  Since the synchronization with outstanding callbacks
>> is supposed to happen anyway, shutting down the callback rpc client
>> only when the client's refcount goes to zero should work as well.
> 
> 
> perhaps exporting rpc_release_client is  the way to go.
> 
> The attached patch is against mainline nfs-2.6, if you're good with it
>> I can send it also to linux-nfs for review.  It's passing connectathon
>> tests
>> plus going through expire_client via the nfs4 laundromat, but I haven't
>> tested cb_recall.
> 
> 
> You need to test handingout a delegation, and then have a local non-nfsd
> process cause a break_lease.
> 
> e.g. turn on read delegations, open a file read-only from the NFSv4 client
> (twice - the first open will not geta delegation due to no open_confirm),
> then open the same file read-write from a non-nfsd local process which will
> case a break_lease, and cause the a recall of the read delegation.
> 
> 
> Benny
>>
>> From 0647b1f0e7cb99b6c0d5f65fd4ace4a359aae206 Mon Sep 17 00:00:00 2001
>> From: Benny Halevy <bhalevy at ns1.bhalevy.com>
>> Date: Fri, 11 May 2007 12:39:16 +0300
>> Subject: [PATCH] nfs4_client ref count for cb_recall
>>
>> This patch changes the client reference counting technique
>> to struct kref, moves shutting down of the callback rpc_client
>> from expire_client time to free_nfs4_client time that's called
>> when the nfs4_client refcount goes to zero, adds a refcount
>> to the nfs4_client around nfs4_cb_recall to keep the rcp_client
>> around throughout the recall (otherwise it could have been used
>> after free), and finally it adds some BUG_ONs to assert that the
>> state lock is taken when taking or releasing reference for nfs4_file,
>> nfs4_client, and nfs4_delegation to ensure atomicity.
>> ---
>> fs/nfsd/nfs4callback.c     |   11 +++++-
>> fs/nfsd/nfs4state.c        |   80
>> ++++++++++++++++++++++++++++++++-----------
>> include/linux/nfsd/state.h |    3 +-
>> 3 files changed, 70 insertions(+), 24 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>> index 32ffea0..ae60a7d 100644
>> --- a/fs/nfsd/nfs4callback.c
>> +++ b/fs/nfsd/nfs4callback.c
>> @@ -433,7 +433,7 @@ nfsd4_probe_callback(struct nfs4_client *clp)
>>                 goto out_clnt;
>>
>>         /* the task holds a reference to the nfs4_client struct */
>> -       atomic_inc(&clp->cl_count);
>> +       get_nfs4_client(clp);
>>
>>         msg.rpc_cred = nfsd4_lookupcred(clp,0);
>>         if (IS_ERR(msg.rpc_cred))
>> @@ -448,7 +448,9 @@ nfsd4_probe_callback(struct nfs4_client *clp)
>>         return;
>>
>> out_rpciod:
>> -       atomic_dec(&clp->cl_count);
>> +       nfs4_lock_state();
>> +       put_nfs4_client(clp);
>> +       nfs4_unlock_state();
>>         rpciod_down();
>> out_clnt:
>>         rpc_shutdown_client(cb->cb_client);
>> @@ -475,7 +477,9 @@ nfs4_cb_null(struct rpc_task *task, void *dummy)
>>         atomic_set(&cb->cb_set, 1);
>>         dprintk("NFSD: callback set to client %u.%u.%u.%u\n",
>> NIPQUAD(addr));
>> out:
>> +       nfs4_lock_state();
>>         put_nfs4_client(clp);
>> +       nfs4_unlock_state();
>> }
>>
>> static const struct rpc_call_ops nfs4_cb_null_ops = {
>> @@ -533,6 +537,9 @@ out:
>>         /* Success or failure, now we're either waiting for lease
>> expiration
>>          * or deleg_return. */
>>         dprintk("NFSD: nfs4_cb_recall: dp %p dl_flock %p dl_count
>> %d\n",dp, dp->dl_flock, atomic_read(&dp->dl_count));
>> +       nfs4_lock_state();
>> +       put_nfs4_client(dp->dl_client);
>>         nfs4_put_delegation(dp);
>> +       nfs4_unlock_state();
>>         return;
>> }
> 
> 
> nfs4_cb_recall can not call nfs4_lock_state() because nfs4_cb_recall can be
> called by a non-NFSD local process  on the NFSv4 server machine (e.g. a
> Samba server)  that causes a break_lease on a file with a delegation handed
> out by the NFSv4 server. Such a process has no reference to the
> client_mutex.
> 
> The same is true with the BUG_ON_UNLOCKED_STATE call you added to
> nfs4_put_delegation which is called by nfs4_cb_recall.
> 
> The same is true with calling nfs4_state_lock in nfsd_break_deleg_cb().
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 3cc8ce4..2a2eb97 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -90,6 +90,8 @@ static struct kmem_cache *file_slab = NULL;
>> static struct kmem_cache *stateid_slab = NULL;
>> static struct kmem_cache *deleg_slab = NULL;
>>
>> +#define BUG_ON_UNLOCKED_STATE() BUG_ON(mutex_trylock(&client_mutex))
>> +
>> void
>> nfs4_lock_state(void)
>> {
>> @@ -139,6 +141,7 @@ free_nfs4_file(struct kref *kref)
>> static inline void
>> put_nfs4_file(struct nfs4_file *fi)
>> {
>> +       BUG_ON_UNLOCKED_STATE();
>>         kref_put(&fi->fi_ref, free_nfs4_file);
>> }
>>
>> @@ -228,6 +231,7 @@ alloc_init_deleg(struct nfs4_client *clp, struct
>> nfs4_stateid *stp, struct svc_f
>> void
>> nfs4_put_delegation(struct nfs4_delegation *dp)
>> {
>> +       BUG_ON_UNLOCKED_STATE();
>>         if (atomic_dec_and_test(&dp->dl_count)) {
>>                 dprintk("NFSD: freeing dp %p\n",dp);
>>                 put_nfs4_file(dp->dl_file);
>> @@ -353,20 +357,11 @@ alloc_client(struct xdr_netobj name)
>>         return clp;
>> }
>>
> 
> Can't add BUG_ON_UNLOCKED_STATE in nfs4_put_delegation - see above.
> 
> -static inline void
>> -free_client(struct nfs4_client *clp)
>> -{
>> -       if (clp->cl_cred.cr_group_info)
>> -               put_group_info(clp->cl_cred.cr_group_info);
>> -       kfree(clp->cl_name.data);
>> -       kfree(clp);
>> -}
>> -
>> -void
>> -put_nfs4_client(struct nfs4_client *clp)
>> +struct nfs4_client *
>> +get_nfs4_client(struct nfs4_client *clp)
>> {
>> -       if (atomic_dec_and_test(&clp->cl_count))
>> -               free_client(clp);
>> +       kref_get(&clp->cl_ref);
>> +       return clp;
>> }
>>
>> static void
>> @@ -383,17 +378,43 @@ shutdown_callback_client(struct nfs4_client *clp)
>> }
>>
>> static void
>> +free_nfs4_client(struct kref *kref)
>> +{
>> +       struct nfs4_client *clp = container_of(kref, struct nfs4_client,
>> cl_ref);
>> +       BUG_ON(!list_empty(&clp->cl_idhash));
>> +       BUG_ON(!list_empty(&clp->cl_strhash));
>> +       BUG_ON(!list_empty(&clp->cl_lru));
>> +       BUG_ON(!list_empty(&clp->cl_delegations));
>> +       BUG_ON(!list_empty(&clp->cl_openowners));
>> +       shutdown_callback_client(clp);
>> +       if (clp->cl_cred.cr_group_info)
>> +               put_group_info(clp->cl_cred.cr_group_info);
>> +       kfree(clp->cl_name.data);
>> +       kfree(clp);
>> +}
>> +
>> +void
>> +put_nfs4_client(struct nfs4_client *clp)
>> +{
>> +       BUG_ON_UNLOCKED_STATE();
>> +       if (clp)
>> +               kref_put(&clp->cl_ref, free_nfs4_client);
>> +}
>> +
>> +static void
>> expire_client(struct nfs4_client *clp)
>> {
>>         struct nfs4_stateowner *sop;
>>         struct nfs4_delegation *dp;
>>         struct list_head reaplist;
>>
>> -       dprintk("NFSD: expire_client cl_count %d\n",
>> -                           atomic_read(&clp->cl_count));
>> -
>> -       shutdown_callback_client(clp);
>> +       dprintk("NFSD: expire_client refcount %d\n",
>> +                           atomic_read(&clp->cl_ref.refcount));
>>
>> +       BUG_ON_UNLOCKED_STATE();
>> +       atomic_set(&clp->cl_callback.cb_set, 0);
>> +       if (clp->cl_callback.cb_client != NULL)
>> +               rpc_killall_tasks(clp->cl_callback.cb_client);
> 
> 
> Since cb_client is only set to NULL in  shutdown_callback_client which is
> only called in free_nfs4_client, why the NULL check?
> 
> does rpc_killall_tasks() race with rpc_release_client() called in
> rpc_put_task? I'm looking at the following comment in
> clnt.c:rpc_shutdown_client.
> 
>                 /* Don't let rpc_release_client destroy us */
>                 clnt->cl_oneshot = 0;
>                 clnt->cl_dead = 0;
>                 rpc_killall_tasks(clnt);
> 
> 
>         INIT_LIST_HEAD(&reaplist);
>>         spin_lock(&recall_lock);
>>         while (!list_empty(&clp->cl_delegations)) {
>> @@ -409,9 +430,9 @@ expire_client(struct nfs4_client *clp)
>>                 list_del_init(&dp->dl_recall_lru);
>>                 unhash_delegation(dp);
>>         }
>> -       list_del(&clp->cl_idhash);
>> -       list_del(&clp->cl_strhash);
>> -       list_del(&clp->cl_lru);
>> +       list_del_init(&clp->cl_idhash);
>> +       list_del_init(&clp->cl_strhash);
>> +       list_del_init(&clp->cl_lru);
>>         while (!list_empty(&clp->cl_openowners)) {
>>                 sop = list_entry(clp->cl_openowners.next, struct
>> nfs4_stateowner, so_perclient);
>>                 release_stateowner(sop);
>> @@ -426,7 +447,7 @@ create_client(struct xdr_netobj name, char *recdir) {
>>         if (!(clp = alloc_client(name)))
>>                 goto out;
>>         memcpy(clp->cl_recdir, recdir, HEXDIR_LEN);
>> -       atomic_set(&clp->cl_count, 1);
>> +       kref_init(&clp->cl_ref);
>>         atomic_set(&clp->cl_callback.cb_set, 0);
>>         INIT_LIST_HEAD(&clp->cl_idhash);
>>         INIT_LIST_HEAD(&clp->cl_strhash);
>> @@ -1213,6 +1234,7 @@ find_openstateowner_str(unsigned int hashval, struct
>> nfsd4_open *open)
>> {
>>         struct nfs4_stateowner *so = NULL;
>>
>> +       BUG_ON_UNLOCKED_STATE();
>>         list_for_each_entry(so, &ownerstr_hashtbl[hashval], so_strhash) {
>>                 if (cmp_owner_str(so, &open->op_owner,
>> &open->op_clientid))
>>                         return so;
>> @@ -1227,6 +1249,7 @@ find_file(struct inode *ino)
>>         unsigned int hashval = file_hashval(ino);
>>         struct nfs4_file *fp;
>>
>> +       BUG_ON_UNLOCKED_STATE();
>>         list_for_each_entry(fp, &file_hashtbl[hashval], fi_hash) {
>>                 if (fp->fi_inode == ino) {
>>                         get_nfs4_file(fp);
>> @@ -1342,12 +1365,24 @@ static
>> void nfsd_break_deleg_cb(struct file_lock *fl)
>> {
>>         struct nfs4_delegation *dp=  (struct nfs4_delegation
>> *)fl->fl_owner;
>> +       struct nfs4_client *clp;
>> +       struct rpc_clnt *clnt;
>>         struct task_struct *t;
>>
>>         dprintk("NFSD nfsd_break_deleg_cb: dp %p fl %p\n",dp,fl);
>>         if (!dp)
>>                 return;
>>
>> +       nfs4_lock_state();
>> +       clp = dp->dl_client;
>> +       clnt = clp->cl_callback.cb_client;
>> +       if (!atomic_read(&clp->cl_callback.cb_set) || !clnt) {
>> +               nfs4_lock_state();
> 
>                       ^^^^^^^^^
> shouldn't this  be nfs4_unlock_state?
> 
> +               return;
>> +       }
>> +       get_nfs4_client(clp);
>> +       nfs4_lock_state();
> 
>              ^^^^^^^^^^
> this too?
> 
> +
>>         /* We're assuming the state code never drops its reference
>>          * without first removing the lease.  Since we're in this lease
>>          * callback (and since the lease code is serialized by the kernel
>> @@ -1372,7 +1407,10 @@ void nfsd_break_deleg_cb(struct file_lock *fl)
>>                 printk(KERN_INFO "NFSD: Callback thread failed for "
>>                         "for client (clientid %08x/%08x)\n",
>>                         clp->cl_clientid.cl_boot, clp->cl_clientid.cl_id);
>> +               nfs4_lock_state();
>> +               put_nfs4_client(clp);
>>                 nfs4_put_delegation(dp);
>> +               nfs4_unlock_state();
>>         }
>> }
> 
> 
> Can't call nfs4_lock_state in nfsd_break_deleg_cb. see above.
> 
> diff --git a/include/linux/nfsd/state.h b/include/linux/nfsd/state.h
>> index ab5c236..84463a3 100644
>> --- a/include/linux/nfsd/state.h
>> +++ b/include/linux/nfsd/state.h
>> @@ -116,6 +116,7 @@ struct nfs4_callback {
>>   *       when we expire the nfs4_client
>>   */
>> struct nfs4_client {
>> +       struct kref             cl_ref;
>>         struct list_head        cl_idhash;      /* hash by cl_clientid.id
>> */
>>         struct list_head        cl_strhash;     /* hash by cl_name */
>>         struct list_head        cl_openowners;
>> @@ -130,7 +131,6 @@ struct nfs4_client {
>>         clientid_t              cl_clientid;    /* generated by server */
>>         nfs4_verifier           cl_confirm;     /* generated by server */
>>         struct nfs4_callback    cl_callback;    /* callback info */
>> -       atomic_t                cl_count;       /* ref count */
>>         u32                     cl_firststate;  /* recovery dir creation
>> */
>> };
>>
>> @@ -279,6 +279,7 @@ extern void nfs4_lock_state(void);
>> extern void nfs4_unlock_state(void);
>> extern int nfs4_in_grace(void);
>> extern __be32 nfs4_check_open_reclaim(clientid_t *clid);
>> +extern struct nfs4_client *get_nfs4_client(struct nfs4_client *clp);
>> extern void put_nfs4_client(struct nfs4_client *clp);
>> extern void nfs4_free_stateowner(struct kref *kref);
>> extern void nfsd4_probe_callback(struct nfs4_client *clp);
>> --
>> 1.5.1
>>
>>
>>
> 


-- 
Benny Halevy
Panasas Inc.
Accelerating Time to Results(TM) with Clustered Storage

www.panasas.com
bhalevy at panasas.com
Tel/Fax: +972-3-647-8340
Mobile: +972-54-802-8340
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-nfs4_client-ref-count-for-cb_recall.patch
Url: http://linux-nfs.org/pipermail/pnfs/attachments/20070514/a3bffdb6/attachment.diff 


More information about the pNFS mailing list