[pnfs] nfsd layout cache design for layoutrecall

Benny Halevy bhalevy at panasas.com
Fri May 4 03:04:54 EDT 2007


J. Bruce Fields wrote:
> Random remarks:
> 
> On Thu, May 03, 2007 at 11:22:59PM +0300, Benny Halevy wrote:
>>> * changed cl_count in nfs4_client into a kref.
>>>   nfs4_{get,put}_client take and release a reference on the
>>>   client structure under the state lock.
>>>
>>> * layoutrecall manipulation is also done under the state lock,
>>>   not under the recall lock.  Special care taken when sending
>>>   the recalls to avoid distributed deadlock - client sending
>>>   layoutreturns while serving a cb_layoutrecall, described
>>>   in more details below...
> 
> We've never been totally happy with the way everything in nfsd4 is done
> under the big state lock.  Not that it's up to you to fix that, but it's
> just something to keep in mind....

My impression is that the state_lock mutex is an overkill in many cases
that you'd be perfectly covered by a lightweight spinlock.
In the layout cache case there's no sense in using a separate
recall lock since you want both the client and file layouts
and layoutrecalls lists locked at the same time.

With a reference count on the client and file we may want to
consider taking the state_lock and when creating and destroying
nfs4_{client,file} as well hashing them in the globally, while
protecting pnfs state with a separate spinlock.
However, this will require taking twice as many locks in the path.
I'm worried less about contention when not blocking under the state_lock
but rather about the bus locking operations overhead.

> 
>>> * struct nfs4_file:
>>> added fsid and filehandle to the structure.
>>> these are accessed in the get, return, and recall paths
>>> through a pointer to the nfs4_file structure
>>> rather than copying the data into each nfs4_layout and nfs4_layoutrecall
>>> instance.
>>>
>>> +#ifdef CONFIG_PNFS
>>> +	/* used by layoutget / layoutrecall */
>>> +	struct nfs4_fsid	fi_fsid;
>>> +	u32			fi_fhlen;
>>> +	u32			fi_fhval[NFS4_FHSIZE];
> 
> NFS4_FHSIZE is in bytes, isn't it?  (Oops, I think we made that mistake
> in the v4.0 code too.)

Yeah, copy/paste :-/

would you rather define it as u8 fh_fhval[NFS4_FHSIZE] or
u32 fh_fhval[NFS4_FHSIZE/sizeof(u32)]? (same for cbr_fhval)

Thanks for catching this as I realized I didn't initialize these
fields when allocating nfs4_file.

+find_alloc_file(struct inode *ino)
+find_alloc_file(struct inode *ino, struct svc_fh *current_fh)

-alloc_init_file(struct inode *ino)
+alloc_init_file(struct inode *ino, struct svc_fh *current_fh)

@@ -1397,6 +1416,14 @@ alloc_init_file(struct inode *ino)
+#ifdef CONFIG_PNFS
+		fp->fi_fsid.major = current_fh->fh_export->ex_fsid;
+		fp->fi_fsid.minor = 0;
+		fp->fi_fhlen = current_fh->fh_handle.fh_size;
+		BUG_ON(fp->fi_fhlen > sizeof(fp->fi_fhval));
+		memcpy(fp->fi_fhval, &current_fh->fh_handle.fh_base,
+		       fp->fi_fhlen);
+#endif /* CONFIG_PNFS */


> 
>>> * nfs4_layoutrecall
>>>   tracks outstanding layoutrecalls
>>>   listed only per client
>>>   but keeps a reference both on the respective client and file
>>>   (if applicable, can be NULL in the RECALL_ALL case)
>>>   clr_flags field used for marking the layoutrecall done and deciding
>>>   who's freeing it.
> 
> Is there a chance it would be possible to use reference counting for that?

Sure, refcounting does seem simpler. BTW, it's not strictly needed, as
the flags are manipulated under the state_lock.




More information about the pNFS mailing list