[pnfs] [PATCH 11/14] pnfs: pnfs_write_begin
Fredric Isaman
iisaman at citi.umich.edu
Wed Apr 2 09:35:48 EDT 2008
On Mon, 31 Mar 2008, Dean Hildebrand wrote:
>
>>>>
>>>> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
>>>> index 1844838..f711df2 100644
>>>> --- a/include/linux/nfs4_pnfs.h
>>>> +++ b/include/linux/nfs4_pnfs.h
>>>> @@ -48,6 +48,12 @@ struct pnfs_layout_type {
>>>> u8 ld_data[]; /* layout driver private data */
>>>> };
>>>> +struct pnfs_fsdata {
>>>> + struct layoutdriver_io_operations *ld_io_ops;
>>>> + u8 ld_fsdata[];
>>>> +};
>>>>
>>> I'm searching through the block and pnfs code and I can't seem to find any
>>> uses of either of ld_fsdata or ld_io_ops. What is the meaning of the
>>> internals of this structure? All I see in the block code is pointer
>>> comparisons of the fsdata stuff.
>>
>> The reason for the pnfs_fsdata structure is that when free() needs to be
>> called, there is no longer any reference to the layout driver, so the only
>> way to know which free code to call is to store some sort of pointer to it
>> in the data structure itself. See pnfs_free_fsdata(). Note that there is
>> a similar problem with wb_private, thus the need to set wb_ops for use in
>> pnfs_free_request_data().
> Your justification for having variables in pnfs_fsdata is dubious for
> multiple reasons:
> 1) nfs_write_end calls pnfs_write_end_cleanup calls pnfs_free_fsdata which
> uses pnfs_fsdata.ld_io_iops. If you simply pass the nfs_server into
> pnfs_write_end_clean up from nfs_write_end then ld_io_ops is not required.
> ------> If ld_io_ops is not required, and ld_fsdata is never used, then
> struct pnfs_fsdata does not require any internal variables.
> 2) No layout driver implements free_fsdata, so it is currently not required
> anyways.
>
> Therefore, if a struct doesn't require internal arguments, it can be replaced
> with a single bit or a boolean (as you agree below)
>
> Side note: wb_private is disappearing as per our conf call. It is no longer
> needed once flush_one and writepagelist are merged.
>>
>>> Is the only purpose of creating and using pnfs_fsdata just to determine if
>>> pNFS is being used?
>>
>> Not quite, it's purpose is to remember if nfs_write_begin properly set up
>> the page for use by the block driver.
> So this is a boolean?
>>
>>> That seems to be the only purpose in bl_do flush. It seems file layout
>>> (nfs_page.wb_private) and block (fsdata) are competing for ways to store
>>> layout driver specific information. We will be removing the need for
>>> wb_private in the file layout driver once we merge flush_one and
>>> write_pagelist.
>>>
>>> For block, is the goal to remember, for each page, whether it should use
>>> pnfs or not? If so, would it be easier to use the wb_private field (ie
>>> use nfs_find_page_request in nfs_write_begin, pass the nfs_page structure
>>> to pnfs_write_begin, and then set a flag in the nfs_page)? In fact,
>>> couldn't we just use a flag in the nfs_page?
>>>
>>
>> No. The problem is that none of these structures exist yet. The nfs_page
>> request is not created until nfs_write_end, and at that time we need to
>> know, based on information from nfs_write_begin (passed through fsdata),
>> what flags/fields to set in nfs_page. Note that this communication between
>> write_begin and write_end is exactly what fsdata is there for.
> Ahh I see, on the existing pages that need to be flushed have a nfs_page.
> Ok, so we are back to setting a non-dynamic memory boolean on fsdata. No
> free memory cleanup required.
>>
>>> I'd like to figure out a way to remove pnfs_fsdata and its associated
>>> messiness (just like we will remove wb_private.)
>>
>> Admittedly, since I am currently just passing a single bit of data, I could
>> possibly just grab a page flag. However, I think the fsdata approach is
>> more general, and follows naturally from the current code design.
> If all you need to do is set a single bit of data (using fsdata) then do that
> instead and remove free_fsdata. I'd prefer not to add interface functions
> that predict how others might used random bits of code. At this point, we
> have a write_begin and a write_end, which according to yourself, is good
> enough for all known layout drivers. Let's let the next layout driver try
> and make the arguments for adding dynamic data and extra interface ops.
> Dean
I see your point, and won't push too hard, other than to say that it feels
very wrong to make available use of void *, without a corresponding way to
free it. At the very least, extensive comments warning of the omission
should be added.
Fred
>>
>>>> +
>>>> +
>>>> static inline struct inode *
>>>> PNFS_INODE(struct pnfs_layout_type *lo)
>>>> {
>>>> @@ -133,8 +139,11 @@ struct layoutdriver_io_operations {
>>>> unsigned nr_pages, loff_t offset, size_t count,
>>>> int sync, struct nfs_write_data *nfs_data);
>>>> int (*flush_one) (struct pnfs_layout_segment *, struct list_head
>>>> *head, unsigned int npages, size_t count, int how);
>>>> + int (*write_begin) (struct pnfs_layout_segment *lseg, struct page
>>>> *page,
>>>> + loff_t pos, unsigned count,
>>>> + struct pnfs_fsdata **fsdata);
>>>> void (*free_request_data) (struct nfs_page *);
>>>> -
>>>> + void (*free_fsdata) (struct pnfs_fsdata *data);
>>>> /* Consistency ops */
>>>> /* 2 problems:
>>>> @@ -173,6 +182,10 @@ struct layoutdriver_policy_operations {
>>>> /* test for nfs page cache coalescing */
>>>> int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *,
>>>> struct nfs_page *);
>>>> + /* Test for pre-write request flushing */
>>>> + int (*do_flush)(struct pnfs_layout_segment *lseg, struct nfs_page
>>>> *req,
>>>> + struct pnfs_fsdata *fsdata);
>>>> +
>>>> /* Retreive the block size of the file system. If
>>>> gather_across_stripes == 1,
>>>> * then the file system will gather requests into the block size.
>>>> * TODO: Where will the layout driver get this info? It is hard
>>>> coded in PVFS2.
>>>>
>
More information about the pNFS
mailing list