[pnfs] [PATCH 11/14] pnfs: pnfs_write_begin
Dean Hildebrand
seattleplus at gmail.com
Wed Apr 2 12:58:39 EDT 2008
Fredric Isaman wrote:
>
>
> On Mon, 31 Mar 2008, Dean Hildebrand wrote:
>
>>
>>>>>
>>>>> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
>>>>> index 1844838..f711df2 100644
>>>>> --- a/include/linux/nfs4_pnfs.h
>>>>> +++ b/include/linux/nfs4_pnfs.h
>>>>> @@ -48,6 +48,12 @@ struct pnfs_layout_type {
>>>>> u8 ld_data[]; /* layout driver private data */
>>>>> };
>>>>> +struct pnfs_fsdata {
>>>>> + struct layoutdriver_io_operations *ld_io_ops;
>>>>> + u8 ld_fsdata[];
>>>>> +};
>>>>>
>>>> I'm searching through the block and pnfs code and I can't seem to
>>>> find any uses of either of ld_fsdata or ld_io_ops. What is the
>>>> meaning of the internals of this structure? All I see in the block
>>>> code is pointer comparisons of the fsdata stuff.
>>>
>>> The reason for the pnfs_fsdata structure is that when free() needs
>>> to be called, there is no longer any reference to the layout driver,
>>> so the only way to know which free code to call is to store some
>>> sort of pointer to it in the data structure itself. See
>>> pnfs_free_fsdata(). Note that there is a similar problem with
>>> wb_private, thus the need to set wb_ops for use in
>>> pnfs_free_request_data().
>> Your justification for having variables in pnfs_fsdata is dubious for
>> multiple reasons:
>> 1) nfs_write_end calls pnfs_write_end_cleanup calls pnfs_free_fsdata
>> which uses pnfs_fsdata.ld_io_iops. If you simply pass the nfs_server
>> into pnfs_write_end_clean up from nfs_write_end then ld_io_ops is not
>> required.
>> ------> If ld_io_ops is not required, and ld_fsdata is never used,
>> then struct pnfs_fsdata does not require any internal variables.
>> 2) No layout driver implements free_fsdata, so it is currently not
>> required anyways.
>>
>> Therefore, if a struct doesn't require internal arguments, it can be
>> replaced with a single bit or a boolean (as you agree below)
>>
>> Side note: wb_private is disappearing as per our conf call. It is no
>> longer needed once flush_one and writepagelist are merged.
>>>
>>>> Is the only purpose of creating and using pnfs_fsdata just to
>>>> determine if pNFS is being used?
>>>
>>> Not quite, it's purpose is to remember if nfs_write_begin properly
>>> set up the page for use by the block driver.
>> So this is a boolean?
>>>
>>>> That seems to be the only purpose in bl_do flush. It seems file
>>>> layout (nfs_page.wb_private) and block (fsdata) are competing for
>>>> ways to store layout driver specific information. We will be
>>>> removing the need for wb_private in the file layout driver once we
>>>> merge flush_one and write_pagelist.
>>>>
>>>> For block, is the goal to remember, for each page, whether it
>>>> should use pnfs or not? If so, would it be easier to use the
>>>> wb_private field (ie use nfs_find_page_request in nfs_write_begin,
>>>> pass the nfs_page structure to pnfs_write_begin, and then set a
>>>> flag in the nfs_page)? In fact, couldn't we just use a flag in the
>>>> nfs_page?
>>>>
>>>
>>> No. The problem is that none of these structures exist yet. The
>>> nfs_page request is not created until nfs_write_end, and at that
>>> time we need to know, based on information from nfs_write_begin
>>> (passed through fsdata), what flags/fields to set in nfs_page. Note
>>> that this communication between write_begin and write_end is exactly
>>> what fsdata is there for.
>> Ahh I see, on the existing pages that need to be flushed have a
>> nfs_page. Ok, so we are back to setting a non-dynamic memory boolean
>> on fsdata. No free memory cleanup required.
>>>
>>>> I'd like to figure out a way to remove pnfs_fsdata and its
>>>> associated messiness (just like we will remove wb_private.)
>>>
>>> Admittedly, since I am currently just passing a single bit of data,
>>> I could possibly just grab a page flag. However, I think the fsdata
>>> approach is more general, and follows naturally from the current
>>> code design.
>> If all you need to do is set a single bit of data (using fsdata) then
>> do that instead and remove free_fsdata. I'd prefer not to add
>> interface functions that predict how others might used random bits of
>> code. At this point, we have a write_begin and a write_end, which
>> according to yourself, is good enough for all known layout drivers.
>> Let's let the next layout driver try and make the arguments for
>> adding dynamic data and extra interface ops.
>> Dean
>
> I see your point, and won't push too hard, other than to say that it
> feels very wrong to make available use of void *, without a
> corresponding way to free it. At the very least, extensive comments
> warning of the omission should be added.
Maybe I still don't understand the point of fsdata, but if the point of
fsdata is to transfer info from write_begin to write_end, it should
allocated in write_begin and freed in write_end.
Dean
>
> Fred
>>>
>>>>> +
>>>>> +
>>>>> static inline struct inode *
>>>>> PNFS_INODE(struct pnfs_layout_type *lo)
>>>>> {
>>>>> @@ -133,8 +139,11 @@ struct layoutdriver_io_operations {
>>>>> unsigned nr_pages, loff_t offset, size_t count,
>>>>> int sync, struct nfs_write_data *nfs_data);
>>>>> int (*flush_one) (struct pnfs_layout_segment *, struct
>>>>> list_head *head, unsigned int npages, size_t count, int how);
>>>>> + int (*write_begin) (struct pnfs_layout_segment *lseg, struct
>>>>> page *page,
>>>>> + loff_t pos, unsigned count,
>>>>> + struct pnfs_fsdata **fsdata);
>>>>> void (*free_request_data) (struct nfs_page *);
>>>>> -
>>>>> + void (*free_fsdata) (struct pnfs_fsdata *data);
>>>>> /* Consistency ops */
>>>>> /* 2 problems:
>>>>> @@ -173,6 +182,10 @@ struct layoutdriver_policy_operations {
>>>>> /* test for nfs page cache coalescing */
>>>>> int (*pg_test)(struct nfs_pageio_descriptor *, struct
>>>>> nfs_page *, struct nfs_page *);
>>>>> + /* Test for pre-write request flushing */
>>>>> + int (*do_flush)(struct pnfs_layout_segment *lseg, struct
>>>>> nfs_page *req,
>>>>> + struct pnfs_fsdata *fsdata);
>>>>> +
>>>>> /* Retreive the block size of the file system. If
>>>>> gather_across_stripes == 1,
>>>>> * then the file system will gather requests into the block
>>>>> size.
>>>>> * TODO: Where will the layout driver get this info? It is
>>>>> hard coded in PVFS2.
>>>>>
>>
More information about the pNFS
mailing list