[pnfs] [PATCH 11/14] pnfs: pnfs_write_begin

Fredric Isaman iisaman at citi.umich.edu
Wed Apr 2 13:25:51 EDT 2008



On Wed, 2 Apr 2008, Dean Hildebrand wrote:

>
>
> Fredric Isaman wrote:
>> 
>> 
>> On Mon, 31 Mar 2008, Dean Hildebrand wrote:
>> 
>>> 
>>>>>> 
>>>>>> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
>>>>>> index 1844838..f711df2 100644
>>>>>> --- a/include/linux/nfs4_pnfs.h
>>>>>> +++ b/include/linux/nfs4_pnfs.h
>>>>>> @@ -48,6 +48,12 @@ struct pnfs_layout_type {
>>>>>>      u8 ld_data[];            /* layout driver private data */
>>>>>>  };
>>>>>>  +struct pnfs_fsdata {
>>>>>> +    struct layoutdriver_io_operations *ld_io_ops;
>>>>>> +    u8 ld_fsdata[];
>>>>>> +};
>>>>>> 
>>>>> I'm searching through the block and pnfs code and I can't seem to find 
>>>>> any uses of either of ld_fsdata or ld_io_ops.  What is the meaning of 
>>>>> the internals of this structure?  All I see in the block code is pointer 
>>>>> comparisons of the fsdata stuff.
>>>> 
>>>> The reason for the pnfs_fsdata structure is that when free() needs to be 
>>>> called, there is no longer any reference to the layout driver, so the 
>>>> only way to know which free code to call is to store some sort of pointer 
>>>> to it in the data structure itself.  See pnfs_free_fsdata().  Note that 
>>>> there is a similar problem with wb_private, thus the need to set wb_ops 
>>>> for use in pnfs_free_request_data().
>>> Your justification for having variables in pnfs_fsdata is dubious for 
>>> multiple reasons:
>>> 1) nfs_write_end calls pnfs_write_end_cleanup calls pnfs_free_fsdata which 
>>> uses pnfs_fsdata.ld_io_iops.  If you simply pass the nfs_server into 
>>> pnfs_write_end_clean up from nfs_write_end then ld_io_ops is not required.
>>> ------> If ld_io_ops is not required, and ld_fsdata is never used, then 
>>> struct pnfs_fsdata does not require any internal variables.
>>> 2) No layout driver implements free_fsdata, so it is currently not 
>>> required anyways.
>>> 
>>> Therefore, if a struct doesn't require internal arguments, it can be 
>>> replaced with a single bit or a boolean (as you agree below)
>>> 
>>> Side note: wb_private is disappearing as per our conf call.  It is no 
>>> longer needed once flush_one and writepagelist are merged.
>>>> 
>>>>> Is the only purpose of creating and using pnfs_fsdata just to determine 
>>>>> if pNFS is being used?
>>>> 
>>>> Not quite, it's purpose is to remember if nfs_write_begin properly set up 
>>>> the page for use by the block driver.
>>> So this is a boolean?
>>>> 
>>>>> That seems to be the only purpose in bl_do flush. It seems file layout 
>>>>> (nfs_page.wb_private) and block (fsdata) are competing for ways to store 
>>>>> layout driver specific information.  We will be removing the need for 
>>>>> wb_private in the file layout driver once we merge flush_one and 
>>>>> write_pagelist.
>>>>> 
>>>>> For block, is the goal to remember, for each page, whether it should use 
>>>>> pnfs or not?  If so, would it be easier to use the wb_private field (ie 
>>>>> use nfs_find_page_request in nfs_write_begin, pass the nfs_page 
>>>>> structure to pnfs_write_begin, and then set a flag in the nfs_page)?  In 
>>>>> fact, couldn't we just use a flag in the nfs_page?
>>>>> 
>>>> 
>>>> No.  The problem is that none of these structures exist yet.  The 
>>>> nfs_page request is not created until nfs_write_end, and at that time we 
>>>> need to know, based on information from nfs_write_begin (passed through 
>>>> fsdata), what flags/fields to set in nfs_page.  Note that this 
>>>> communication between write_begin and write_end is exactly what fsdata is 
>>>> there for.
>>> Ahh I see, on the existing pages that need to be flushed have a nfs_page. 
>>> Ok, so we are back to setting a non-dynamic memory boolean on fsdata.  No 
>>> free memory cleanup required.
>>>> 
>>>>> I'd like to figure out a way to remove pnfs_fsdata and its associated 
>>>>> messiness (just like we will remove wb_private.)
>>>> 
>>>> Admittedly, since I am currently just passing a single bit of data, I 
>>>> could possibly just grab a page flag.  However, I think the fsdata 
>>>> approach is more general, and follows naturally from the current code 
>>>> design.
>>> If all you need to do is set a single bit of data (using fsdata) then do 
>>> that instead and remove free_fsdata.  I'd prefer not to add interface 
>>> functions that predict how others might used random bits of code.  At this 
>>> point, we have a write_begin and a write_end, which according to yourself, 
>>> is good enough for all known layout drivers.  Let's let the next layout 
>>> driver try and make the arguments for adding dynamic data and extra 
>>> interface ops.
>>> Dean
>> 
>> I see your point, and won't push too hard, other than to say that it feels 
>> very wrong to make available use of void *, without a corresponding way to 
>> free it.  At the very least, extensive comments warning of the omission 
>> should be added.
> Maybe I still don't understand the point of fsdata, but if the point of 
> fsdata is to transfer info from write_begin to write_end, it should allocated 
> in write_begin and freed in write_end.
> Dean


Which is precisely what pnfs_write_begin and pnfs_write_end_cleanup are 
there to do.

 	Fred



More information about the pNFS mailing list