[pnfs] [PATCH 11/14] pnfs: pnfs_write_begin

Fredric Isaman iisaman at citi.umich.edu
Wed Apr 2 09:35:48 EDT 2008



On Mon, 31 Mar 2008, Dean Hildebrand wrote:

>
>>>> 
>>>> diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
>>>> index 1844838..f711df2 100644
>>>> --- a/include/linux/nfs4_pnfs.h
>>>> +++ b/include/linux/nfs4_pnfs.h
>>>> @@ -48,6 +48,12 @@ struct pnfs_layout_type {
>>>>      u8 ld_data[];            /* layout driver private data */
>>>>  };
>>>>  +struct pnfs_fsdata {
>>>> +    struct layoutdriver_io_operations *ld_io_ops;
>>>> +    u8 ld_fsdata[];
>>>> +};
>>>> 
>>> I'm searching through the block and pnfs code and I can't seem to find any 
>>> uses of either of ld_fsdata or ld_io_ops.  What is the meaning of the 
>>> internals of this structure?  All I see in the block code is pointer 
>>> comparisons of the fsdata stuff.
>> 
>> The reason for the pnfs_fsdata structure is that when free() needs to be 
>> called, there is no longer any reference to the layout driver, so the only 
>> way to know which free code to call is to store some sort of pointer to it 
>> in the data structure itself.  See pnfs_free_fsdata().  Note that there is 
>> a similar problem with wb_private, thus the need to set wb_ops for use in 
>> pnfs_free_request_data().
> Your justification for having variables in pnfs_fsdata is dubious for 
> multiple reasons:
> 1) nfs_write_end calls pnfs_write_end_cleanup calls pnfs_free_fsdata which 
> uses pnfs_fsdata.ld_io_iops.  If you simply pass the nfs_server into 
> pnfs_write_end_clean up from nfs_write_end then ld_io_ops is not required.
> ------> If ld_io_ops is not required, and ld_fsdata is never used, then 
> struct pnfs_fsdata does not require any internal variables.
> 2) No layout driver implements free_fsdata, so it is currently not required 
> anyways.
>
> Therefore, if a struct doesn't require internal arguments, it can be replaced 
> with a single bit or a boolean (as you agree below)
>
> Side note: wb_private is disappearing as per our conf call.  It is no longer 
> needed once flush_one and writepagelist are merged.
>> 
>>> Is the only purpose of creating and using pnfs_fsdata just to determine if 
>>> pNFS is being used?
>> 
>> Not quite, it's purpose is to remember if nfs_write_begin properly set up 
>> the page for use by the block driver.
> So this is a boolean?
>> 
>>> That seems to be the only purpose in bl_do flush. It seems file layout 
>>> (nfs_page.wb_private) and block (fsdata) are competing for ways to store 
>>> layout driver specific information.  We will be removing the need for 
>>> wb_private in the file layout driver once we merge flush_one and 
>>> write_pagelist.
>>> 
>>> For block, is the goal to remember, for each page, whether it should use 
>>> pnfs or not?  If so, would it be easier to use the wb_private field (ie 
>>> use nfs_find_page_request in nfs_write_begin, pass the nfs_page structure 
>>> to pnfs_write_begin, and then set a flag in the nfs_page)?  In fact, 
>>> couldn't we just use a flag in the nfs_page?
>>> 
>> 
>> No.  The problem is that none of these structures exist yet.  The nfs_page 
>> request is not created until nfs_write_end, and at that time we need to 
>> know, based on information from nfs_write_begin (passed through fsdata), 
>> what flags/fields to set in nfs_page.  Note that this communication between 
>> write_begin and write_end is exactly what fsdata is there for.
> Ahh I see, on the existing pages that need to be flushed have a nfs_page. 
> Ok, so we are back to setting a non-dynamic memory boolean on fsdata.  No 
> free memory cleanup required.
>> 
>>> I'd like to figure out a way to remove pnfs_fsdata and its associated 
>>> messiness (just like we will remove wb_private.)
>> 
>> Admittedly, since I am currently just passing a single bit of data, I could 
>> possibly just grab a page flag.  However, I think the fsdata approach is 
>> more general, and follows naturally from the current code design.
> If all you need to do is set a single bit of data (using fsdata) then do that 
> instead and remove free_fsdata.  I'd prefer not to add interface functions 
> that predict how others might used random bits of code.  At this point, we 
> have a write_begin and a write_end, which according to yourself, is good 
> enough for all known layout drivers.  Let's let the next layout driver try 
> and make the arguments for adding dynamic data and extra interface ops.
> Dean

I see your point, and won't push too hard, other than to say that it feels 
very wrong to make available use of void *, without a corresponding way to 
free it.  At the very least, extensive comments warning of the omission 
should be added.

 	Fred
>> 
>>>> +
>>>> +
>>>>  static inline struct inode *
>>>>  PNFS_INODE(struct pnfs_layout_type *lo)
>>>>  {
>>>> @@ -133,8 +139,11 @@ struct layoutdriver_io_operations {
>>>>                     unsigned nr_pages, loff_t offset, size_t count,
>>>>                     int sync, struct nfs_write_data *nfs_data);
>>>>      int (*flush_one) (struct pnfs_layout_segment *, struct list_head 
>>>> *head, unsigned int npages, size_t count, int how);
>>>> +    int (*write_begin) (struct pnfs_layout_segment *lseg, struct page 
>>>> *page,
>>>> +                loff_t pos, unsigned count,
>>>> +                struct pnfs_fsdata **fsdata);
>>>>      void (*free_request_data) (struct nfs_page *);
>>>> -
>>>> +    void (*free_fsdata) (struct pnfs_fsdata *data);
>>>>       /* Consistency ops */
>>>>      /* 2 problems:
>>>> @@ -173,6 +182,10 @@ struct layoutdriver_policy_operations {
>>>>      /* test for nfs page cache coalescing */
>>>>      int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *, 
>>>> struct nfs_page *);
>>>>  +    /* Test for pre-write request flushing */
>>>> +    int (*do_flush)(struct pnfs_layout_segment *lseg, struct nfs_page 
>>>> *req,
>>>> +            struct pnfs_fsdata *fsdata);
>>>> +
>>>>      /* Retreive the block size of the file system.  If 
>>>> gather_across_stripes == 1,
>>>>       * then the file system will gather requests into the block size.
>>>>       * TODO: Where will the layout driver get this info?  It is hard 
>>>> coded in PVFS2.
>>>> 
>


More information about the pNFS mailing list