[pnfs] Layoutcommit and stable flush

Garth Goodson Garth.Goodson at netapp.com
Mon Aug 7 15:45:22 EDT 2006



Dean Hildebrand wrote:
> 
> 
> Garth Goodson wrote:
>>
>>
>> Dean Hildebrand wrote:
>>> Hi All,
>>>
>>> I found an issue with layoutcommit when writing larger files.  (I 
>>> notice it with files > 2 GB; I have 2 GB of RAM.)
>>>
>>> Layoutcommit is currently called in nfs_sync_inode_wait.  Currently 
>>> we are ignoring the 'how' parameter, and always issuing a 
>>> layoutcommit whenever this function is invoked.  This can lead to an 
>>> excessive number of layoutcommit calls, since nfs_sync_inode_wait is 
>>> called for a variety of reasons to synchronously update the inode.
>>>
>>> The issue I'm seeing is that the VM calls this function under memory 
>>> pressure to flush and commit writes to disk (so it can release the 
>>> memory for those write requests).  When this is done, how == 
>>> FLUSH_STABLE.  For a 4 GB file, I see this function being invoked 5-15 
>>> times, creating 5-15 layoutcommits just to write one file.  This 
>>> can cause a big slowdown.  From talking with Trond, our initial 
>>> inclination is to NOT issue a layoutcommit in this situation.  What 
>>> does everyone think?
>>>
>>> The larger issue, of course, is to determine for which values of 'how', 
>>> or in which situations, a layoutcommit should be issued.  This might be a 
>>> protocol issue that I should mention on the nfsv4 email list (let me 
>>> know if you think so).
>>>
>>> A cursory examination reveals that nfs_sync_inode_wait is called in 
>>> the following situations with the following value of 'how':
>>> 1. Lock/Unlock (how==0)
>>> 2. Getattr to update the mtime/ctime (how == FLUSH_NOCOMMIT)
>>> 3. Setattr (how==0)
>>> 4. Memory pressure (how == FLUSH_STABLE)
>>> 5. Close (how==0)
>>> 6. Rename (how==0)
>>> 7. Delegation return (how==0)
>>> 8. Fsync (how==0)
>>> 9. A few other special cases
>>>
>>> *Note: how == 0 means to flush all data, sync the data, and update 
>>> the metadata.
>>>
>>> My initial view is that we always want to send a layoutcommit for all 
>>> values of 'how' other than FLUSH_STABLE.
>>>
>>> One interesting case is with getattr, as layoutcommit is required to 
>>> update the mtime/ctime, but the data is not committed to stable 
>>> storage.  Is it ok to issue a layoutcommit without first calling commit?
>>>
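
The rule proposed above (send a layoutcommit for every value of 'how'
except FLUSH_STABLE) could be sketched roughly as below.  This is only
an illustration: the SK_* names, the flag values, and
sk_want_layoutcommit() are hypothetical and are not the kernel's
definitions.

```c
#include <stdbool.h>

/* Hypothetical flag values, for illustration only.  The real FLUSH_*
 * constants are defined in the kernel's NFS headers and may differ. */
enum {
    SK_FLUSH_STABLE   = 1 << 0,  /* memory-pressure writeback        */
    SK_FLUSH_NOCOMMIT = 1 << 1   /* getattr path: flush, but no COMMIT */
};

/* Return true if a layoutcommit should follow this sync.
 * Proposed rule: skip it only for the memory-pressure flushes. */
static bool sk_want_layoutcommit(int how)
{
    return (how & SK_FLUSH_STABLE) == 0;
}
```

With this check, the 5-15 memory-pressure flushes seen for a 4 GB file
would no longer each produce a layoutcommit, while close, fsync, lock,
and the other how == 0 callers would behave as before.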
>>
>> I think we want commits to occur before issuing layoutcommit.  Why 
>> update the metadata while the data is not guaranteed stable?
>>
>> I'm not sure what the problem is with getattr.  I think it is fine for 
>> getattr to return client-cached attrs that are not yet stable.  
>> However, if things like mtime and ctime are returned, we must be sure that, 
>> once the layoutcommit does occur, they are updated to at least 
>> those times (I think we can do this by passing them into layoutcommit).
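
A minimal sketch of the ordering described above: commit the data first,
then send a layoutcommit that carries the mtime already handed out by
getattr, so the server ends up at least at that time.  Everything here
(struct sk_file, the sk_* helpers, and the choice to reject a
layoutcommit on uncommitted data) is a hypothetical model, not the
client's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-file client state; not the kernel's structures. */
struct sk_file {
    bool    data_committed;  /* true once written data is stable      */
    int64_t reported_mtime;  /* latest mtime returned by getattr      */
    int64_t server_mtime;    /* mtime the server holds for the file   */
};

/* Stand-in for COMMIT: marks the written data as stable. */
static void sk_commit(struct sk_file *f)
{
    f->data_committed = true;
}

/* Stand-in for layoutcommit.  Assumption: refuse to update metadata
 * while the data is not yet stable.  The passed-in mtime only ever
 * moves the server's time forward. */
static int sk_layoutcommit(struct sk_file *f, int64_t mtime)
{
    if (!f->data_committed)
        return -1;
    if (mtime > f->server_mtime)
        f->server_mtime = mtime;
    return 0;
}

/* Full sync: commit first, then layoutcommit with the reported time. */
static int sk_sync(struct sk_file *f)
{
    sk_commit(f);
    return sk_layoutcommit(f, f->reported_mtime);
}
```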
> I'm off on vacation for the next week so I haven't had enough time to look 
> into this thoroughly.  But modifying the file and then reading cached 
> attributes is definitely different than with standard NFSv4.  
> (nfs_getattr in inode.c)  Not sure how this gels with POSIX....
> 

This goes back to a discussion we had with Trond.  Trond's view is that, as 
long as it doesn't break open-close semantics, all is good.  Thus, new 
attrs only need to become visible at close time.  The client doing the 
I/O knows he is doing I/O and thus can figure out what lengths and times 
to return.  If apps want better synchronization, then they must use locks.
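
As a rough illustration of that view, the sketch below has the writing
client answer getattr from what it knows about its own I/O, and only
makes the new size and mtime visible to others at close.  The types and
the sk_* functions are hypothetical; the "server" copy simply stands for
whatever other clients would see.

```c
#include <stdint.h>

/* Hypothetical attribute pair; a real inode carries much more. */
struct sk_attrs {
    uint64_t size;
    int64_t  mtime;
};

struct sk_inode {
    struct sk_attrs local;   /* known to this client from its own I/O */
    struct sk_attrs server;  /* what other clients see via the server */
};

/* Record a write: extend the local size if needed, bump local mtime. */
static void sk_write(struct sk_inode *ino, uint64_t offset,
                     uint64_t len, int64_t now)
{
    if (offset + len > ino->local.size)
        ino->local.size = offset + len;
    ino->local.mtime = now;
}

/* getattr on the writing client: answer from local knowledge. */
static struct sk_attrs sk_getattr(const struct sk_inode *ino)
{
    return ino->local;
}

/* close: the point at which new attrs must become visible to others. */
static void sk_close(struct sk_inode *ino)
{
    ino->server = ino->local;
}
```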

-Garth

> Dean
>>
>> -Garth
> 

