[pnfs] Layoutcommit and stable flush

Dean Hildebrand dhildebz at eecs.umich.edu
Mon Aug 7 14:50:51 EDT 2006



Garth Goodson wrote:
>
>
> Dean Hildebrand wrote:
>> Hi All,
>>
>> I found an issue with layoutcommit when writing larger files. (I 
>> notice it with files > 2GB, but I have 2 GB of RAM)
>>
>> Layoutcommit is currently called in nfs_sync_inode_wait.  Currently 
>> we are ignoring the 'how' parameter, and always issuing a 
>> layoutcommit whenever this function is invoked.  This can lead to an 
>> excessive number of layoutcommit calls, since nfs_sync_inode_wait is 
>> called for a variety of reasons to synchronously update the inode.
>>
>> The issue I'm seeing is that the VM calls this function under memory 
>> pressure to flush and commit writes to disk (so it can release the 
>> memory for those write requests).  When this is done, how == 
>> FLUSH_STABLE.  For a 4 GB file, I see this function being invoked 5-15 
>> times, creating 5-15 layoutcommits just to write one file.  This 
>> can cause a big slowdown.  From talking with Trond, our initial 
>> inclination is to NOT issue a layoutcommit in this situation.  What 
>> does everyone think?
>>
>> The larger issue, of course, is to determine for which values of 'how', 
>> or in which situations, a layoutcommit should be issued.  This might be a 
>> protocol issue that I should mention on the nfsv4 email list (let me 
>> know if you think so).
>>
>> A cursory examination reveals that nfs_sync_inode_wait is called in 
>> the following situations with the following value of 'how':
>> 1. Lock/Unlock (how == 0)
>> 2. Getattr to update the mtime/ctime (how == FLUSH_NOCOMMIT)
>> 3. Setattr (how == 0)
>> 4. Memory pressure (how == FLUSH_STABLE)
>> 5. Close (how == 0)
>> 6. Rename (how == 0)
>> 7. Delegation return (how == 0)
>> 8. Fsync (how == 0)
>> 9. A few other special cases
>>
>> *Note: how == 0 means to flush all data, sync the data, and update 
>> the metadata.
>>
>> My initial view is that we always want to send a layoutcommit for all 
>> values of 'how' other than FLUSH_STABLE.
>>
>> One interesting case is with getattr, as layoutcommit is required to 
>> update the mtime/ctime, but the data is not committed to stable 
>> storage.  Is it ok to issue a layoutcommit without first calling commit?
>>
>
> I think we want commits to occur before issuing layoutcommit.  Why 
> update the metadata while the data is not guaranteed stable?
>
> I'm not sure what the problem is with getattr.  I think it is fine for 
> getattr to return client cached attrs that are not yet stable.  
> However, if things like mtime and ctime are returned, we must be sure that 
> once the layoutcommit does occur they are updated to at least 
> those times (I think we can do this by passing them into layoutcommit).
I'm off on vacation for the next week so I haven't had enough time to look into 
this thoroughly.  But modifying the file and then reading cached 
attributes is definitely different from standard NFSv4 (see nfs_getattr 
in inode.c).  Not sure how this gels with POSIX....
Dean
>
> -Garth
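
For reference, the two positions in this thread can be written down as a small C sketch.  This is only an illustration of the discussion, not the client code: the flag bits and all function names below are placeholders invented for the sketch, and the kernel's real FLUSH_* constants may have different values.  The first function follows the proposal to send a layoutcommit for every 'how' except FLUSH_STABLE; the second is one possible stricter reading of the reply, which also skips the getattr (no-commit) path because the data is not yet stable; the third shows the "at least those times" rule for mtime/ctime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag bits for this sketch only.  The kernel's real FLUSH_*
 * constants are defined in its NFS headers and may have other values. */
enum sketch_how {
    SKETCH_FLUSH_STABLE   = 1 << 0,  /* memory-pressure writeback */
    SKETCH_FLUSH_NOCOMMIT = 1 << 1   /* getattr path: flush, no commit */
};

/* Proposal from the original post: issue a layoutcommit for every
 * value of 'how' except FLUSH_STABLE. */
static bool layoutcommit_wanted(int how)
{
    return (how & SKETCH_FLUSH_STABLE) == 0;
}

/* Stricter variant (an assumption, not a decision from the thread):
 * also skip the layoutcommit when the data was flushed without a
 * commit, so metadata is never updated ahead of stable data. */
static bool layoutcommit_wanted_strict(int how)
{
    return (how & (SKETCH_FLUSH_STABLE | SKETCH_FLUSH_NOCOMMIT)) == 0;
}

/* Timestamp rule from the reply: the time recorded at layoutcommit
 * must be at least the time the client already returned from getattr.
 * Times are plain seconds here to keep the sketch simple. */
static int64_t layoutcommit_time(int64_t server_time, int64_t client_reported)
{
    return server_time > client_reported ? server_time : client_reported;
}
```

With these placeholders, how == 0 (lock, close, fsync, and so on) triggers a layoutcommit under both variants, memory-pressure writeback never does, and the getattr case is exactly where the two variants disagree, which is the open question above.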

-- 
Dean Hildebrand
Ph.D. Candidate
University of Michigan


