[pnfs] Layoutcommit and stable flush
Dean Hildebrand
dhildebz at eecs.umich.edu
Mon Aug 7 14:50:51 EDT 2006
Garth Goodson wrote:
>
>
> Dean Hildebrand wrote:
>> Hi All,
>>
>> I found an issue with layoutcommit when writing larger files. (I
>> notice it with files > 2 GB; my machine has 2 GB of RAM.)
>>
>> Layoutcommit is currently called in nfs_sync_inode_wait. Currently
>> we are ignoring the 'how' parameter, and always issuing a
>> layoutcommit whenever this function is invoked. This can lead to an
>> excessive number of layoutcommit calls, since nfs_sync_inode_wait is
>> called for a variety of reasons to synchronously update the inode.
>>
>> The issue I'm seeing is that the VM calls this function under memory
>> pressure to flush and commit writes to disk (so it can release the
>> memory for those write requests). When this is done, how ==
>> FLUSH_STABLE. For a 4 GB file, I see this function being invoked 5-15
>> times, creating 5-15 layoutcommits just to write one file. This
>> can cause a big slowdown. From talking with Trond, our initial
>> inclination is to NOT issue a layoutcommit in this situation. What
>> does everyone think?
>>
>> The larger issue, of course, is to determine for which values of 'how',
>> or in which situations, a layoutcommit should be issued. This might be a
>> protocol issue that I should mention on the nfsv4 email list (let me
>> know if you think so).
>>
>> A cursory examination reveals that nfs_sync_inode_wait is called in
>> the following situations with the following value of 'how':
>> 1. Lock/Unlock (how == 0)
>> 2. Getattr to update the mtime/ctime (how == nocommit)
>> 3. Setattr (how == 0)
>> 4. Memory pressure (how == FLUSH_STABLE)
>> 5. Close (how == 0)
>> 6. Rename (how == 0)
>> 7. Delegation return (how == 0)
>> 8. Fsync (how == 0)
>> 9. A few other special cases
>>
>> *Note: how == 0 means to flush all data, sync the data, and update
>> the metadata.
>>
>> My initial view is that we always want to send a layoutcommit for all
>> values of 'how' other than FLUSH_STABLE.
>>
>> One interesting case is with getattr, as layoutcommit is required to
>> update the mtime/ctime, but the data is not committed to stable
>> storage. Is it ok to issue a layoutcommit without first calling commit?
>>
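To make the question concrete, here is a minimal sketch of the decision as a
standalone C function. It is not the kernel code: should_layoutcommit() and
its require_commit_first parameter are hypothetical, FLUSH_NOCOMMIT is an
assumed name for the "nocommit" case, and the flag values are placeholders
rather than the ones in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values for illustration; not copied from the kernel. */
enum {
    FLUSH_STABLE   = 1 << 0,  /* memory-pressure writeback (case 4) */
    FLUSH_NOCOMMIT = 1 << 1,  /* assumed name: getattr path, no commit (case 2) */
};

/*
 * Hypothetical policy helper: should this call to the sync path also
 * send a layoutcommit?
 *
 * require_commit_first models the open question in this thread: if true,
 * a layoutcommit is only sent when the data has been committed first.
 */
static bool should_layoutcommit(int how, bool require_commit_first)
{
    assert(how >= 0);

    /* Proposal: memory-pressure flushes do not trigger a layoutcommit. */
    if (how & FLUSH_STABLE)
        return false;

    /* Data was flushed but not committed; defer if ordering is required. */
    if ((how & FLUSH_NOCOMMIT) && require_commit_first)
        return false;

    /* how == 0 and everything else: flush, sync, and update metadata. */
    return true;
}
```

With require_commit_first set, the getattr case defers the layoutcommit
until the data has been committed; with it clear, only the memory-pressure
case is skipped.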
>
> I think we want commits to occur before issuing layoutcommit. Why
> update the metadata while the data is not guaranteed stable?
>
> I'm not sure what the problem is with getattr. I think it is fine for
> getattr to return client cached attrs that are not yet stable.
> However, if things like mtime and ctime are returned, we must be sure that
> once the layoutcommit does occur they are updated to at least those
> times (I think we can do this by passing them into layoutcommit).
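A rough sketch of the "at least those times" rule, assuming the time the
client already reported through getattr is passed along with the
layoutcommit. The helper names ts_after() and apply_layoutcommit_time() are
hypothetical, and this only illustrates the comparison, not how a real
server would store attributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Returns true if a is strictly later than b. */
static bool ts_after(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec > b->tv_sec;
    return a->tv_nsec > b->tv_nsec;
}

/*
 * Hypothetical server-side step at layoutcommit: move the stored time
 * forward to the time the client already reported to applications, and
 * never move it backwards.
 */
static void apply_layoutcommit_time(struct timespec *server_time,
                                    const struct timespec *client_reported)
{
    assert(server_time != NULL && client_reported != NULL);

    if (ts_after(client_reported, server_time))
        *server_time = *client_reported;
}
```

The same helper would be applied to mtime and ctime separately, so an
application never sees either value go backwards after the layoutcommit.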
I'm off on vacation for the next week, so I haven't had enough time to look
into this thoroughly. But modifying the file and then reading cached
attributes is definitely different from standard NFSv4 (see nfs_getattr in
inode.c). Not sure how this gels with POSIX.
Dean
>
> -Garth
--
Dean Hildebrand
Ph.D. Candidate
University of Michigan