[pnfs] Layoutcommit and stable flush
Garth Goodson
Garth.Goodson at netapp.com
Mon Aug 7 15:45:22 EDT 2006
Dean Hildebrand wrote:
>
>
> Garth Goodson wrote:
>>
>>
>> Dean Hildebrand wrote:
>>> Hi All,
>>>
>>> I found an issue with layoutcommit when writing larger files. (I
>>> notice it with files > 2GB, but I have 2 GB of RAM)
>>>
>>> Layoutcommit is currently called in nfs_sync_inode_wait. Currently
>>> we are ignoring the 'how' parameter, and always issuing a
>>> layoutcommit whenever this function is invoked. This can lead to an
>>> excessive number of layoutcommit calls, since nfs_sync_inode_wait is
>>> called for a variety of reasons to synchronously update the inode.
>>>
>>> The issue I'm seeing is that the VM calls this function under memory
>>> pressure to flush and commit writes to disk (so it can release the
>>> memory for those write requests). When this is done, how ==
>>> FLUSH_STABLE. For a 4 GB file, I see this function being invoked 5 -
>>> 15 times, creating 5-15 layoutcommits just to write one file. This
>>> can cause a big slowdown. From talking with Trond, our initial
>>> inclination is to NOT issue a layoutcommit in this situation. What
>>> does everyone think?
>>>
>>> The larger issue of course is to determine which values of 'how', or
>>> in which situations should a layoutcommit be issued? This might be a
>>> protocol issue that I should mention on the nfsv4 email list (let me
>>> know if you think so).
>>>
>>> A cursory examination reveals that nfs_sync_inode_wait is called in
>>> the following situations with the following value of 'how':
>>> 1. Lock/Unlock (how==0)
>>> 2. Getattr to update the mtime/ctime (how == nocommit)
>>> 3. Setattr (how==0)
>>> 4. Memory pressure (how == flush_stable)
>>> 5. Close (how==0)
>>> 6. Rename (how==0)
>>> 7. Delegation return (how==0)
>>> 8. Fsync (how==0)
>>> 9. A few other special cases
>>>
>>> *Note: how == 0 means to flush all data, sync the data, and update
>>> the metadata.
>>>
>>> My initial view is that we always want to send a layoutcommit for all
>>> values of 'how' other than FLUSH_STABLE.
>>>
>>> One interesting case is with getattr, as layoutcommit is required to
>>> update the mtime/ctime, but the data is not committed to stable
>>> storage. Is it ok to issue a layoutcommit without first calling commit?
>>>
>>
>> I think we want commits to occur before issuing layoutcommit. Why
>> update the metadata while the data is not guaranteed stable?
>>
>> I'm not sure what the problem is with getattr. I think it is fine for
>> getattr to return client cached attrs that are not yet stable.
>> However, if things like mtime, ctime are returned we must be sure that
>> once the layoutcommit does occur that they are updated to at least
>> those times (I think we can do this by passing them into layoutcommit).
> I'm off on vacation for the next week so I haven't had enough time to look into
> this thoroughly. But modifying the file and then reading cached
> attributes is definitely different than with standard NFSv4.
> (nfs_getattr in inode.c) Not sure how this gels with POSIX....
>
This goes back to a discussion we had with Trond. Trond's view is that as
long as it doesn't break open-close semantics, all is good. Thus, new
attrs only need to become visible at close time. The client doing the
I/O knows it is doing I/O and thus can figure out what lengths and times
to return. If apps want better synchronization, then they must use locks.
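
To make the options in this thread concrete, here is a rough sketch in C of the policy being discussed. It is not the actual client code: the flag values, the enum, and both function names are made up for illustration, and I am assuming 'how' can be treated as a set of bit flags. It shows (a) skipping layoutcommit on the FLUSH_STABLE memory-pressure path, (b) committing data before layoutcommit when how == 0, and (c) never letting the times sent in layoutcommit fall behind what getattr already returned from the client cache.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag values for illustration only; NOT the kernel's definitions. */
#define SKETCH_FLUSH_STABLE   0x1  /* memory-pressure writeback */
#define SKETCH_FLUSH_NOCOMMIT 0x2  /* flush without COMMIT (getattr path) */

/* Hypothetical outcomes for a call into the sync path. */
enum sketch_action {
    SKETCH_SKIP_LAYOUTCOMMIT,        /* do not send layoutcommit */
    SKETCH_COMMIT_THEN_LAYOUTCOMMIT, /* make data stable, then update metadata */
    SKETCH_UNDECIDED                 /* nocommit case: left open in this thread */
};

/* Map the 'how' argument to an action, following the proposal in the thread. */
static enum sketch_action sketch_layoutcommit_policy(int how)
{
    if (how & SKETCH_FLUSH_STABLE)
        return SKETCH_SKIP_LAYOUTCOMMIT;   /* avoids 5-15 layoutcommits per file */
    if (how & SKETCH_FLUSH_NOCOMMIT)
        return SKETCH_UNDECIDED;           /* layoutcommit without commit? */
    return SKETCH_COMMIT_THEN_LAYOUTCOMMIT; /* how == 0: lock, close, fsync, ... */
}

/*
 * Pick the time to pass into layoutcommit: at least the value already
 * reported to the application, and at least the time of the last write.
 */
static int64_t sketch_layoutcommit_time(int64_t reported, int64_t last_write)
{
    return last_write > reported ? last_write : reported;
}
```

With something like this, the sync path would check the policy once per call, and the close path (how == 0) would always end in a commit followed by a layoutcommit.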
-Garth
> Dean
>>
>> -Garth
>