Network File System Version 4                               T. Myklebust
Internet-Draft                                   Network Appliance, Inc.
Expires: December 3, 2005                                      J. Fields
                                                              W. Adamson
                                                             P. Honeyman
                                                                    CITI
                                                               June 2005


       Network File System (NFS) version 4 byte range delegations
            draft-myklebust-nfsv4-byte-range-delegations-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 3, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document describes a set of extensions to the NFS version 4 protocol that enable the client to cache file data when caching conflicts prevent the server from handing out a file delegation. The proposed extensions enable the caching of only those specific byte ranges of data which the user application is reading or writing. As in the case of full delegations, a callback mechanism enables the server to request that the client flush cached data when a caching conflict occurs.

Myklebust, et al.       Expires December 3, 2005                [Page 1]

Internet-Draft        NFSv4 byte range delegations             June 2005
Keywords

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Table of Contents

   1.  Introduction
     1.1  File caching in NFS versions 2 and 3
     1.2  File caching in NFS version 4
     1.3  Motivation for extending the NFSv4 delegation model
   2.  Description of the proposed caching model
     2.1  File data
       2.1.1  Read delegations
       2.1.2  Write delegations
     2.2  Upgrading and downgrading byte ranges
     2.3  File truncation and extension
     2.4  Byte range locks
   3.  Stateids and byte range delegations
     3.1  The current delegation stateid
   4.  Callback model
     4.1  Revocation
     4.2  Client recovery from a recalled byte range delegation
     4.3  Client recovery from a recalled file delegation
     4.4  Use of CB_GETATTR for querying the size attribute
   5.  Crash recovery
     5.1  Client reboot scenario
     5.2  Server reboot scenario
     5.3  Network partition
   6.  New client operations
     6.1  DELEG_OPEN - request new byte-range delegation stateid
     6.2  DELEG_RANGE - extend delegation to cover a byte range
     6.3  DELEG_DOWNGRADE - downgrades a write delegation on a byte range
     6.4  DELEG_RELEASE - release a delegation on a byte range
     6.5  DELEG_PUT_STATEID - set the current delegation stateid
   7.  New callback operations
     7.1  CB_RECALL_RANGE - recall a byte range delegation
   8.  References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1. Introduction

1.1 File caching in NFS versions 2 and 3

   The NFS protocol versions 2 and 3 do not offer any caching guarantees to clients. The most commonly implemented caching model is the so-called close-to-open model, which relies on user applications providing their own assurances of exclusive access to file data. In this model, the clients limit themselves to checking cache consistency when the user opens and closes the file. In the case where the NLM locking extensions are implemented, checks are also performed upon taking and releasing advisory locks.

1.2 File caching in NFS version 4

   With the introduction of delegations, NFS version 4 [RFC3530] strengthens file caching guarantees at the protocol level under limited circumstances that mirror those under which the close-to-open model is valid.

   When the client opens a file for reading, the server is permitted to offer a "file read delegation" after having determined that no other clients have been granted write access. This is a guarantee that the file data and meta-data will not change until the client gives up the delegation. A file read delegation also gives the client the opportunity to cache byte range read locks and READ open share locks.

   When the client opens a file with READ or WRITE share semantics, and the server determines that the client is the exclusive user of that file, it may offer a "file write delegation".
   In doing so, it guarantees that no other client may read or modify the file until the delegation is returned. A write delegation also enables the caching of all byte range locks and open share locks.

   The key difference in functionality between a file delegation and a lock is that the server is able to recall the delegation at any time by means of a callback channel. When a delegation is recalled, the client is expected to flush its cache, establish its cached locks on the server, and return the delegation, and to do all this as quickly as possible. If the server notes that the client has failed to return the delegation within a grace time of one lease period, then the server may unilaterally revoke the delegation.

1.3 Motivation for extending the NFSv4 delegation model

   Problems arise when multiple clients wish to access the file, and one (or more) has it open for writing. Delegations are ruled out in this case, so unless an application uses byte range locking, a client is unable to tell whether cached data is valid. Consequently, clients fall back to not caching data or to checking cache validity frequently, increasing the I/O burden on the server.

   One long-standing problem that the NFSv4 delegation model therefore fails to solve is that of providing cache consistency guarantees as strong as those provided by local file systems. This failure has a broad impact; for example, it interferes with porting applications from a single-machine environment to a cluster of machines that share files over NFS.

   Among the applications that require stronger caching semantics than NFSv4 provides are those that use shared memory-mapped files for synchronisation and communication between processes on different clients but do no supplementary locking. Another example is shared append-only files such as logs. Even applications that use byte range locking for synchronisation are affected.
   Unless a peek at the change attribute shows that no one has written anywhere in the file, a client may be forced to ignore otherwise valid cached data.

2. Description of the proposed caching model

   Except for the special case of the size attribute, this document does not address the issue of file meta-data consistency.

   The proposed model resembles that of file delegations in that the client can register with the server so that it is synchronously notified of changes to locks and cached data. It also provides synchronisation guarantees between writers by allowing them to request temporary exclusive access to byte ranges of the file.

   The model is required to operate consistently in a mixed environment in which some clients may be using older versions of the NFS protocol together with uncached I/O. To these older clients, the clients that are using byte range delegations should appear to behave as if they too are using uncached I/O.

2.1 File data

2.1.1 Read delegations

   A server that grants a read delegation on a byte range guarantees that no other client may change the data or acquire a write lock in the covered region until the delegation is released. Note that a SETATTR that modifies the size of a file effectively changes the data in the region between the old and new sizes.

   The client may request a read delegation on a byte range using the DELEG_RANGE operation with a lock type argument of READ_LT or READW_LT. When the READ_LT argument is used, the DELEG_RANGE call should fail without triggering a recall if another client holds a write delegation for that range. Clients can use this mechanism to issue speculative requests that might fail, e.g. read-ahead requests. The server MUST, however, initiate the recall of any conflicting write delegation when the READW_LT variant is used, whether or not the request is granted.
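   The difference between READ_LT and READW_LT can be sketched in C as a server-side conflict check. This is a minimal, hypothetical sketch rather than part of the protocol. The struct brdeleg record, the check_read_deleg() function, and the recall_pending flag are illustrative assumptions. The lock-type and status values are those of [RFC3530]. Only overlapping write delegations held by other clients are treated as conflicts; conflicts with byte range locks are not modelled. Following Section 6.2, a conflicting request is answered with NFS4ERR_DENIED in both variants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lock types and status values as defined in RFC 3530. */
enum nfs_lock_type4 { READ_LT = 1, WRITE_LT = 2, READW_LT = 3, WRITEW_LT = 4 };
enum { NFS4_OK = 0, NFS4ERR_DENIED = 10010 };

/* Hypothetical server-side record of one byte range delegation. */
struct brdeleg {
    uint64_t clientid;        /* client holding the delegation */
    int      is_write;        /* 1 = write delegation, 0 = read */
    uint64_t offset;
    uint64_t length;          /* all ones = through end of file */
    int      recall_pending;  /* set when a recall must be sent */
};

/* Exclusive end of [offset, offset + length), saturating at UINT64_MAX. */
static uint64_t range_end(uint64_t offset, uint64_t length)
{
    if (length == UINT64_MAX || offset > UINT64_MAX - length)
        return UINT64_MAX;
    return offset + length;
}

static int ranges_overlap(uint64_t o1, uint64_t l1, uint64_t o2, uint64_t l2)
{
    if (l1 == 0 || l2 == 0)
        return 0;             /* an empty range overlaps nothing */
    return o1 < range_end(o2, l2) && o2 < range_end(o1, l1);
}

/*
 * Evaluate a READ_LT or READW_LT request from 'clientid' for
 * [offset, offset + length).  Only write delegations held by other
 * clients conflict.  READ_LT fails without side effects; READW_LT also
 * marks every conflicting write delegation for recall.
 */
static int check_read_deleg(struct brdeleg *d, size_t n, uint64_t clientid,
                            enum nfs_lock_type4 locktype,
                            uint64_t offset, uint64_t length)
{
    int conflicts = 0;

    for (size_t i = 0; i < n; i++) {
        if (d[i].clientid == clientid || !d[i].is_write)
            continue;
        if (!ranges_overlap(d[i].offset, d[i].length, offset, length))
            continue;
        conflicts++;
        if (locktype == READW_LT)
            d[i].recall_pending = 1;
    }
    return conflicts ? NFS4ERR_DENIED : NFS4_OK;
}
```

   A read-ahead path would use READ_LT and simply give up on NFS4ERR_DENIED. A client that needs the data cached would use READW_LT and retry once the recalls flagged here have completed.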
   In the proposed model, if a current delegation stateid has been set using a previous DELEG_PUT_STATEID or DELEG_RANGE operation, then a READ request implicitly requests a read delegation on the byte range covered by its arguments. In this case, the server should treat the READ request as if it had been immediately preceded by a DELEG_RANGE call with a READW_LT argument.

   A server MUST refuse to grant a read delegation on a range that would overlap with a write delegation held by another client. In order to allow the caching of byte range locks, the server MUST also refuse to grant a read delegation for a range that overlaps with a WRITE lock held by another client.

   If another client attempts to write into the region covered by the delegation, the server should initiate an immediate recall. It may then optionally return an error of NFS4ERR_DELAY to the write request.

2.1.2 Write delegations

   A server that grants a write delegation on a byte range guarantees that no other client may change the data in that region until the delegation has been released. In addition, it guarantees that no other client may read data or hold a read delegation in that region until the write delegation has been downgraded or released.

   The client may request a write delegation on a byte range using the DELEG_RANGE operation with a lock type argument of WRITE_LT or WRITEW_LT. When the WRITE_LT argument is used, the DELEG_RANGE call should fail without triggering a recall if another client holds a read or write delegation for that range. The server MUST, however, initiate the recall of any conflicting read or write delegation when the WRITEW_LT variant is used.

   A server MUST refuse to grant a write delegation that would overlap with a read or write delegation held by another client.
   In order to allow the caching of byte range locks, the server MUST also refuse to grant a write delegation for a range that overlaps with a READ or WRITE lock held by another client.

   To avoid lock starvation for write delegations, the server is encouraged to implement the same queueing scheme that is described for byte range locks in Section 8.4 of [RFC3530].

2.2 Upgrading and downgrading byte ranges

   In the proposed model, a client may request to upgrade a read delegation to a write delegation at any time using the DELEG_RANGE operation. If successful, the upgrade must be performed atomically by the server so that the client that requested the upgrade can keep any cached data.

   Similarly, a client that is holding a write delegation on a byte range may, once it has finished flushing out any dirty data, request that the server atomically downgrade it to a read delegation using the DELEG_DOWNGRADE operation. It is expected that clients will take advantage of this as part of a COMMIT compound to obviate recalls.

2.3 File truncation and extension

   Changes to the file size MUST trigger a recall of all byte range delegations held by other clients in the region between the old and new end of file. A useful consequence of this rule is that a client wishing to be notified of changes to the size attribute may achieve this by requesting a read or write delegation that covers the 2-byte range starting at offset (size - 1). Any truncation removes the byte at offset (size - 1), and any extension adds the byte at offset size, so either kind of change overlaps that range.

   If a client holds a write delegation in the region of the end-of-file marker, then it is guaranteed that no other client can append to the file until the client holding the write delegation has finished writing out its modifications and released the delegation in that region.

2.4 Byte range locks

   A client holding a write delegation may cache read or write byte range lock requests, provided they are fully included in the range covered by the write delegation.
   A client holding a read delegation may cache read byte range lock requests, provided they are fully included in the region covered by the read delegation.

   If a delegation is recalled or downgraded, the client is responsible for establishing any cached locks on the server as part of the process of recovery.

3. Stateids and byte range delegations

   One of the goals of the delegation model is to allow clients to cache data without having to tie that delegation to a particular open stateid. Although the DELEG_OPEN operation uses an open stateid and sequence id to guarantee only-once semantics, the resulting stateid is not considered to be associated with that particular open stateid.

   Therefore, to allow it to be reused with other open stateids, the byte range delegation stateid does not carry any share or lock information. A client holding a write delegation on a particular byte range has no guarantee that the share reservations on that file allow write access.

3.1 The current delegation stateid

   To allow the server to check that a given operation does not violate the requested caching semantics, we add the notion of a "current delegation stateid". Rather than replacing the usual open stateid argument, the current delegation stateid is set in a separate operation that precedes the READ, WRITE, or SETATTR operation that it protects. It is set either implicitly using a DELEG_RANGE operation, or explicitly using the dedicated operation DELEG_PUT_STATEID.

   The current delegation stateid is automatically cleared by any operation that changes the current filehandle. It may also be cleared by explicitly calling DELEG_PUT_STATEID with a special stateid argument consisting of all zeros. If set, the current delegation stateid applies to all subsequent READ, WRITE, and SETATTR operations within the same COMPOUND.
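   The scoping rules for the current delegation stateid can be illustrated with a small COMPOUND walker. This is a hypothetical sketch under two assumptions: PUTFH is the only operation in the COMPOUND that changes the current filehandle, and every DELEG_RANGE succeeds. The enum op, struct compound_op, and run_compound() names are invented for illustration. The stateid4 layout follows [RFC3530]. DELEG_OPEN, which also replaces the current delegation stateid, is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* stateid4 as defined in RFC 3530. */
typedef struct {
    uint32_t      seqid;
    unsigned char other[12];
} stateid4;

/* Hypothetical subset of COMPOUND operations. */
enum op { OP_PUTFH, OP_DELEG_PUT_STATEID, OP_DELEG_RANGE, OP_READ };

struct compound_op {
    enum op  op;
    stateid4 stateid;   /* argument of DELEG_PUT_STATEID / DELEG_RANGE */
};

/* The special all-zeros stateid clears the current delegation stateid. */
static int stateid_is_zero(const stateid4 *s)
{
    if (s->seqid != 0)
        return 0;
    for (int i = 0; i < 12; i++)
        if (s->other[i] != 0)
            return 0;
    return 1;
}

/*
 * Walk a COMPOUND and record, for each READ, whether a current delegation
 * stateid applies to it (has[k]) and which one (got[k]).  Returns the
 * number of READ operations seen.
 */
static int run_compound(const struct compound_op *ops, int n,
                        int *has, stateid4 *got)
{
    stateid4 cur = { 0 };
    int have_cur = 0, reads = 0;

    for (int i = 0; i < n; i++) {
        switch (ops[i].op) {
        case OP_PUTFH:                 /* new filehandle: clear it */
            have_cur = 0;
            break;
        case OP_DELEG_PUT_STATEID:     /* explicit set, or clear on zeros */
            have_cur = !stateid_is_zero(&ops[i].stateid);
            cur = ops[i].stateid;
            break;
        case OP_DELEG_RANGE:           /* implicit set on success */
            have_cur = 1;
            cur = ops[i].stateid;
            break;
        case OP_READ:                  /* READ sees whatever is current */
            has[reads] = have_cur;
            if (have_cur)
                got[reads] = cur;
            reads++;
            break;
        }
    }
    return reads;
}
```

   The stateid is kept as COMPOUND-local state rather than as an argument of READ. This mirrors the design choice above: existing READ, WRITE, and SETATTR argument formats stay unchanged.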
   The server is required to check the current delegation stateid in addition to the stateid argument of the READ, WRITE, or SETATTR operation, and should return NFS4ERR_OLD_STATEID if either stateid has been superseded due to a state change. This may, for instance, occur in the case of a race with a DELEG_DOWNGRADE or DELEG_RELEASE request on the same file.

4. Callback model

4.1 Revocation

   Servers are permitted to recall a byte range delegation at any time and for any reason. Typical scenarios that trigger such a recall include:

   o  Resolving a caching conflict due to a request from another client. Operations that may require a recall of the byte range delegation include READ, WRITE, LOCK, LOCKT, SETATTR, OPEN, and DELEG_RANGE.

   o  Another client's read patterns trigger speculative read-ahead on the server.

   o  The amount of delegation state being managed by the server grows too large, triggering a reclaim of resources.

   There are two ways for a server to recall a byte range delegation:

   o  As for file delegations, the server can use CB_RECALL to request that a client flush all writes and locks affected by the delegation, and return the delegation using the DELEGRETURN operation. If the client later wishes to re-establish a delegation, then it must first call DELEG_OPEN to obtain a new delegation stateid.

   o  The new CB_RECALL_RANGE operation gives the server finer control over which region of the file it wishes to recall. CB_RECALL_RANGE also allows the server to request a downgrade rather than a full recall of a region that holds cached writes. By requesting a downgrade, the server signals that the client may convert its write delegations into read delegations after it has finished flushing the cached writes to the server.

   Clients that request byte range delegations MUST be able to handle both CB_RECALL and CB_RECALL_RANGE recall requests.
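   The choice between a full recall and a downgrade can be sketched as a server-side helper. This is a hypothetical helper, not one defined by this document. Given one delegation held by a client and a conflicting request from another client, it computes the overlapping range to place in CB_RECALL_RANGE and decides whether a downgrade would suffice. The struct recall_args type mirrors only the offset, length, and downgrade fields of CB_RECALL_RANGE4args. The stateid, truncate, and filehandle fields are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of CB_RECALL_RANGE4args. */
struct recall_args {
    uint64_t offset;
    uint64_t length;      /* all ones = through end of file */
    int      downgrade;   /* 1 = converting to a read delegation suffices */
};

/* Exclusive end of [offset, offset + length), saturating at UINT64_MAX. */
static uint64_t range_end(uint64_t offset, uint64_t length)
{
    if (length == UINT64_MAX || offset > UINT64_MAX - length)
        return UINT64_MAX;
    return offset + length;
}

/*
 * The holder has a delegation [d_off, d_off + d_len) of type d_is_write.
 * Another client sends a request on [r_off, r_off + r_len) that reads
 * (req_is_write == 0) or modifies (req_is_write == 1) data.  Fills *out
 * and returns 1 if a recall is needed, or returns 0 if there is no conflict.
 */
static int build_recall(int d_is_write, uint64_t d_off, uint64_t d_len,
                        int req_is_write, uint64_t r_off, uint64_t r_len,
                        struct recall_args *out)
{
    uint64_t start = d_off > r_off ? d_off : r_off;
    uint64_t d_end = range_end(d_off, d_len);
    uint64_t r_end = range_end(r_off, r_len);
    uint64_t end = d_end < r_end ? d_end : r_end;

    if (start >= end)
        return 0;                 /* ranges do not overlap */
    if (!d_is_write && !req_is_write)
        return 0;                 /* read vs. read never conflicts */

    out->offset = start;
    out->length = (end == UINT64_MAX) ? UINT64_MAX : end - start;
    /* A reader only needs the holder to stop caching writes. */
    out->downgrade = d_is_write && !req_is_write;
    return 1;
}
```

   Recalling only the overlap, rather than the holder's whole delegation, lets the holder keep caching outside the contested region.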

4.2 Client recovery from a recalled byte range delegation

   When the server recalls all or part of a delegated byte range, the client recovery process is very similar to that of file delegations:

   o  If the client holds a read delegation on the recalled byte range, then it should recover any cached byte range read locks and mark the read cache as invalid.

   o  If a write delegation is held on all or part of the byte range being recalled, then the client should recover any cached read or write locks, flush out all pending writes, and mark the read cache as invalid.

   The recovery process ends when the client returns the delegation on the recalled range using either the DELEG_RELEASE or the DELEGRETURN operation. If the server requests a downgrade of a write delegation, then the client may choose to use DELEG_DOWNGRADE instead of returning the entire delegation. If it does so, then it need not mark the read cache as invalid on that range.

4.3 Client recovery from a recalled file delegation

   If the server recalls a file write delegation, then the client may request read or write byte range delegations as part of the usual process of recovering cached locks and flushing out writes. The server is under no obligation to honour these requests, but it may choose to do so in order to allow the client to continue to cache read data or writes that are not causing any immediate cache consistency conflicts.

   Likewise, when the server recalls a file read delegation, the client may issue requests for byte range read delegations during the recovery phase.

4.4 Use of CB_GETATTR for querying the size attribute

   If a client holds a write delegation that extends across the end of file, then it may cache SETATTR or WRITE operations that will cause the size attribute to change.
   Rather than recall the delegation when a second client attempts to query the size attribute, the server MAY choose to send a CB_GETATTR callback to the client holding the delegation in order to determine the true file size. Note that the server MUST NOT issue a CB_GETATTR query for any attributes other than size.

5. Crash recovery

   As usual under NFS, the recovery of byte range delegations after a crash is driven by clients.

5.1 Client reboot scenario

   If, after rebooting, the client uses the standard calls to SETCLIENTID and SETCLIENTID_CONFIRM, then the server is expected to clear the byte range delegations as part of the usual operation of breaking the lease state owned by the previous incarnation of the client.

5.2 Server reboot scenario

   The client discovers a server reboot in the usual fashion by receiving an NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID error. If the server supports a grace period, the client may then attempt to recover byte range delegations as part of the normal process of state recovery.

   During the grace period, the client recovers the byte range delegation by issuing DELEG_RANGE requests with the reclaim flag set to true. In the usual fashion, the server guarantees that the file will not change by rejecting any conflicting non-reclaim delegation, locking, and OPEN requests, as well as conflicting READ, WRITE, and SETATTR operations.

5.3 Network partition

   If a network partition causes the client to fail to renew its leases within the usual lease expiration period, the server MAY choose to hold the byte range delegation on behalf of the client until a conflict forces a revocation. Once the delegation has been revoked, the server should return NFS4ERR_EXPIRED in response to any attempts to use it.
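   The server-side behaviour during a network partition can be summarised as a tiny state machine. This is a hypothetical sketch. The struct deleg_state type and the on_conflict() and use_delegation() functions are invented names, and the status values are those of [RFC3530]. The sketch assumes the server has taken the MAY option above of keeping the delegation after the holder's lease has expired.

```c
#include <assert.h>

/* Status values as defined in RFC 3530. */
enum { NFS4_OK = 0, NFS4ERR_EXPIRED = 10011 };

/* Hypothetical per-delegation state on the server. */
struct deleg_state {
    int lease_expired;   /* holder failed to renew its lease in time */
    int revoked;         /* delegation has been revoked */
};

/*
 * Another client's request conflicts with the delegation.  Returns 1 if
 * the server revoked it on the spot (the holder's lease had expired), or
 * 0 if the holder is still live and must be sent a recall instead.
 */
static int on_conflict(struct deleg_state *d)
{
    if (d->lease_expired) {
        d->revoked = 1;
        return 1;
    }
    return 0;
}

/* Status returned when the holder later tries to use the delegation. */
static int use_delegation(const struct deleg_state *d)
{
    return d->revoked ? NFS4ERR_EXPIRED : NFS4_OK;
}
```

   Until a conflict arrives, the expired holder can still use its delegation. This is what lets a client whose partition heals quickly continue without any recovery at all.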
   If the client sees that the change attribute on the file has not been modified, it may attempt to re-establish its byte range delegations by sending a DELEG_OPEN and then replaying its DELEG_RANGE requests to the server. The client should also revalidate its cache using the change attribute once recovery is complete, in order to make sure that the cache is still valid.

   The reader is referred to the section "Revocation Recovery for Write Open Delegation" in [RFC3530] for a discussion of how to deal with cached writes in regions where recovery of the byte range delegation has failed.

6. New client operations

6.1 DELEG_OPEN - request new byte-range delegation stateid

   SYNOPSIS

     (cfh), open_seqid, open_stateid, deleg_seqid -> stateid, delegation

   ARGUMENT

     struct DELEG_OPEN4args {
             /* CURRENT_FH: opened file */
             seqid4          open_seqid;
             stateid4        open_stateid;
             seqid4          deleg_seqid;
     };

   RESULT

     struct DELEG_OPEN4resok {
             stateid4         stateid;     /* byte range delegation */
             open_delegation4 delegation;  /* open delegation */
     };

     union DELEG_OPEN4res switch (nfsstat4 status) {
      case NFS4_OK:
             /* CURRENT_STATEID: stateid for byte range delegation */
             DELEG_OPEN4resok resok4;
      default:
             void;
     };

   DESCRIPTION

     DELEG_OPEN requests a byte-range delegation stateid for a given file. The open stateid and sequence id are used to ensure only-once semantics in the absence of sessions [draft-ietf-nfsv4-sess-01]. The delegation sequence identifier should be initialised to zero upon the first call to DELEG_OPEN for a given file, and each time the client gives up the byte range delegation stateid.

     If the client attempts to call DELEG_OPEN using the special stateids consisting of all zero bits or all one bits, the server should deny the request using the error NFS4ERR_OPENMODE. The server is also required to deny this request with NFS4ERR_CB_PATH_DOWN if the callback path cannot be established.
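   The two DELEG_OPEN argument checks described above can be sketched as follows. This is a minimal, hypothetical sketch. The deleg_open_precheck() function and its cb_path_ok parameter are assumptions; cb_path_ok stands in for whatever callback-path probe a server uses. The stateid4 layout and status values come from [RFC3530]. This document does not mandate the order in which the two checks are made.

```c
#include <assert.h>
#include <stdint.h>

/* stateid4 as defined in RFC 3530. */
typedef struct {
    uint32_t      seqid;
    unsigned char other[12];
} stateid4;

/* Status values as defined in RFC 3530. */
enum {
    NFS4_OK              = 0,
    NFS4ERR_OPENMODE     = 10038,
    NFS4ERR_CB_PATH_DOWN = 10048
};

/* True if the stateid is the special all-zero-bits (ones == 0) or
 * all-one-bits (ones == 1) stateid. */
static int stateid_is_special(const stateid4 *s, int ones)
{
    unsigned char b = ones ? 0xff : 0x00;

    if (s->seqid != (ones ? UINT32_MAX : 0))
        return 0;
    for (int i = 0; i < 12; i++)
        if (s->other[i] != b)
            return 0;
    return 1;
}

/*
 * Preliminary DELEG_OPEN checks.  'cb_path_ok' is a hypothetical stand-in
 * for the server's own test of the client's callback path.
 */
static int deleg_open_precheck(const stateid4 *open_stateid, int cb_path_ok)
{
    if (stateid_is_special(open_stateid, 0) ||
        stateid_is_special(open_stateid, 1))
        return NFS4ERR_OPENMODE;
    if (!cb_path_ok)
        return NFS4ERR_CB_PATH_DOWN;
    return NFS4_OK;
}
```

   Refusing when the callback path is down is what makes byte range delegations safe to hand out. Without a working callback channel, the server would have no way to recall them.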
     On success, the current filehandle retains its value. The current delegation stateid is replaced with the stateid corresponding to the byte range delegation.

   IMPLEMENTATION

     The client gives up the byte range delegation stateid using the DELEGRETURN operation. At any given time there should be at most one byte-range delegation stateid in existence per (file, client) pair. A client is permitted to send multiple DELEG_OPEN requests; however, the server should then reply with the same stateid. The server may additionally choose to grant the client an ordinary file delegation.

   ERRORS

     NFS4ERR_ACCESS
     NFS4ERR_ADMIN_REVOKED
     NFS4ERR_BADHANDLE
     NFS4ERR_BAD_SEQID
     NFS4ERR_BAD_STATEID
     NFS4ERR_BADXDR
     NFS4ERR_CB_PATH_DOWN
     NFS4ERR_DELAY
     NFS4ERR_DENIED
     NFS4ERR_EXPIRED
     NFS4ERR_FHEXPIRED
     NFS4ERR_ISDIR
     NFS4ERR_LEASE_MOVED
     NFS4ERR_MOVED
     NFS4ERR_NOFILEHANDLE
     NFS4ERR_NOTSUPP
     NFS4ERR_OLD_STATEID
     NFS4ERR_OPENMODE
     NFS4ERR_RESOURCE
     NFS4ERR_SERVERFAULT
     NFS4ERR_STALE
     NFS4ERR_STALE_CLIENTID
     NFS4ERR_STALE_STATEID

6.2 DELEG_RANGE - extend delegation to cover a byte range

   SYNOPSIS

     (cfh), locktype, reclaim, stateid, offset, length -> (cstateid), offset, length, recall

   ARGUMENT

     struct DELEG_RANGE4args {
             /* CURRENT_FH: file */
             nfs_lock_type4  locktype;
             bool            reclaim;
             stateid4        stateid;
             offset4         offset;
             length4         length;
     };

   RESULT

     enum delegreturn4 {
             NORECALL  = 0,
             DOWNGRADE = 1,
             RECALL    = 2
     };

     struct DELEG_RANGE4resok {
             offset4         offset;
             length4         length;
             delegreturn4    recall;
     };

     union DELEG_RANGE4res switch (nfsstat4 status) {
      case NFS4_OK:
             DELEG_RANGE4resok resok4;
      default:
             void;
     };

   DESCRIPTION

     The DELEG_RANGE operation requests a delegation for the byte range specified by the offset and length parameters. The locktype specifies the type of caching semantics that are requested.
     A reclaim request is signalled by setting the reclaim parameter to TRUE.

     If the locktype is set to READ_LT or WRITE_LT, and another client holds a conflicting delegation, the server should return NFS4ERR_DENIED. If, however, the locktype is either READW_LT or WRITEW_LT, the server should initiate a recall of all conflicting delegations before returning NFS4ERR_DENIED.

     If a client requests a locktype of WRITE_LT or WRITEW_LT on a region for which it already holds a read delegation, then the server should attempt to atomically upgrade the existing delegation. A server that does not support atomic upgrades or downgrades of the byte range delegation should return NFS4ERR_LOCK_NOTSUPP.

     On success, the server returns the range covered by the delegation. Note that the server may choose to extend the range requested by the client in order to decrease the administrative burden by merging noncontiguous delegation ranges. It MUST NOT, however, return a range that is smaller than that requested by the client.

     The "recall" field is an optimisation that can be used by the server to notify the client that a conflicting request is already queued. If it is set to DOWNGRADE, then the client should downgrade the write delegation to a read delegation. If it is set to RECALL, then the client should release the delegation.

     On success, the current filehandle retains its value, and the current delegation stateid is set to the new value.

   IMPLEMENTATION

     DELEG_RANGE may be called on a given stateid as many times as desired. The server may represent the bytes covered internally as a list of noncontiguous byte ranges. Alternatively, it may choose a simpler representation, for example a single range covering all of the bytes ever requested.
     A server is free to reject DELEG_RANGE requests and to recall delegations for any reason, so at worst, a coarser representation might cause the server to deny requests (or recall delegations) more often than is strictly necessary.

     The READW_LT and WRITEW_LT lock types cause the server to recall any conflicting delegations from other clients. A client will want to use these variants in situations where strong cache consistency guarantees are needed.

     A length field with all bits set to one extends the delegation through the end of file, regardless of how long the file actually is.

     If mandatory file locking is on for the file, and if a lockowner on a client other than the one from which this DELEG_RANGE request originated holds a conflicting lock, then the server should return NFS4ERR_LOCKED.

   ERRORS

     NFS4ERR_ACCESS
     NFS4ERR_ADMIN_REVOKED
     NFS4ERR_BADHANDLE
     NFS4ERR_BAD_RANGE
     NFS4ERR_BAD_STATEID
     NFS4ERR_BADXDR
     NFS4ERR_DELAY
     NFS4ERR_DENIED
     NFS4ERR_EXPIRED
     NFS4ERR_FHEXPIRED
     NFS4ERR_GRACE
     NFS4ERR_INVAL
     NFS4ERR_ISDIR
     NFS4ERR_LEASE_MOVED
     NFS4ERR_LOCKED
     NFS4ERR_LOCK_NOTSUPP
     NFS4ERR_MOVED
     NFS4ERR_NOFILEHANDLE
     NFS4ERR_NO_GRACE
     NFS4ERR_NOTSUPP
     NFS4ERR_OLD_STATEID
     NFS4ERR_RECLAIM_BAD
     NFS4ERR_RECLAIM_CONFLICT
     NFS4ERR_RESOURCE
     NFS4ERR_SERVERFAULT
     NFS4ERR_STALE
     NFS4ERR_STALE_STATEID
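   One way a server might keep the per-stateid list of noncontiguous byte ranges mentioned under IMPLEMENTATION is sketched below. The same code derives the (possibly extended) range that DELEG_RANGE returns. The struct range_set type, the range_set_add() function, and the fixed capacity MAX_RANGES are hypothetical. Ranges that overlap or touch are merged, so the returned range always contains the requested one, and an all-ones length is treated as extending through the end of file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RANGES 16            /* hypothetical fixed capacity */

/* Half-open range [start, end); end == UINT64_MAX means "through EOF". */
struct range { uint64_t start, end; };

/* Sorted list of disjoint, non-adjacent ranges for one delegation stateid. */
struct range_set { struct range r[MAX_RANGES]; size_t n; };

static uint64_t range_end(uint64_t offset, uint64_t length)
{
    if (length == UINT64_MAX || offset > UINT64_MAX - length)
        return UINT64_MAX;
    return offset + length;
}

/*
 * Add [*offset, *offset + *length) (length > 0 assumed) to the set,
 * merging it with every range it overlaps or touches.  On success the
 * merged range, which always contains the request, is written back to
 * *offset and *length and 0 is returned; -1 means the set is full.
 */
static int range_set_add(struct range_set *s, uint64_t *offset,
                         uint64_t *length)
{
    uint64_t start = *offset, end = range_end(*offset, *length);
    size_t i = 0, j;

    while (i < s->n && s->r[i].end < start)   /* ranges wholly before */
        i++;
    for (j = i; j < s->n && s->r[j].start <= end; j++) {   /* absorb */
        if (s->r[j].start < start)
            start = s->r[j].start;
        if (s->r[j].end > end)
            end = s->r[j].end;
    }
    if (i == j && s->n == MAX_RANGES)
        return -1;

    /* Replace entries i..j-1 (possibly none) with one merged entry. */
    memmove(&s->r[i + 1], &s->r[j], (s->n - j) * sizeof s->r[0]);
    s->n = s->n - (j - i) + 1;
    s->r[i].start = start;
    s->r[i].end = end;

    *offset = start;
    *length = (end == UINT64_MAX) ? UINT64_MAX : end - start;
    return 0;
}
```

   A server that prefers the simpler representation would instead keep a single range and widen it to cover every request. That trades more frequent conflicts for constant-size state.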

6.3 DELEG_DOWNGRADE - downgrades a write delegation on a byte range

   SYNOPSIS

     (cfh), stateid, deleg_seqid, offset, length -> stateid, recall

   ARGUMENT

     struct DELEG_DOWNGRADE4args {
             /* CURRENT_FH: file */
             stateid4        stateid;
             seqid4          deleg_seqid;
             offset4         offset;
             length4         length;
     };

   RESULT

     struct DELEG_DOWNGRADE4resok {
             stateid4        stateid;
             bool            recall;
     };

     union DELEG_DOWNGRADE4res switch (nfsstat4 status) {
      case NFS4_OK:
             DELEG_DOWNGRADE4resok resok;
      default:
             void;
     };

   DESCRIPTION

     DELEG_DOWNGRADE is used by the client to downgrade all write delegations held over a given byte range and convert them into read delegations. The server may piggyback a request to have the client release the delegation onto the reply by setting the "recall" flag to true.

     On success, the current filehandle retains its value, and the current delegation stateid is set to the new value.

     If the client holds no write delegations in the range (offset, length), then the server should treat this operation as a no-op and simply return NFS4_OK.

     If the server is unable to atomically convert the existing write delegations into read delegations, then the request should fail with the error NFS4ERR_LOCK_NOTSUPP.

   ERRORS

     NFS4ERR_ADMIN_REVOKED
     NFS4ERR_BADHANDLE
     NFS4ERR_BAD_RANGE
     NFS4ERR_BAD_STATEID
     NFS4ERR_BADXDR
     NFS4ERR_DELAY
     NFS4ERR_EXPIRED
     NFS4ERR_FHEXPIRED
     NFS4ERR_GRACE
     NFS4ERR_INVAL
     NFS4ERR_ISDIR
     NFS4ERR_LEASE_MOVED
     NFS4ERR_LOCK_NOTSUPP
     NFS4ERR_MOVED
     NFS4ERR_NOFILEHANDLE
     NFS4ERR_NOTSUPP
     NFS4ERR_OLD_STATEID
     NFS4ERR_RESOURCE
     NFS4ERR_SERVERFAULT
     NFS4ERR_STALE
     NFS4ERR_STALE_STATEID
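   On the server side, DELEG_DOWNGRADE amounts to converting write segments into read segments. Any segment that straddles a boundary of the downgraded range must be split. The sketch below is hypothetical. The struct seg_list type, the deleg_downgrade() function, and MAX_SEGS are invented for illustration. Ranges are half-open [start, end), and the status values are from [RFC3530]. The list is kept sorted and non-overlapping, so a single downgrade can create at most two new segments. That lets the capacity check run before anything is modified, so the conversion is all-or-nothing, as the atomicity requirement above demands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SEGS 32              /* hypothetical fixed capacity */

enum { NFS4_OK = 0, NFS4ERR_RESOURCE = 10018 };

/* One delegated segment [start, end); the list is sorted and disjoint. */
struct seg { uint64_t start, end; int is_write; };
struct seg_list { struct seg s[MAX_SEGS]; size_t n; };

/* Insert x at index 'at', shifting later entries up by one. */
static void seg_insert(struct seg_list *l, size_t at, struct seg x)
{
    memmove(&l->s[at + 1], &l->s[at], (l->n - at) * sizeof l->s[0]);
    l->s[at] = x;
    l->n++;
}

/*
 * Downgrade every write segment overlapping [start, end) to a read
 * segment.  Segments that straddle 'start' or 'end' are split so that the
 * parts outside the range stay write-delegated.  If no write segment
 * overlaps the range, nothing changes and NFS4_OK is returned.
 */
static int deleg_downgrade(struct seg_list *l, uint64_t start, uint64_t end)
{
    /* At most two splits are possible, so check capacity up front
     * (conservative: may refuse even when no split is needed). */
    if (l->n + 2 > MAX_SEGS)
        return NFS4ERR_RESOURCE;

    for (size_t i = 0; i < l->n; i++) {
        struct seg cur = l->s[i];

        if (!cur.is_write || cur.end <= start || cur.start >= end)
            continue;
        if (cur.start < start) {          /* keep the write prefix */
            l->s[i].end = start;
            i++;
            seg_insert(l, i, (struct seg){ start, cur.end, 1 });
            cur = l->s[i];
        }
        if (cur.end > end) {              /* keep the write suffix */
            l->s[i].end = end;
            seg_insert(l, i + 1, (struct seg){ end, cur.end, 1 });
        }
        l->s[i].is_write = 0;             /* middle part becomes read */
    }
    return NFS4_OK;
}
```

   A client would normally send this only after it has flushed the dirty data in the range, as described in Section 2.2. For that reason the sketch does not consider cached writes at all.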

6.4 DELEG_RELEASE - release a delegation on a byte range

   SYNOPSIS

     (cfh), stateid, deleg_seqid, offset, length -> stateid

   ARGUMENT

     struct DELEG_RELEASE4args {
             /* CURRENT_FH: file */
             stateid4        stateid;
             seqid4          deleg_seqid;
             offset4         offset;
             length4         length;
     };

   RESULT

     struct DELEG_RELEASE4resok {
             stateid4        stateid;
     };

     union DELEG_RELEASE4res switch (nfsstat4 status) {
      case NFS4_OK:
             DELEG_RELEASE4resok resok;
      default:
             void;
     };

   DESCRIPTION

     The DELEG_RELEASE operation notifies the server that the client is no longer caching any data in the specified range, and returns any byte range delegations that may be held in that range.

   ERRORS

     NFS4ERR_ADMIN_REVOKED
     NFS4ERR_BADHANDLE
     NFS4ERR_BAD_RANGE
     NFS4ERR_BAD_STATEID
     NFS4ERR_BADXDR
     NFS4ERR_DELAY
     NFS4ERR_EXPIRED
     NFS4ERR_FHEXPIRED
     NFS4ERR_INVAL
     NFS4ERR_ISDIR
     NFS4ERR_LEASE_MOVED
     NFS4ERR_MOVED
     NFS4ERR_NOFILEHANDLE
     NFS4ERR_NOTSUPP
     NFS4ERR_OLD_STATEID
     NFS4ERR_RESOURCE
     NFS4ERR_SERVERFAULT
     NFS4ERR_STALE
     NFS4ERR_STALE_STATEID

6.5 DELEG_PUT_STATEID - set the current delegation stateid

   SYNOPSIS

     (cfh), stateid -> (cstateid)

   ARGUMENT

     struct DELEG_PUT_STATEID4args {
             /* CURRENT_FH: file */
             stateid4        stateid;
     };

   RESULT

     struct DELEG_PUT_STATEID4res {
             nfsstat4        status;
     };

   DESCRIPTION

     The DELEG_PUT_STATEID operation is used by the client to set the current delegation stateid. If the client specifies the special stateid consisting of all zeros, then the server is expected to clear the current delegation stateid.

   IMPLEMENTATION

     This operation is used in order to apply a byte range delegation to any subsequent READ, WRITE, or SETATTR requests within the same COMPOUND.

   ERRORS

     NFS4ERR_ADMIN_REVOKED
     NFS4ERR_BADHANDLE
     NFS4ERR_BAD_STATEID
     NFS4ERR_BADXDR
     NFS4ERR_DELAY
     NFS4ERR_EXPIRED
     NFS4ERR_FHEXPIRED
     NFS4ERR_ISDIR
     NFS4ERR_LEASE_MOVED
     NFS4ERR_MOVED
     NFS4ERR_NOFILEHANDLE
     NFS4ERR_OLD_STATEID
     NFS4ERR_RESOURCE
     NFS4ERR_SERVERFAULT
     NFS4ERR_STALE_STATEID

7. New callback operations

7.1 CB_RECALL_RANGE - recall a byte range delegation

   SYNOPSIS

     stateid, offset, length, downgrade, truncate, fh -> ()

   ARGUMENT

     struct CB_RECALL_RANGE4args {
             stateid4        stateid;
             offset4         offset;
             length4         length;
             bool            downgrade;
             bool            truncate;
             nfs_fh4         fh;
     };

   RESULT

     struct CB_RECALL_RANGE4res {
             nfsstat4        status;
     };

   DESCRIPTION

     The CB_RECALL_RANGE operation is used by the server to compel a
     client to relinquish a delegated byte range and return it to the
     server.

   IMPLEMENTATION

     The downgrade flag is used by the server to inform the client
     about the nature of the caching conflict that triggered the
     callback.  If set, it indicates that the conflict would be resolved
     if the client downgraded all write delegations in the range to read
     delegations.  If the downgrade flag is not set, the client MUST
     prepare to release all delegations in the specified range.

     The truncate flag is used to inform the client that the byte range
     being recalled is about to be truncated as a result of an incoming
     SETATTR or OPEN.  The client may use this information to discard
     any queued writes to that range which would otherwise have had to
     be flushed to the server.

     If a race causes the client to believe that it is not holding any
     delegations in the range specified by the server, and there are no
     outstanding requests for this range, then it may signal this to the
     server using the error NFS4ERR_BAD_RANGE.
     This may be the case, for instance, if the server's CB_RECALL_RANGE
     call raced with a DELEG_RELEASE from the client.

   ERRORS

     NFS4ERR_BADHANDLE
     NFS4ERR_BAD_STATEID
     NFS4ERR_BADXDR
     NFS4ERR_BAD_RANGE
     NFS4ERR_RESOURCE
     NFS4ERR_SERVERFAULT

8. References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3530]  Shepler, S., et al., "Network File System (NFS) version 4
              Protocol", RFC 3530, April 2003.

   [draft-ietf-nfsv4-sess-01]
              Talpey, T. and J. Bauman, "NFSv4 Session Extensions",
              draft-ietf-nfsv4-sess-01 (work in progress).

Authors' Addresses

   Trond Myklebust
   Network Appliance, Inc.
   535 W. William St., Suite 3100
   Ann Arbor, MI  48103
   US

   Phone: +1 734-764-5207
   Email: Trond.Myklebust@netapp.com


   J. Bruce Fields
   U. of Michigan
   Center for Information Technology Integration
   535 W. William St., Suite 3100
   Ann Arbor, MI  48103
   US

   Email: bfields@citi.umich.edu


   William A. Adamson
   U. of Michigan
   Center for Information Technology Integration
   535 W. William St., Suite 3100
   Ann Arbor, MI  48103
   US

   Email: andros@citi.umich.edu


   Peter Honeyman
   U. of Michigan
   Center for Information Technology Integration
   535 W. William St., Suite 3100
   Ann Arbor, MI  48103
   US

   Email: honey@citi.umich.edu

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2005).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.