A new NFSv4 server...

Jeff Garzik jeff at garzik.org
Fri Jan 4 11:41:55 EST 2008


Peter Åstrand wrote:
> Many years ago, before NFSv4 was finished, I felt the same. I was waiting 
> for v4 and thought that everything would be so much better. I wanted to 
> help and started the "pynfs" project. Today, I have a different opinion. I 

<grin>


> think v3 is a fairly good protocol, if you use it correctly. For example, 
> many people don't realize that you don't need the portmapper, that you can 
> use a single well-known TCP port, that you can use RPCSEC_GSS and so 
> forth, even with v3. 

Absolutely...  But still, I think the integrated mount protocol (aka 
pseudo filesystem namespace) and integrated locking were big steps 
forward.  You really shouldn't need more than one protocol.



Speaking of RPCSEC_GSS: I would love to see a much more straightforward 
authentication process, something /not/ buried inside special behaviors 
triggered by opcodes found in an opaque cred struct :/  RPCSEC_GSS 
context creation, the special casing around the 'null' procedure, and 
the overloading of the RPC data portion of things are a huge pain to 
implement.


Authentication and security should be simple and tough to screw up.  I 
would tend to prefer an ASCII-based authentication/security negotiation 
at the start of a [SCTP|TCP] stream.

Use TLS to give most people what they want:  AUTH_SYS with encryption. 
GSSAPI is fine as a "required option" but you shouldn't need GSSAPI to 
do simple wire encryption between IP-authenticated hosts.
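
To make the idea of an ASCII-based negotiation concrete, here is a minimal 
sketch of the server side of such a handshake. Everything in it is an 
assumption made up for illustration: the "AUTH" verb, the "OK"/"ERR" replies, 
the mechanism names and the preference order are hypothetical and are not part 
of any NFS or ONC RPC specification.

```python
# Hypothetical line-based negotiation. The verbs and mechanism names are
# invented for illustration; no NFS or RPC spec defines them.
SUPPORTED = ("GSSAPI", "TLS", "SYS")  # assumed server preference order


def negotiate(client_line: str) -> str:
    """Return the server's one-line reply to a client's 'AUTH a b c' offer."""
    parts = client_line.strip().split()
    if len(parts) < 2 or parts[0].upper() != "AUTH":
        return "ERR malformed"
    offered = {p.upper() for p in parts[1:]}
    # Pick the first mechanism in server preference order that the client offered.
    for mech in SUPPORTED:
        if mech in offered:
            return f"OK {mech}"
    return "ERR no-common-mechanism"
```

The point of the sketch is that the whole exchange is one readable line in 
each direction before any RPC traffic flows, so it is easy to debug with a 
packet dump.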


> I think v4 has a few valuable improvements, but it comes with a very high 
> price. v3 has a minimalistic beauty which v4 lacks. For example, take a 
> look at the OPEN operation with 7 arguments, of which many are complex 
> data structures:
> 
>     (cfh), seqid, share_access, share_deny, owner, openhow, claim ->
>     (cfh), stateid, cinfo, rflags, open_confirm, attrset delegation
> 
> Not pretty...  

heh, tell me about it.  First I started out using rpcgen, then rewrote 
everything to do raw XDR decoding.  OPEN is huge.

IMO, OPEN should be split into multiple operations, probably one for 
each "OPEN arm".  It's not like new opcode numbers are expensive.

Or, hope of hopes, simplify OPEN in some other manner, like delegating 
tasks to other operations.
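
As a rough sketch of what "one operation per OPEN arm" could look like, the 
code below dispatches on the claim type to a separate small handler. The claim 
type numbers are the open_claim_type4 values from RFC 3530; the handler names, 
the argument dictionaries and the returned tuples are hypothetical and only 
stand in for real per-arm opcodes.

```python
from enum import IntEnum


class ClaimType(IntEnum):
    # open_claim_type4 values as defined in RFC 3530
    CLAIM_NULL = 0
    CLAIM_PREVIOUS = 1
    CLAIM_DELEGATE_CUR = 2
    CLAIM_DELEGATE_PREV = 3


# Hypothetical per-arm operations; each takes only the arguments its arm needs.
def open_by_name(args):
    return ("OPEN_BY_NAME", args["file"])


def open_reclaim(args):
    return ("OPEN_RECLAIM", args["delegate_type"])


def open_deleg_cur(args):
    return ("OPEN_DELEG_CUR", args["file"])


def open_deleg_prev(args):
    return ("OPEN_DELEG_PREV", args["file"])


HANDLERS = {
    ClaimType.CLAIM_NULL: open_by_name,
    ClaimType.CLAIM_PREVIOUS: open_reclaim,
    ClaimType.CLAIM_DELEGATE_CUR: open_deleg_cur,
    ClaimType.CLAIM_DELEGATE_PREV: open_deleg_prev,
}


def dispatch_open(claim, args):
    """Route an OPEN to the handler for its claim arm (ValueError if unknown)."""
    return HANDLERS[ClaimType(claim)](args)
```

With separate opcodes the dispatch table above would simply be the server's 
normal opcode table, and each handler's decoder would stay small.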


>> Oh, certainly.  I was mainly thinking a replacement of the wire protocol would
>> be an easier step for people to swallow than a new protocol.
> 
> I've been thinking of trying to put together something like NFS v3.5. Some 
> parts of v4 are nice, but the complexity is too high. 

Agreed that it's quite complex.

One of my personal desires is for a high level of cache coherence 
throughout the system for all clients (though perhaps an admin could 
optionally relax this requirement).  I'm a fan of Google's "Chubby", a 
distributed reliable filesystem that stalls client writes until cache 
invalidations for the associated byte range are processed for all 
interested clients.

And anything approaching cache coherence requires some complexity :/
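
A toy, single-process sketch of the "stall the write until invalidations are 
processed" rule described above. It is not how Chubby or any NFS server is 
implemented: it tracks whole keys rather than byte ranges, and the 
CoherentStore class and the invalidate callback are hypothetical names 
introduced only for this example.

```python
class CoherentStore:
    """Toy store: a write is applied only after other cachers are invalidated."""

    def __init__(self):
        self.data = {}
        self.cachers = {}  # key -> set of client ids that may hold a cached copy

    def read(self, client, key):
        # Remember that this client may now be caching the value.
        self.cachers.setdefault(key, set()).add(client)
        return self.data.get(key)

    def write(self, client, key, value, invalidate):
        # invalidate(client_id, key) is a caller-supplied callback (assumed to
        # return only once that client has dropped its cached copy).
        for other in sorted(self.cachers.get(key, set()) - {client}):
            invalidate(other, key)
        # Only now does the new value become visible.
        self.cachers[key] = set()
        self.data[key] = value
```

Even in this reduced form the server has to track who caches what and must 
decide what to do when a client never acknowledges, which is where the 
complexity comes from.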

Another thing I like about NFSv4 is that batching fine-grained 
operations into sequences is generally a useful practice.  So while the 
end result (COMPOUND) is a bit of a pain, bundling a sequence of 
operations into a single unit is useful.
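
For illustration, a minimal sketch of COMPOUND-style evaluation: operations 
run in order against shared state (the current filehandle), and processing 
stops at the first operation that fails, returning the results so far. The 
status values match RFC 3530 (NFS4_OK = 0, NFS4ERR_NOENT = 2); the in-memory 
"tree" and the Python handler functions are assumptions for this example, not 
a real server.

```python
NFS4_OK = 0
NFS4ERR_NOENT = 2


def putrootfh(state):
    state["cfh"] = "/"  # set the current filehandle to the root
    return NFS4_OK, None


def lookup(name):
    def op(state):
        children = state["tree"].get(state["cfh"], {})
        if name not in children:
            return NFS4ERR_NOENT, None
        state["cfh"] = children[name]
        return NFS4_OK, None
    return op


def getfh(state):
    return NFS4_OK, state["cfh"]


def run_compound(ops, state):
    """Run (name, handler) pairs in order; stop at the first non-OK status."""
    results = []
    status = NFS4_OK
    for name, handler in ops:
        status, value = handler(state)
        results.append((name, status, value))
        if status != NFS4_OK:
            break
    return status, results
```

For example, PUTROOTFH, LOOKUP "export", GETFH resolves a path in one round 
trip, while a failing LOOKUP means GETFH is never attempted.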

	Jeff




