Server-side silly rename
From Linux NFS
The NFSv3 protocol has no way to say "I'm unlinking this file, but please keep it around because I have an application that's still using it". So if the client wants to provide unix-like semantics, it has to resort to this hack (called "silly rename") on unlink of an open file. See also [1], or the earliest description I'm aware of, in "Design and Implementation of the Sun Network Filesystem" (1985):
We tried very hard to make the NFS client obey UNIX filesystem semantics without modifying the server or the protocol. In some cases this was hard to do. For example, UNIX allows removal of open files. A process can open a file, then remove the directory entry for the file so that it has no name anywhere in the filesystem, and still read and write the file. This is a disgusting bit of UNIX trivia and at first we were just not going to support it, but it turns out that all of the programs that we didn't want to have to fix (csh, sendmail, etc.) use this for temporary files.What we did to make open file removal work on remote files was check in the client VFS remove operation if the file is open, and if so rename it instead of removing it. This makes it (sort of) invisible to the client and still allows reading and writing. The client kernel then removes the new name when the vnode becomes inactive. We call this the 3/4 solution because if the client crashes between the rename and remove a garbage file is left on the server. An entry to cron can be added to clean up on the server.
Silly rename is indeed an imperfect solution. Another case when users sometimes notice the ".nfsXXXX" files is when they try to remove a directory that contains them. Also, it doesn't help if a file is unlinked by a different client than the one that holds it open.
NFSv4 actually does have open and close calls, and our server won't free a file until last close--unless the server reboots, at which point the file will disappear even if an application on the client is still using it. NFS is supposed to keep working normally across server reboots, so the client still does silly rename even in the v4 case.
We could move the responsibility for silly rename to the server--the server could keep a hardlink to the file after unlink, and that would preserve the file after reboot as well. (And it could use a separate directory for the purpose, and avoid the rmdir). We even added a bit to the NFSv4.1 protocol so that the server can tell the client it does this, allowing the client to skip sillyrename (see references to OPEN4_RESULT_PRESERVE_UNLINKED in [2].)
I suspect the client side implementation of this wouldn't be hard--it'd need to watch for the OPEN4_RESULT_PRESERVE_UNLINKED flag and skip silly rename in its presence. (Update: see [3].)
The server side looks harder.
One complication is that knfsd doesn't get exclusive use of exported filesystems: other applications may also be using them. A file opened by an NFS client could be unlinked by a local application, and we'd like the file not to disappear after reboot in that case. That said, the current behavior doesn't handle that case--it doesn't even handle the case when the unlink is done by a different client than the open--so for a first implementation I think it'd be fine to ignore that case.
My rough plan for knfsd is to create a hidden directory in the root of the exported filesystem and modify nfsd4_remove() to check whether the file to be unlinked is open by an NFSv4 client, and if so to instead rename it to that hidden directory. The name shouldn't matter--just use a counter or something.
I think we can use something like the logic at the start of nfsd4_process_open2 to look up a struct nfs4_file from the filehandle, and then use that to check for nfsv4 opens. We also need to prevent the race where a new open comes in after we decide to unlink the file but before we're done unlinking it--I'm not sure how. Also we need to think about the possibility of filehandle aliasing, in which case there may exist two nfs4_files for a given file.
Then we need the close code to check whether we're closing one of these files and, if so, to also unlink it from the hidden directory.
And, finally the laundromat code, after it ends the grace period, needs to walk through the hidden directory and remove any files that haven't been opened. Maybe code like that in nfsd4_recdir_purge_old() would work. This is usually the kind of thing we try not to do from the kernel, but I don't see a clean way to do it from userspace.
That done, if we wanted to also make this work for unlinks by non-NFSv4 clients, we'd need some way to intercept all the unlinks to a given filesystem. We might need to modify the individual exported filesystems.
We may want to think about how exactly to hide that directory. Maybe we could get some kind of help from the filesystem.
The extra hidden link will mean that the st_nlink (for local users) and the numlinks attribute (for NFSv4 GETATTR callers) are wrong. We could fix up the latter, at least, by checking for this specific case.
Approaches I (bfields) considered and rejected for now:
- Create a link in the new directory on every open, and remove it on every close. But open may be a frequent operation, and we'd need to actually sync that link to disk on every operation, so it could be pretty slow. But maybe, with cooperation of the filesystem, we could *just* do the link on open, and delay waiting for the sync until there's an unlink.
- Filesystems already have to deal with the case where the system crashes while there are unlinked open files. I believe they keep a list of such files so they can free them in fsck or next mount. I considered hooking into that process somehow--perhaps the server could be given an interface allowing it to discover those orphaned files. It would require nfsd to be involved in the mount process (currently we mount first, then export). And we'd have to figure out how to perform clean shutdowns without losing those files. And we'd have to worry about losing them any time an administrator fsck'd or mounted without running nfsd. So in the end maybe it wouldn't work.