NFSv4 - rpc.gssd eats 100% CPU

Lukas Hejtmanek xhejtman at ics.muni.cz
Wed Oct 21 05:09:31 EDT 2009


Kevin,

On Mon, Oct 22, 2007 at 09:38:57AM -0400, Kevin Coffman wrote:
> > The only thing I can recall when I was having problems is that
> > rpc.gssd was using 100% CPU
> 
> This sounds like some application continually retrying an operation
> even though it keeps getting permission denied (because credentials
> have expired).

This is not a rare case at all. We see it on our cluster frontend quite
often: users keep running their applications after their credentials have
expired.

The cluster frontend has a huge number of files in /tmp, and rpc.gssd spends
a fair amount of time scanning these files to look up credentials again and
again.

As a solution, I have a patch that adds negative caching to rpc.gssd. It
works like this: whenever rpc.gssd is about to look for a krb5 credentials
cache, it first checks an internal hash table to see whether it has already
searched for this cache in this location. If it has, and the timeout (5
seconds by default) has not yet expired, it skips the search and immediately
reports that no krb5 cache exists.
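
To make the idea concrete, here is a minimal sketch of such a negative cache.
It is not the actual patch: the names neg_cache_add() and neg_cache_hit(),
the table size, and the choice to key entries by uid plus directory are all
assumptions for illustration. The table is direct-mapped, so a colliding
entry simply overwrites the old one, and the current time is passed in so
the logic is easy to test.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

#define NEG_SLOTS   256  /* hypothetical table size */
#define NEG_TIMEOUT 5    /* seconds; the default mentioned above */
#define NEG_DIRLEN  128  /* hypothetical limit on the stored path */

/* One remembered "no krb5 cache here" result. */
struct neg_entry {
    bool   used;
    uid_t  uid;
    char   dir[NEG_DIRLEN];
    time_t expires;
};

static struct neg_entry neg_table[NEG_SLOTS];

/* FNV-1a style hash over the uid and the directory name. */
static unsigned neg_hash(uid_t uid, const char *dir)
{
    unsigned h = 2166136261u ^ (unsigned)uid;

    assert(dir != NULL);
    for (; *dir; dir++)
        h = (h ^ (unsigned char)*dir) * 16777619u;
    return h % NEG_SLOTS;
}

/* Remember that no cache for uid was found in dir at time now. */
static void neg_cache_add(uid_t uid, const char *dir, time_t now)
{
    struct neg_entry *e = &neg_table[neg_hash(uid, dir)];

    e->used = true;
    e->uid = uid;
    strncpy(e->dir, dir, NEG_DIRLEN - 1);
    e->dir[NEG_DIRLEN - 1] = '\0';
    e->expires = now + NEG_TIMEOUT;
}

/* Return true if a failed lookup for (uid, dir) is still fresh. */
static bool neg_cache_hit(uid_t uid, const char *dir, time_t now)
{
    struct neg_entry *e = &neg_table[neg_hash(uid, dir)];

    if (!e->used || e->uid != uid ||
        strncmp(e->dir, dir, NEG_DIRLEN - 1) != 0)
        return false;
    if (now >= e->expires) {
        e->used = false;  /* entry timed out; force a real scan */
        return false;
    }
    return true;
}
```

In this sketch the directory scan would call neg_cache_hit() first and return
"no krb5 cache" on a hit; only on a miss would it scan the directory, calling
neg_cache_add() if the scan finds nothing.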

This is admittedly only a partial solution. The second part should live in
kernel code: it should detect the spinning between the kernel and rpc.gssd
and slow it down, probably by adding some sleeps to save CPU cycles. However,
that part is still to be done.
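
As a rough illustration of what "some sleeps" could mean, the sketch below
computes a delay that doubles with each consecutive failed upcall, up to a
cap. This is plain exponential backoff written as user-space C, not kernel
code and not part of any existing patch; the function name
upcall_backoff_ms() and both constants are hypothetical.

```c
#include <stdint.h>

#define BACKOFF_BASE_MS 10    /* hypothetical delay after the first failure */
#define BACKOFF_MAX_MS  5000  /* hypothetical upper bound on the delay */

/*
 * Delay in milliseconds to apply before the next upcall, given the
 * number of consecutive failures seen so far for the same user.
 */
static uint32_t upcall_backoff_ms(unsigned failures)
{
    uint32_t delay = BACKOFF_BASE_MS;

    if (failures == 0)
        return 0;  /* nothing failed yet, so no delay */

    /* Double the delay once per additional failure, up to the cap. */
    while (--failures > 0 && delay < BACKOFF_MAX_MS)
        delay *= 2;

    return delay > BACKOFF_MAX_MS ? BACKOFF_MAX_MS : delay;
}
```

The failure counter would be reset as soon as an upcall succeeds, so users
with valid credentials never see a delay.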

Kevin, do you think this is a useful approach that should be completed?
I have someone who could do this, but only if it is not a waste of time.

-- 
Lukáš Hejtmánek

