[pnfs] [PATCH 19/37] pnfs: client layout cache: allow waiting for alloc_layout

Benny Halevy bhalevy at panasas.com
Thu Jan 3 15:04:19 EST 2008


On Jan. 03, 2008, 21:29 +0200, Dean Hildebrand <seattleplus at gmail.com> wrote:
> I think we need to talk about all these waits and locks on the conf 
> call.   This code has almost no comments, so understanding their purpose 
> is really difficult.
> 
> There seems to be multiple nested locks and waits, with the locks and unlocks happening in different functions.  This could easily lead to deadlocks, maybe not by you, but future maintainers will have a heck of a time ensuring they don't accidentally change things so a deadlock occurs. 
> 
> 
> My personal view is that each function should grab and release its locks 
> at the same level in the call hierarchy.  For example,  
> pnfs_update_layout grabs locks in lower level function calls 
> (get_alloc_layout), and then goes on to release and reacquire these 
> locks at the high level.  This is bad.   

The lower level functions either search or allocate the layout structure and
return it locked and referenced. it's possible to only increment the reference
count and take the lock right before it's needed in the calling function.
That might look simpler but will be less efficient since reference
counting will take one atomic (== bus locking) operation and the lock
will take another one.

The reason the lock is released around the call to get_layout is that
get_layout sends an rpc and you certainly don't want to keep the spinlock
around it.

> Not only that, but it locks 
> nfsi->lo_lock in the get_lock_current_layout function but also in random 
> other parts of the code (end of get_alloc_layout).  At the minimum, a 

not really random. The lock is taken by exactly two low-level helper functions:
get_lock_current_layout and get_alloc_layout, when they return non-NULL.
It is released in exactly one low-level helper function: put_unlock_current_layout.

The only special case the lock is release and taken is in pnfs_update_layout,
around the call to get_layout.

> single helper function should lock and unlock lo_lock.


> 
> goto's have their purpose, but C has lots of constructs that allow code 
> simplification at the expense of readability.  Programming style in 1974 
> (the year of that paper) applies little to programming in 2007.  For 
> example, recursion is useful and cool, but most programmers these days 
> will not understand what it is doing.  And even for those who do 
> understand recursion, it will take longer to review the code.  Let's 
> just stick with regular loops and not sacrifice readability.

OK. readability is good :)

> 
> more below.
> 
>> Well, the allocation function might block so a spinlock is inappropriate,
>> a mutex could have been used instead.  
> Do you mean the layout driver will block? Why?

kmalloc can block when memory is low when the system evicts dirty
pages or swaps out clean pages.

>> However, on a different thread,
>> Trond expressed his desire to get rid of locks where possible and use
>> waits instead.
> waits are ok, but pnfs_inject_layout, get_lock_current_layout, 
> put_lock_current_layout, get_alloc_layout, pnfs_update_layout, 
> pnfs_getboundary,  seem to mix waits with locks in scattered places.  I 
> think these are the functions that need to be reorg'd.

Hmm, the locks and waits you mentioned have different purposes.
spinlocks are used for efficient, short term locking where the critical
sections taking the locks are guaranteed to never block.

Waits are used for longer term synchronization where the code holding
the lock is expected to block or just take longer time to execute.

> 
>>  The idea here is to solve the potential race between
>> multiple threads wanting to allocate nfsi->current_layout while
>> allowing signals to interrupt and wake up any waiting thread.
>>   
> I like the idea, we just need to get the code to do this in an 
> appropriate way.  Would it be possible to re-org the code so that
> get_lock_current_layout, put_lock_current_layout are always called in 
> the same function?  In addition, get_lock_current_layout, 
> put_lock_current_layout would be the only 2 functions locking the 
> nfsi->lo_lock? 

The only way I see to simplify that is what I described above, i.e.
take the spinlock locally in the higher level functions, sacrificing
performance.  We can that if more folks are confused about the current
implementation, though I think that the model is simple enough to follow
once you grasp the general idea.

Benny

> Dean
> 
> 
>>   
>>> 	res = wait_on_bit_lock(&nfsi->pnfs_layout_state, NFS_INO_LAYOUT_ALLOC,
>>> 			       pnfs_wait_schedule, TASK_INTERRUPTIBLE);
>>>
>>> 	if (res) {
>>> 		lo = ERR_PTR(res);
>>> 		goto err;
>>> 	}
>>>
>>> 	if (nfsi->current_layout) {
>>> 		clear_bit_unlock(NFS_INO_LAYOUT_ALLOC,
>>> 				 &nfsi->pnfs_layout_state);
>>> 		wake_up_bit(&nfsi->pnfs_layout_state, NFS_INO_LAYOUT_ALLOC);
>>>
>>> 		goto out;
>>> 	}
>>>
>>>  	lo = nfsi->current_layout = alloc_init_layout(ino, io_ops);
>>> 	clear_bit_unlock(NFS_INO_LAYOUT_ALLOC, &nfsi->pnfs_layout_state);
>>> 	wake_up_bit(&nfsi->pnfs_layout_state, NFS_INO_LAYOUT_ALLOC);
>>>     
>> This patch represents an intermediate version of this function before the
>> introduction of {get_lock,put_unlock}_current_layout
>> (see http://git.linux-nfs.org/?p=bhalevy/linux-pnfs.git;a=commitdiff;h=5220589afa99bdc8f5301e26db8d5c5870c285c8)
>> I hope the backward goto makes more sense over there when an extra step (atomic locking 
>> and referencing of nfsi->current_layout) is needed.
>> This can be changed to an infinite loop with explicit continue and break points
>> like the following code
>> (if it looks simpler, I'm not sure... see http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf)
>>
>> static struct pnfs_layout_type *
>> get_alloc_layout(struct inode *ino,
>> 		   struct layoutdriver_io_operations *io_ops)
>> {
>> 	struct nfs_inode *nfsi = NFS_I(ino);
>> 	struct pnfs_layout_type *lo;
>> 	int res;
>>
>> 	dprintk("%s Begin\n", __FUNCTION__);
>>
>>   
>> 	while ((lo = get_lock_current_layout(nfsi)) == NULL) {
>> 		/* Compete against other threads on who's doing the allocation,
>> 		 * wait until bit is cleared if we lost this race.
>> 		 */
>> 		res = wait_on_bit_lock(&nfsi->pnfs_layout_state, NFS_INO_LAYOUT_ALLOC,
>> 				       pnfs_wait_schedule, TASK_INTERRUPTIBLE);
>> 		if (res) {
>> 			lo = ERR_PTR(res);
>> 			goto err;
>> 		}
>>
>> 		/* Was current_layout already allocated while we slept?
>> 		 * If so, try to get it again using get_lock_current_layout.
>> 		 */
>> 		if (nfsi->current_layout) {
>> 			clear_bit_unlock(NFS_INO_LAYOUT_ALLOC,
>> 					 &nfsi->pnfs_layout_state);
>> 			continue;
>>   
>                                                      ^^^^^^^^^^
> 
>> 		}
>>
>> 		/* Try to allocate current_layout and grab the lo_lock */
>> 		lo = nfsi->current_layout = alloc_init_layout(ino, io_ops);
>> 		spin_lock(&nfsi->lo_lock);
>>
>> 		/* release the NFS_INO_LAYOUT_ALLOC bit and wake up any waiter */
>> 		clear_bit_unlock(NFS_INO_LAYOUT_ALLOC, &nfsi->pnfs_layout_state);
>> 		wake_up_bit(&nfsi->pnfs_layout_state, NFS_INO_LAYOUT_ALLOC);
>>
>> 		/* we're done here. just check whether alloc_init_layout succeeded. */
>> 		if (lo)
>> 			break;
>>
>> 		spin_unlock(&nfsi->lo_lock);
>> 		lo = ERR_PTR(-ENOMEM);
>> 		goto err;
>> 	}
>>
>> 	dprintk("%s Return %p\n", __FUNCTION__, lo);
>> 	return lo;
>>
>> err:
>> 	dprintk("%s Return error %ld\n", __FUNCTION__, PTR_ERR(lo));
>> 	return lo;
>> }
>>
>>   
>>> Dean
>>>     
>>   
> _______________________________________________
> pNFS mailing list
> pNFS at linux-nfs.org
> http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
> 



More information about the pNFS mailing list