[pnfs] Kernel crash on umount

Iyer, Rahul Rahul.Iyer at netapp.com
Thu Mar 8 16:43:56 EST 2007


I guess the problem is more than just a reference count on the rpc_clnt.
The MDS and the DS have two different clientids (because of the
EXCHANGE_ID to the data server) and 2 different sessions as well. I'm
not so sure that using the same nfs4_client struct for both these
purposes is a good idea.
Regards
Rahul


> -----Original Message-----
> From: William A. (Andy) Adamson [mailto:andros at citi.umich.edu] 
> Sent: Thursday, March 08, 2007 7:44 AM
> To: Iyer, Rahul
> Cc: pnfs at linux-nfs.org
> Subject: Re: [pnfs] Kernel crash on umount
> 
> hi rahul
> 
> the way struct nfs_client is used changes between 2.6.18 and 
> 2.6.20. the problem you state with MDS/DS sharing a struct 
> nfs_client is similar to a referral in that like a referral, 
> the DS needs to keep it's own state (unlike a referral, it 
> keeps the rpc_clnt). 
> 
> anyway, you may want to wait to solve this problem until 
> after the upgrade.
> 
> as far as the panic on umount, this is a solved problem in 
> linus git-tree where __nfs4_find_client inc's struct 
> nfs_client->cl_count. 
> 
> -->Andy
> 
> 
> On 3/8/07, Iyer, Rahul <Rahul.Iyer at netapp.com> wrote:
> 
> 	Hi,
> 	Today, I noticed that the pNFS client causes the client 
> kernel to crash on an umount. I have narrowed it down to 
> these pieces of code.
> 	 
> 	On the mount, after the GETDEVICELIST is called, for 
> each device in the list, device_create() is called. As the 
> code snippet below shows, this makes a call to 
> nfs4_get_client via the get_client pointer in 
> server->rpc_ops. nfs4_get_client tries to find a nfs4_client 
> struct for the given target ip address. If one exists, it is 
> returned, else a new one is created. 
> 	 
> 	device_create(struct nfs_server *server, struct 
> nfs4_pnfs_dev_item *dev)
> 	{
> 	        struct nfs4_client   *clp;
> 	        struct rpc_xprt      *xprt;
> 	        struct sockaddr_in    sin;
> 	        struct rpc_clnt      *mds_rpc = server->client;
> 	        int err = 0;
> 	 
> 	        sin.sin_family = AF_INET;
> 	        sin.sin_addr.s_addr = dev->ip_addr;
> 	        sin.sin_port = dev->port;
> 	 
> 	        clp = server->rpc_ops->get_client(&sin.sin_addr);
> 	        if (!clp) {
> 	                err = PTR_ERR(clp);
> 	                dprintk("%s: failed to create NFS4 
> client err %d\n",
> 	                        __FUNCTION__, err);
> 	                goto out;
> 	        }
> 	
> 	        dprintk("device_create: dev_id=%u, ip=%x, 
> port=%hu\n", dev->dev_id, ntohl(dev->ip_addr), ntohs(dev->port));
> 	 
> 	        xprt = xprt_create_proto(IPPROTO_TCP, &sin,
> 	                                 &mds_rpc->cl_xprt->timeout);
> 	        if (IS_ERR(xprt)) {
> 	                err = PTR_ERR(xprt);
> 	                goto out;
> 	        }
> 	 
> 	        clp->cl_rpcclient = create_nfs_rpcclient(xprt, 
> "nfs4_pnfs_dserver", mds_rpc->cl_vers, 
> mds_rpc->cl_auth->au_flavor, &err);
> 	        if (clp->cl_rpcclient == NULL) {
> 	                printk("%s: Can't create nfs rpc 
> client!\n", __FUNCTION__);
> 	                goto out;
> 	        }
> 	                .....
> 	 
> 	 
> 	Now, the problem is that if the MDS is colocated with a 
> DS (a single server acting as both MDS and DS), a common 
> nfs4_client struct is shared by the NFS client and the 
> device. This causes problems during umount. It also causes a 
> memory leak of the original rpc_clnt that the nfs4_client had.
> 	 
> 	static void nfs4_kill_super(struct super_block *sb)
> 	{
> 	        struct nfs_server *server = NFS_SB(sb);
> 	 
> 	        nfs_return_all_delegations(sb);
> 	
> 	        /* Unmount the layout driver */
> 	        unmount_pnfs_layoutdriver(sb);
> 	 
> 	        kill_anon_super(sb);
> 	 
> 	         if ((server ->nfs4_state) && 
> (server->nfs4_state->cl_minorversion == 1)) {
> 	                 dprintk("calling destroy session for 
> superblock %p for client %p\n", server, server->nfs4_state);
> 	                nfs4_proc_destroy_session(server->nfs4_state);
> 	         } 
> 	 
> 	        nfs4_renewd_prepare_shutdown(server);
> 	 
> 	        if (server->client != NULL && !IS_ERR(server->client))
> 	                rpc_shutdown_client(server->client);
> 	 
> 	....
> 	 
> 	As seen above, the call to unmount_pnfs_layoutdriver() 
> shuts down all rpc clients for all the devices. In the 
> colocated case, it kills the common rpc connection used by 
> the nfs4_client struct shared by the NFS client and the 
> device. Hence, when it tries to do a DESTROY_SESSION through 
> nfs4_proc_destroy_session(), the rpc connection has been 
> shutdown. Hence it causes a kernel crash (NULL pointer dereference).
> 	 
> 	Apart from this obvious problem, I'm also concerned 
> about sharing the nfs4_client. My major concern is that these 
> are going to be 2 totally separate clientids (because of the 
> exchange_id to the DS) and so, not the best idea. 
> 	 
> 	I'll take a shot at fixing this tomorrow.
> 	Regards
> 	
> 	Rahul
> 	 
> 
> 	_______________________________________________
> 	pNFS mailing list
> 	pNFS at linux-nfs.org
> 	http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs 
> <http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs> 
> 	
> 
> 
> 


More information about the pNFS mailing list