[pnfs] Fwd: Kernel crash on umount

William A. (Andy) Adamson andros at citi.umich.edu
Fri Mar 9 09:25:04 EST 2007


---------- Forwarded message ----------
From: William A. (Andy) Adamson <andros at citi.umich.edu>
Date: Mar 9, 2007 9:24 AM
Subject: Re: [pnfs] Kernel crash on umount
To: "Iyer, Rahul" <Rahul.Iyer at netapp.com>



On 3/8/07, Iyer, Rahul <Rahul.Iyer at netapp.com> wrote:
>
> I guess the problem is more than just a reference count on the rpc_clnt.
> The MDS and the DS have two different clientids (because of the
> EXCHANGE_ID to the data server) and 2 different sessions as well. I'm
> not so sure that using the same nfs4_client struct for both these
> purposes is a good idea.


the clientid for the DS will never be used, because there is no
OPEN/LOCK/DELEGATION state associated with it. we can throw it away once a
session is established.

-->Andy


Regards
> Rahul
>
>
> > -----Original Message-----
> > From: William A. (Andy) Adamson [mailto: andros at citi.umich.edu]
> > Sent: Thursday, March 08, 2007 7:44 AM
> > To: Iyer, Rahul
> > Cc: pnfs at linux-nfs.org
> > Subject: Re: [pnfs] Kernel crash on umount
> >
> > hi rahul
> >
> > the way struct nfs_client is used changes between 2.6.18 and
> > 2.6.20. the problem you state with MDS/DS sharing a struct
> > nfs_client is similar to a referral in that like a referral,
> > the DS needs to keep it's own state (unlike a referral, it
> > keeps the rpc_clnt).
> >
> > anyway, you may want to wait to solve this problem until
> > after the upgrade.
> >
> > as far as the panic on umount, this is a solved problem in
> > linus git-tree where __nfs4_find_client inc's struct
> > nfs_client->cl_count.
> >
> > -->Andy
> >
> >
> > On 3/8/07, Iyer, Rahul <Rahul.Iyer at netapp.com > wrote:
> >
> >       Hi,
> >       Today, I noticed that the pNFS client causes the client
> > kernel to crash on an umount. I have narrowed it down to
> > these pieces of code.
> >
> >       On the mount, after the GETDEVICELIST is called, for
> > each device in the list, device_create() is called. As the
> > code snippet below shows, this makes a call to
> > nfs4_get_client via the get_client pointer in
> > server->rpc_ops. nfs4_get_client tries to find a nfs4_client
> > struct for the given target ip address. If one exists, it is
> > returned, else a new one is created.
> >
> >       device_create(struct nfs_server *server, struct
> > nfs4_pnfs_dev_item *dev)
> >       {
> >               struct nfs4_client   *clp;
> >               struct rpc_xprt      *xprt;
> >               struct sockaddr_in    sin;
> >               struct rpc_clnt      *mds_rpc = server->client;
> >               int err = 0;
> >
> >               sin.sin_family = AF_INET;
> >               sin.sin_addr.s_addr = dev->ip_addr;
> >               sin.sin_port = dev->port;
> >
> >               clp = server->rpc_ops->get_client(&sin.sin_addr);
> >               if (!clp) {
> >                       err = PTR_ERR(clp);
> >                       dprintk("%s: failed to create NFS4
> > client err %d\n",
> >                               __FUNCTION__, err);
> >                       goto out;
> >               }
> >
> >               dprintk("device_create: dev_id=%u, ip=%x,
> > port=%hu\n", dev->dev_id, ntohl(dev->ip_addr), ntohs(dev->port));
> >
> >               xprt = xprt_create_proto(IPPROTO_TCP, &sin,
> >                                        &mds_rpc->cl_xprt->timeout);
> >               if (IS_ERR(xprt)) {
> >                       err = PTR_ERR(xprt);
> >                       goto out;
> >               }
> >
> >               clp->cl_rpcclient = create_nfs_rpcclient(xprt,
> > "nfs4_pnfs_dserver", mds_rpc->cl_vers,
> > mds_rpc->cl_auth->au_flavor, &err);
> >               if (clp->cl_rpcclient == NULL) {
> >                       printk("%s: Can't create nfs rpc
> > client!\n", __FUNCTION__);
> >                       goto out;
> >               }
> >                       .....
> >
> >
> >       Now, the problem is that if the MDS is colocated with a
> > DS (a single server acting as both MDS and DS), a common
> > nfs4_client struct is shared by the NFS client and the
> > device. This causes problems during umount. It also causes a
> > memory leak of the original rpc_clnt that the nfs4_client had.
> >
> >       static void nfs4_kill_super(struct super_block *sb)
> >       {
> >               struct nfs_server *server = NFS_SB(sb);
> >
> >               nfs_return_all_delegations(sb);
> >
> >               /* Unmount the layout driver */
> >               unmount_pnfs_layoutdriver(sb);
> >
> >               kill_anon_super(sb);
> >
> >                if ((server ->nfs4_state) &&
> > (server->nfs4_state->cl_minorversion == 1)) {
> >                        dprintk("calling destroy session for
> > superblock %p for client %p\n", server, server->nfs4_state);
> >                       nfs4_proc_destroy_session(server->nfs4_state);
> >                }
> >
> >               nfs4_renewd_prepare_shutdown(server);
> >
> >               if (server->client != NULL && !IS_ERR(server->client))
> >                       rpc_shutdown_client(server->client);
> >
> >       ....
> >
> >       As seen above, the call to unmount_pnfs_layoutdriver()
> > shuts down all rpc clients for all the devices. In the
> > colocated case, it kills the common rpc connection used by
> > the nfs4_client struct shared by the NFS client and the
> > device. Hence, when it tries to do a DESTROY_SESSION through
> > nfs4_proc_destroy_session(), the rpc connection has been
> > shutdown. Hence it causes a kernel crash (NULL pointer dereference).
> >
> >       Apart from this obvious problem, I'm also concerned
> > about sharing the nfs4_client. My major concern is that these
> > are going to be 2 totally separate clientids (because of the
> > exchange_id to the DS) and so, not the best idea.
> >
> >       I'll take a shot at fixing this tomorrow.
> >       Regards
> >
> >       Rahul
> >
> >
> >       _______________________________________________
> >       pNFS mailing list
> >       pNFS at linux-nfs.org
> >       http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
> > <http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs >
> >
> >
> >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://linux-nfs.org/pipermail/pnfs/attachments/20070309/bc2038e0/attachment-0001.htm 


More information about the pNFS mailing list