[pnfs] Kernel crash on umount
Iyer, Rahul
Rahul.Iyer at netapp.com
Thu Mar 8 04:55:24 EST 2007
Hi,
Today, I noticed that the pNFS client causes the client kernel to crash
on an umount. I have narrowed it down to these pieces of code.
On the mount, after GETDEVICELIST is called, device_create() is called
for each device in the list. As the code snippet below shows, this
makes a call to nfs4_get_client via the get_client pointer in
server->rpc_ops. nfs4_get_client tries to find an nfs4_client struct for
the given target IP address. If one exists, it is returned; otherwise a
new one is created.
device_create(struct nfs_server *server, struct nfs4_pnfs_dev_item *dev)
{
        struct nfs4_client *clp;
        struct rpc_xprt *xprt;
        struct sockaddr_in sin;
        struct rpc_clnt *mds_rpc = server->client;
        int err = 0;

        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = dev->ip_addr;
        sin.sin_port = dev->port;

        clp = server->rpc_ops->get_client(&sin.sin_addr);
        if (!clp) {
                err = PTR_ERR(clp);
                dprintk("%s: failed to create NFS4 client err %d\n",
                        __FUNCTION__, err);
                goto out;
        }

        dprintk("device_create: dev_id=%u, ip=%x, port=%hu\n",
                dev->dev_id, ntohl(dev->ip_addr), ntohs(dev->port));

        xprt = xprt_create_proto(IPPROTO_TCP, &sin,
                                 &mds_rpc->cl_xprt->timeout);
        if (IS_ERR(xprt)) {
                err = PTR_ERR(xprt);
                goto out;
        }

        clp->cl_rpcclient = create_nfs_rpcclient(xprt,
                        "nfs4_pnfs_dserver", mds_rpc->cl_vers,
                        mds_rpc->cl_auth->au_flavor, &err);
        if (clp->cl_rpcclient == NULL) {
                printk("%s: Can't create nfs rpc client!\n",
                       __FUNCTION__);
                goto out;
        }
        .....
Now, the problem is that if the MDS is colocated with a DS (a single
server acting as both MDS and DS), a common nfs4_client struct is shared
by the NFS client and the device. This causes problems during umount. It
also causes a memory leak of the original rpc_clnt that the nfs4_client
had.
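To make the sharing and the leak concrete, here is a small userspace model of the find-or-create lookup. It is only a sketch: the model_* names and types are made up for illustration and are not the kernel definitions, and the lookup is assumed to be keyed on the IP address alone, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for rpc_clnt and nfs4_client; not kernel types. */
struct model_rpc_clnt {
        int id;
};

struct model_client {
        uint32_t addr;                     /* server IP address */
        struct model_rpc_clnt *rpcclient;  /* plays the role of cl_rpcclient */
        struct model_client *next;
};

static struct model_client *client_list;

/* Find-or-create keyed on the address only (assumed from the description). */
static struct model_client *model_get_client(uint32_t addr)
{
        struct model_client *clp;

        for (clp = client_list; clp != NULL; clp = clp->next) {
                if (clp->addr == addr)
                        return clp;        /* existing entry is reused */
        }

        clp = calloc(1, sizeof(*clp));
        if (clp == NULL)
                return NULL;
        clp->addr = addr;
        clp->next = client_list;
        client_list = clp;
        return clp;
}

/*
 * Mirrors the unconditional cl_rpcclient assignment in device_create().
 * Returns the pointer that was overwritten so the loss is observable.
 */
static struct model_rpc_clnt *
model_device_create(struct model_client *clp, struct model_rpc_clnt *ds_rpc)
{
        struct model_rpc_clnt *old = clp->rpcclient;

        clp->rpcclient = ds_rpc;
        return old;
}
```

If the MDS and the DS have the same address, model_get_client() hands back the same struct for both, and model_device_create() returns a non-NULL old pointer. In the real code nothing holds on to that old pointer, which is the leak.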
static void nfs4_kill_super(struct super_block *sb)
{
        struct nfs_server *server = NFS_SB(sb);

        nfs_return_all_delegations(sb);

        /* Unmount the layout driver */
        unmount_pnfs_layoutdriver(sb);

        kill_anon_super(sb);

        if ((server->nfs4_state) &&
            (server->nfs4_state->cl_minorversion == 1)) {
                dprintk("calling destroy session for superblock %p for client %p\n",
                        server, server->nfs4_state);
                nfs4_proc_destroy_session(server->nfs4_state);
        }

        nfs4_renewd_prepare_shutdown(server);

        if (server->client != NULL && !IS_ERR(server->client))
                rpc_shutdown_client(server->client);
        ....
As seen above, the call to unmount_pnfs_layoutdriver() shuts down the
rpc clients for all the devices. In the colocated case, it kills the
common rpc connection used by the nfs4_client struct shared by the NFS
client and the device. So by the time nfs4_proc_destroy_session() tries
to send the DESTROY_SESSION, the rpc connection has already been shut
down, and the kernel crashes with a NULL pointer dereference.
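The ordering problem can be reproduced in a small userspace model. Again, this is only a sketch under assumptions: the model_* names are hypothetical, and I am assuming the device teardown leaves the shared rpc client pointer NULL, which is what the oops suggests. The -1 return marks the point where the kernel would dereference the dead pointer; it is not meant as the fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for rpc_clnt and nfs4_client; not kernel types. */
struct model_rpc_clnt {
        int alive;
};

struct model_client {
        struct model_rpc_clnt *rpcclient;  /* plays the role of cl_rpcclient */
};

/*
 * Models what unmount_pnfs_layoutdriver() does for one device:
 * shut down its rpc client and drop the pointer (assumed behaviour).
 */
static void model_unmount_device(struct model_client *dev_clp)
{
        if (dev_clp->rpcclient != NULL) {
                dev_clp->rpcclient->alive = 0;
                dev_clp->rpcclient = NULL;
        }
}

/*
 * Models sending DESTROY_SESSION. Returns 0 if the call could be sent,
 * -1 where the real code would dereference a dead rpc client.
 */
static int model_destroy_session(const struct model_client *clp)
{
        if (clp->rpcclient == NULL)
                return -1;
        return clp->rpcclient->alive ? 0 : -1;
}
```

With one struct shared by the MDS and the device, model_unmount_device() followed by model_destroy_session() returns -1. With two separate structs, tearing down the device leaves the MDS client usable and the call returns 0.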
Apart from this obvious problem, I'm also concerned about sharing the
nfs4_client at all. My major concern is that these are going to be two
totally separate clientids (because of the exchange_id to the DS), so
having both of them behind a single nfs4_client struct is not the best
idea.
I'll take a shot at fixing this tomorrow.
Regards
Rahul