[pnfs] [PATCH 10/18] pnfs client write path no rpc nfs page cache cleanup
William A. (Andy) Adamson
andros at citi.umich.edu
Wed Jan 9 09:32:50 EST 2008
On Jan 8, 2008 5:17 PM, Benny Halevy <bhalevy at panasas.com> wrote:
> William A. (Andy) Adamson wrote:
> >
> >
> > On Jan 8, 2008 3:14 PM, Benny Halevy <bhalevy at panasas.com
> > <mailto:bhalevy at panasas.com>> wrote:
> >
> > William A. (Andy) Adamson wrote:
> > >
> > >
> > > On 1/8/08, *Benny Halevy* <bhalevy at panasas.com
> > <mailto:bhalevy at panasas.com>
> > > <mailto: bhalevy at panasas.com <mailto:bhalevy at panasas.com>>> wrote:
> > >
> > > On Jan. 07, 2008, 22:47 +0200, andros at umich.edu
> > <mailto:andros at umich.edu>
> > > <mailto:andros at umich.edu <mailto:andros at umich.edu>> wrote:
> > > > From: Andy Adamson <andros at umich.edu
> > <mailto:andros at umich.edu> <mailto:andros at umich.edu
> > <mailto:andros at umich.edu>>>
> > > >
> > > > Cleanup the nfs page cache after a non rpc pNFS write
> > > >
> > > > The 2.6.18.3 <http://2.6.18.3> <http://2.6.18.3
> > <http://2.6.18.3>> client called
> > > pnfs_writeback_done_norpc() from pnfs_writeback_done
> > > > to clean up after a pNFS write that used the page cache,
> > but not
> > > NFS RPC.
> > > > pnfs_writeback_done_norpc() contained code copied from
> > > > the rpc_call_done routines nfs_writeback_done_full, and
> > > > nfs_writeback_done_partial. It ran all the code that did not
> > > have to do
> > > > with an rpc call, namely the NFS_PROTO(inode)->write_done
> and
> > > rpc_restart_call
> > > > called in nfs_writeback_done_partial. The PNFS_USE_FULL_CB
> was
> > > used to switch
> > > > between the nfs_writeback_done_full and
> > nfs_writeback_done_partial
> > > > pieces of pnfs_writeback_done_norpc.
> > > >
> > > > Following the implementation for read, this patch provides a
> > > cleaner way of
> > > > providing the same functionlality.
> > > > The pNFS layout driver that uses the nfs page cache, but
> does
> > > not use
> > > > RPC sets the new flag PNFS_NO_RPC in
> > nfs_write_data->pnfsflags.
> > > > This flag is checked in pnfs4_write_done,
> > nfs_writeback_done, and
> > > > nfs_writeback_done_partial().
> > > >
> > > > This allows the non rpc pNFS I/O path to use the same
> cleanup
> > > routines
> > > > that NFS uses.
> > > >
> > > > The PNFS_USE_FULL_CB is no longer needed.
> > > >
> > > > NOTE: The 2.6.18.3 <http://2.6.18.3> <http://2.6.18.3>
> > code skipped the
> > > "need_commit" path in
> > > > nfs_writeback_done_partial yet calls it in
> > nfs_writeback_done_full.
> > > > This patch does the same. This should be reviewed.
> > >
> > > NAK on both the change to nfs_writeback_done_partial and
> > > nfs_writeback_done
> > > Managing the page cache has nothing to do with how the
> > layout driver
> > > performed the I/O on the DS. The upper layer cares only if
> and
> > > how much
> > > data was written and if it's committed to stable storage.
> > >
> > >
> > > I can see your point with nfs_writeback_done_partial.
> > >
> > > Please explain why a non-RPC layout driver would want to resend
> and
> > > RPC, which is what could happen without the change to
> > > nfs_writeback_done.....
> > When the layout driver has a partial layout covering only part of
> the
> > written range
> > the layout driver will effectively perform a short write and I
> > want the
> > upper layer to
> > iterate forward, causing pnfs_update_layout for the remainder.
> >
> > The design principle behind this is that no matter how the layout
> > driver
> > performed
> > the I/O it must still provide the same stable storage semantics to
> the
> > upper layer
> > so it is indistinguishable from a regular NFS I/O to the MDS.
> >
> >
> > ok. so we do something like the following, where pnfs_restart_call()
> > is a yet to be written layout type specific io operation:
> That seems more reasonable, but why not go through rpc_restart_call
> (with maybe a short dip
> into the layout driver just to prepare for the restart)?
>
> We're already in the middle of an rpc process from the high level point
> of view
i disagree. we haven't entered the rpc state machine, and i see no reason to
do so for a resend. the pnfs_try_to_write() call-out occurs prior to
nfs_initiate_write() which is where the rpc task is initialized and the rpc
state machine entered.
> and I think it would be for the better if we could do with as little
> pnfs'ism as possible
> with respect to overall rpc scheduling.
but in the no -rpc path there is no rpc scheduling. i see no reason to put
pnfs'isms into the rpc layer.
-->Andy
>
>
> Benny
> >
> > int nfs_writeback_done()
> >
> > ........
> >
> > /* Is this a short write? */
> > if (task->tk_status >= 0 && resp->count < argp->count) {
> > static unsigned long complain;
> >
> > nfs_inc_stats(data->inode, NFSIOS_SHORTWRITE);
> >
> > /* Has the server at least made some progress? */
> > if (resp->count != 0) {
> >
> > /* Was this an NFSv2 write or an NFSv3 stable
> > write? */
> > if (resp->verf->committed != NFS_UNSTABLE) {
> > /* Resend from where the server left
> > off */
> > argp->offset += resp->count;
> > argp->pgbase += resp->count;
> > argp->count -= resp->count;
> > } else {
> > /* Resend as a stable write in order
> > to avoid
> > * headaches in the case of a server
> > crash.
> > */
> > argp->stable = NFS_FILE_SYNC;
> > }
> > #ifdef CONFIG_PNFS
> > if (data->pnfsflags & PNFS_NO_RPC)
> > pnfs_restart_call();
> > else
> > #endif /* CONFIG_PNFS */
> > rpc_restart_call(task);
> > return -EAGAIN;
> > }
> > if (time_before(complain, jiffies)) {
> > printk(KERN_WARNING
> > "NFS: Server wrote zero bytes, expected
> > %u.\n",
> > argp->count);
> > complain = jiffies + 300 * HZ;
> > }
> > /* Can't do anything about it except throw an error. */
> > task->tk_status = -EIO;
> > }
> > return 0;
> > }
> >
> >
> >
> > >
> > > -->Andy
> > <-- Benny :)
> > >
> > > Benny
> > >
> > > >
> > > > Signed-off by: Andy Adamson< andros at umich.edu
> > <mailto:andros at umich.edu>
> > > <mailto:andros at umich.edu <mailto:andros at umich.edu>>>
> > > > ---
> > > > fs/nfs/nfs4proc.c | 3 +++
> > > > fs/nfs/pnfs.c | 11 +++++++++--
> > > > fs/nfs/pnfs.h | 1 -
> > > > fs/nfs/write.c | 13 ++++++++-----
> > > > 4 files changed, 20 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > > index bd32a1e..0fdd970 100644
> > > > --- a/fs/nfs/nfs4proc.c
> > > > +++ b/fs/nfs/nfs4proc.c
> > > > @@ -3022,6 +3022,9 @@ static int pnfs4_write_done(struct
> > > rpc_task *task, struct nfs_write_data *data)
> > > > struct nfs_client *client = mds_svr->nfs_client;
> > > > int status = task->tk_status;
> > > >
> > > > + if (data->pnfsflags & PNFS_NO_RPC)
> > > > + return 0;
> > > > +
> > > > /* Is this a DS session */
> > > > if (data->ds_nfs_client) {
> > > > dprintk("%s DS read\n",__func__);
> > > > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > > > index 5c3fc6b..28b3e7f 100644
> > > > --- a/fs/nfs/pnfs.c
> > > > +++ b/fs/nfs/pnfs.c
> > > > @@ -905,7 +905,12 @@ pnfs_set_ds_iosize(struct nfs_server
> > *server)
> > > > }
> > > > }
> > > >
> > > > -/* Invoked by all non-NFSv4 I/O layout drivers to mark
> pages
> > > for commit
> > > > +/* Post-write completion function. Invoked by non RPC
> layout
> > > drivers
> > > > + * to clean up write pages.
> > > > + *
> > > > + * NOTE: callers must set data->pnfsflags PNFS_NO_RPC
> > > > + * so that the NFS cleanup routines perform only the page
> > cache
> > > > + * cleanup.
> > > > */
> > > > static void
> > > > pnfs_writeback_done(struct nfs_write_data *data, ssize_t
> > status)
> > > > @@ -920,7 +925,9 @@ pnfs_writeback_done(struct
> nfs_write_data
> > > *data, ssize_t status)
> > > > /* Status is the number of bytes written or an error
> > code */
> > > > data->task.tk_status = status;
> > > > data->res.count = status;
> > > > - pnfs_writeback_done_norpc(&data->task, data);
> > > > +
> > > > + /* call the NFS cleanup routines. */
> > > > + data->call_ops->rpc_call_done(&data->task, data);
> > > > data->call_ops->rpc_release(data);
> > > > }
> > > >
> > > > diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> > > > index 070c23b..2baf9ae 100644
> > > > --- a/fs/nfs/pnfs.h
> > > > +++ b/fs/nfs/pnfs.h
> > > > @@ -50,7 +50,6 @@ void pnfs_set_ds_iosize(struct nfs_server
> > > *server);
> > > > int pnfs_commit(struct inode *inode, struct list_head
> *head,
> > > int sync, struct nfs_write_data *data);
> > > > int pnfs_try_to_commit(struct inode *, struct
> > nfs_write_data *,
> > > struct list_head *, int);
> > > > void pnfs_readpage_result_norpc(struct rpc_task *task, void
> > > *calldata);
> > > > -void pnfs_writeback_done_norpc(struct rpc_task *, void *);
> > > > void pnfs_commit_done_norpc(struct rpc_task *, void *);
> > > > void pnfs_pageio_init_read(struct nfs_pageio_descriptor *,
> > > struct inode *, struct nfs_open_context *, struct list_head *,
> > > size_t *);
> > > > void pnfs_pageio_init_write(struct nfs_pageio_descriptor *,
> > > struct inode *);
> > > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > > index e086ae4..eefc50c 100644
> > > > --- a/fs/nfs/write.c
> > > > +++ b/fs/nfs/write.c
> > > > @@ -1009,11 +1009,6 @@ pnfs_writeback_done_update(struct
> > > nfs_write_data *data)
> > > > pnfs_need_layoutcommit(nfsi, argp->context);
> > > > }
> > > >
> > > > -void pnfs_writeback_done_norpc(struct rpc_task *task, void
> > > *calldata)
> > > > -{
> > > > - printk("%s XXX Need to Implement\n", __func__);
> > > > -}
> > > > -
> > > > void pnfs_commit_done_norpc(struct rpc_task *task, void
> > *calldata)
> > > > {
> > > > printk("%s XXX Need to Implement\n", __func__);
> > > > @@ -1045,7 +1040,11 @@ static void
> > > nfs_writeback_done_partial(struct rpc_task *task, void
> > *calldata)
> > > > goto out;
> > > > }
> > > >
> > > > +#ifdef CONFIG_PNFS
> > > > + if (!(data->pnfsflags & PNFS_NO_RPC) &&
> > > nfs_write_need_commit(data)) {
> > > > +#else
> > > > if (nfs_write_need_commit(data)) {
> > > > +#endif /* CONFIG_PNFS */
> > > > struct inode *inode = page->mapping->host;
> > > >
> > > > spin_lock(&inode->i_lock);
> > > > @@ -1214,7 +1213,11 @@ int nfs_writeback_done(struct
> rpc_task
> > > *task, struct nfs_write_data *data)
> > > > nfs_inc_stats(data->inode, NFSIOS_SHORTWRITE);
> > > >
> > > > /* Has the server at least made some
> > progress? */
> > > > +#ifdef CONFIG_PNFS
> > > > + if (resp->count != 0 && !(data->pnfsflags &
> > > PNFS_NO_RPC)) {
> > > > +#else
> > > > if (resp->count != 0) {
> > > > +#endif /* CONFIG_PNFS */
> > > > /* Was this an NFSv2 write or an NFSv3
> > > stable write? */
> > > > if (resp->verf->committed !=
> > NFS_UNSTABLE) {
> > > > /* Resend from where the
> server
> > > left off */
> > >
> > > _______________________________________________
> > > pNFS mailing list
> > > pNFS at linux-nfs.org <mailto:pNFS at linux-nfs.org>
> > <mailto:pNFS at linux-nfs.org <mailto:pNFS at linux-nfs.org>>
> > > http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
> > >
> > >
> >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://linux-nfs.org/pipermail/pnfs/attachments/20080109/95441ea6/attachment-0001.htm
More information about the pNFS
mailing list