[pnfs] [PATCH 10/18] pnfs client write path no rpc nfs page cache cleanup

William A. (Andy) Adamson andros at citi.umich.edu
Wed Jan 9 09:32:50 EST 2008


On Jan 8, 2008 5:17 PM, Benny Halevy <bhalevy at panasas.com> wrote:

> William A. (Andy) Adamson wrote:
> >
> >
> > On Jan 8, 2008 3:14 PM, Benny Halevy <bhalevy at panasas.com
> > <mailto:bhalevy at panasas.com>> wrote:
> >
> >     William A. (Andy) Adamson wrote:
> >     >
> >     >
> >     > On 1/8/08, *Benny Halevy* <bhalevy at panasas.com
> >     <mailto:bhalevy at panasas.com>
> >     > <mailto: bhalevy at panasas.com <mailto:bhalevy at panasas.com>>> wrote:
> >     >
> >     >     On Jan. 07, 2008, 22:47 +0200, andros at umich.edu
> >     <mailto:andros at umich.edu>
> >     >     <mailto:andros at umich.edu <mailto:andros at umich.edu>> wrote:
> >     >     > From: Andy Adamson <andros at umich.edu
> >     <mailto:andros at umich.edu> <mailto:andros at umich.edu
> >     <mailto:andros at umich.edu>>>
> >     >     >
> >     >     > Cleanup the nfs page cache after a non rpc pNFS write
> >     >     >
> >     >     > The 2.6.18.3 <http://2.6.18.3> <http://2.6.18.3
> >     <http://2.6.18.3>> client called
> >     >     pnfs_writeback_done_norpc() from pnfs_writeback_done
> >     >     > to clean up after a pNFS write that used the page cache,
> >     but not
> >     >     NFS RPC.
> >     >     > pnfs_writeback_done_norpc() contained code copied from
> >     >     > the rpc_call_done routines nfs_writeback_done_full, and
> >     >     > nfs_writeback_done_partial. It ran all the code that did not
> >     >     have to do
> >     >     > with an rpc call, namely the NFS_PROTO(inode)->write_done
> and
> >     >     rpc_restart_call
> >     >     > called in nfs_writeback_done_partial. The PNFS_USE_FULL_CB
> was
> >     >     used to switch
> >     >     > between the nfs_writeback_done_full and
> >     nfs_writeback_done_partial
> >     >     > pieces of pnfs_writeback_done_norpc.
> >     >     >
> >     >     > Following the implementation for read, this patch provides a
> >     >     cleaner way of
> >     >     > providing the same functionlality.
> >     >     > The pNFS layout driver that uses the nfs page cache, but
> does
> >     >     not use
> >     >     > RPC sets the new flag PNFS_NO_RPC in
> >     nfs_write_data->pnfsflags.
> >     >     > This flag is checked in pnfs4_write_done,
> >     nfs_writeback_done, and
> >     >     > nfs_writeback_done_partial().
> >     >     >
> >     >     > This allows the non rpc pNFS I/O path to use the same
> cleanup
> >     >     routines
> >     >     > that NFS uses.
> >     >     >
> >     >     > The PNFS_USE_FULL_CB is no longer needed.
> >     >     >
> >     >     > NOTE: The 2.6.18.3 <http://2.6.18.3> <http://2.6.18.3>
> >     code skipped the
> >     >     "need_commit" path in
> >     >     > nfs_writeback_done_partial yet calls it in
> >     nfs_writeback_done_full.
> >     >     > This patch does the same. This should be reviewed.
> >     >
> >     >     NAK on both the change to nfs_writeback_done_partial and
> >     >     nfs_writeback_done
> >     >     Managing the page cache has nothing to do with how the
> >     layout driver
> >     >     performed the I/O on the DS.  The upper layer cares only if
> and
> >     >     how much
> >     >     data was written and if it's committed to stable storage.
> >     >
> >     >
> >     > I can see your point with nfs_writeback_done_partial.
> >     >
> >     > Please explain why a non-RPC layout driver would want to resend
> and
> >     > RPC, which is what could happen without the change to
> >     > nfs_writeback_done.....
> >     When the layout driver has a partial layout covering only part of
> the
> >     written range
> >     the layout driver will effectively perform a short write and I
> >     want the
> >     upper layer to
> >     iterate forward, causing pnfs_update_layout for the remainder.
> >
> >     The design principle behind this is that no matter how the layout
> >     driver
> >     performed
> >     the I/O it must still provide the same stable storage semantics to
> the
> >     upper layer
> >     so it is indistinguishable from a regular NFS I/O to the MDS.
> >
> >
> > ok. so we do something like the following, where pnfs_restart_call()
> > is a yet to be written layout type specific io operation:
> That seems more reasonable, but why not go through rpc_restart_call
> (with maybe a short dip
> into the layout driver just to prepare for the restart)?
>
> We're already in the middle of an rpc process from the high level point
> of view


i disagree. we haven't entered the rpc state machine, and i see no reason to
do so for a resend. the pnfs_try_to_write() call-out occurs prior to
nfs_initiate_write() which is where the rpc task is initialized and the rpc
state machine entered.


> and I think it would be for the better if we could do with as little
> pnfs'ism as possible
> with respect to overall rpc scheduling.


but in the no -rpc path there is no rpc scheduling. i see no reason to put
pnfs'isms into the rpc layer.

-->Andy


>
>
> Benny
> >
> > int nfs_writeback_done()
> >
> >     ........
> >
> >         /* Is this a short write? */
> >         if (task->tk_status >= 0 && resp->count < argp->count) {
> >                 static unsigned long    complain;
> >
> >                 nfs_inc_stats(data->inode, NFSIOS_SHORTWRITE);
> >
> >                 /* Has the server at least made some progress? */
> >                 if (resp->count != 0) {
> >
> >                         /* Was this an NFSv2 write or an NFSv3 stable
> > write? */
> >                         if (resp->verf->committed != NFS_UNSTABLE) {
> >                                 /* Resend from where the server left
> > off */
> >                                 argp->offset += resp->count;
> >                                 argp->pgbase += resp->count;
> >                                 argp->count -= resp->count;
> >                         } else {
> >                                 /* Resend as a stable write in order
> > to avoid
> >                                  * headaches in the case of a server
> > crash.
> >                                  */
> >                                 argp->stable = NFS_FILE_SYNC;
> >                         }
> > #ifdef CONFIG_PNFS
> >                         if (data->pnfsflags & PNFS_NO_RPC)
> >                                pnfs_restart_call();
> >                         else
> > #endif /* CONFIG_PNFS */
> >                                rpc_restart_call(task);
> >                         return -EAGAIN;
> >                 }
> >                 if (time_before(complain, jiffies)) {
> >                         printk(KERN_WARNING
> >                                "NFS: Server wrote zero bytes, expected
> > %u.\n",
> >                                         argp->count);
> >                         complain = jiffies + 300 * HZ;
> >                 }
> >                 /* Can't do anything about it except throw an error. */
> >                 task->tk_status = -EIO;
> >         }
> >         return 0;
> > }
> >
> >
> >
> >     >
> >     > -->Andy
> >     <-- Benny :)
> >     >
> >     >     Benny
> >     >
> >     >     >
> >     >     > Signed-off by: Andy Adamson< andros at umich.edu
> >     <mailto:andros at umich.edu>
> >     >     <mailto:andros at umich.edu <mailto:andros at umich.edu>>>
> >     >     > ---
> >     >     >  fs/nfs/nfs4proc.c |    3 +++
> >     >     >  fs/nfs/pnfs.c     |   11 +++++++++--
> >     >     >  fs/nfs/pnfs.h     |    1 -
> >     >     >  fs/nfs/write.c    |   13 ++++++++-----
> >     >     >  4 files changed, 20 insertions(+), 8 deletions(-)
> >     >     >
> >     >     > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> >     >     > index bd32a1e..0fdd970 100644
> >     >     > --- a/fs/nfs/nfs4proc.c
> >     >     > +++ b/fs/nfs/nfs4proc.c
> >     >     > @@ -3022,6 +3022,9 @@ static int pnfs4_write_done(struct
> >     >     rpc_task *task, struct nfs_write_data *data)
> >     >     >       struct nfs_client *client = mds_svr->nfs_client;
> >     >     >       int status = task->tk_status;
> >     >     >
> >     >     > +     if (data->pnfsflags & PNFS_NO_RPC)
> >     >     > +             return 0;
> >     >     > +
> >     >     >       /* Is this a DS session */
> >     >     >       if (data->ds_nfs_client) {
> >     >     >               dprintk("%s DS read\n",__func__);
> >     >     > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> >     >     > index 5c3fc6b..28b3e7f 100644
> >     >     > --- a/fs/nfs/pnfs.c
> >     >     > +++ b/fs/nfs/pnfs.c
> >     >     > @@ -905,7 +905,12 @@ pnfs_set_ds_iosize(struct nfs_server
> >     *server)
> >     >     >       }
> >     >     >  }
> >     >     >
> >     >     > -/* Invoked by all non-NFSv4 I/O layout drivers to mark
> pages
> >     >     for commit
> >     >     > +/* Post-write completion function.  Invoked by non RPC
> layout
> >     >     drivers
> >     >     > + * to clean up write pages.
> >     >     > + *
> >     >     > + * NOTE: callers must set data->pnfsflags PNFS_NO_RPC
> >     >     > + * so that the NFS cleanup routines perform only the page
> >     cache
> >     >     > + * cleanup.
> >     >     >   */
> >     >     >  static void
> >     >     >  pnfs_writeback_done(struct nfs_write_data *data, ssize_t
> >     status)
> >     >     > @@ -920,7 +925,9 @@ pnfs_writeback_done(struct
> nfs_write_data
> >     >     *data, ssize_t status)
> >     >     >       /* Status is the number of bytes written or an error
> >     code */
> >     >     >       data->task.tk_status = status;
> >     >     >       data->res.count = status;
> >     >     > -     pnfs_writeback_done_norpc(&data->task, data);
> >     >     > +
> >     >     > +     /* call the NFS cleanup routines. */
> >     >     > +     data->call_ops->rpc_call_done(&data->task, data);
> >     >     >       data->call_ops->rpc_release(data);
> >     >     >  }
> >     >     >
> >     >     > diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> >     >     > index 070c23b..2baf9ae 100644
> >     >     > --- a/fs/nfs/pnfs.h
> >     >     > +++ b/fs/nfs/pnfs.h
> >     >     > @@ -50,7 +50,6 @@ void pnfs_set_ds_iosize(struct nfs_server
> >     >     *server);
> >     >     >  int pnfs_commit(struct inode *inode, struct list_head
> *head,
> >     >     int sync, struct nfs_write_data *data);
> >     >     >  int pnfs_try_to_commit(struct inode *, struct
> >     nfs_write_data *,
> >     >     struct list_head *, int);
> >     >     >  void pnfs_readpage_result_norpc(struct rpc_task *task, void
> >     >     *calldata);
> >     >     > -void pnfs_writeback_done_norpc(struct rpc_task *, void *);
> >     >     >  void pnfs_commit_done_norpc(struct rpc_task *, void *);
> >     >     >  void pnfs_pageio_init_read(struct nfs_pageio_descriptor *,
> >     >     struct inode *, struct nfs_open_context *, struct list_head *,
> >     >     size_t *);
> >     >     >  void pnfs_pageio_init_write(struct nfs_pageio_descriptor *,
> >     >     struct inode *);
> >     >     > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> >     >     > index e086ae4..eefc50c 100644
> >     >     > --- a/fs/nfs/write.c
> >     >     > +++ b/fs/nfs/write.c
> >     >     > @@ -1009,11 +1009,6 @@ pnfs_writeback_done_update(struct
> >     >     nfs_write_data *data)
> >     >     >               pnfs_need_layoutcommit(nfsi, argp->context);
> >     >     >  }
> >     >     >
> >     >     > -void pnfs_writeback_done_norpc(struct rpc_task *task, void
> >     >     *calldata)
> >     >     > -{
> >     >     > -     printk("%s XXX Need to Implement\n", __func__);
> >     >     > -}
> >     >     > -
> >     >     >  void pnfs_commit_done_norpc(struct rpc_task *task, void
> >     *calldata)
> >     >     >  {
> >     >     >       printk("%s XXX Need to Implement\n", __func__);
> >     >     > @@ -1045,7 +1040,11 @@ static void
> >     >     nfs_writeback_done_partial(struct rpc_task *task, void
> >     *calldata)
> >     >     >               goto out;
> >     >     >       }
> >     >     >
> >     >     > +#ifdef CONFIG_PNFS
> >     >     > +     if (!(data->pnfsflags & PNFS_NO_RPC) &&
> >     >     nfs_write_need_commit(data)) {
> >     >     > +#else
> >     >     >       if (nfs_write_need_commit(data)) {
> >     >     > +#endif /* CONFIG_PNFS */
> >     >     >               struct inode *inode = page->mapping->host;
> >     >     >
> >     >     >               spin_lock(&inode->i_lock);
> >     >     > @@ -1214,7 +1213,11 @@ int nfs_writeback_done(struct
> rpc_task
> >     >     *task, struct nfs_write_data *data)
> >     >     >               nfs_inc_stats(data->inode, NFSIOS_SHORTWRITE);
> >     >     >
> >     >     >               /* Has the server at least made some
> >     progress? */
> >     >     > +#ifdef CONFIG_PNFS
> >     >     > +             if (resp->count != 0 && !(data->pnfsflags &
> >     >     PNFS_NO_RPC)) {
> >     >     > +#else
> >     >     >               if (resp->count != 0) {
> >     >     > +#endif /* CONFIG_PNFS */
> >     >     >                       /* Was this an NFSv2 write or an NFSv3
> >     >     stable write? */
> >     >     >                       if (resp->verf->committed !=
> >     NFS_UNSTABLE) {
> >     >     >                               /* Resend from where the
> server
> >     >     left off */
> >     >
> >     >     _______________________________________________
> >     >     pNFS mailing list
> >     >     pNFS at linux-nfs.org <mailto:pNFS at linux-nfs.org>
> >     <mailto:pNFS at linux-nfs.org <mailto:pNFS at linux-nfs.org>>
> >     >     http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
> >     >
> >     >
> >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://linux-nfs.org/pipermail/pnfs/attachments/20080109/95441ea6/attachment-0001.htm 


More information about the pNFS mailing list