[pnfs] [PATCH 10/18] pnfs client write path no rpc nfs page cache cleanup

William A. (Andy) Adamson andros at citi.umich.edu
Tue Jan 8 15:45:45 EST 2008


On Jan 8, 2008 3:14 PM, Benny Halevy <bhalevy at panasas.com> wrote:

> William A. (Andy) Adamson wrote:
> >
> >
> > On 1/8/08, *Benny Halevy* <bhalevy at panasas.com
> > <mailto:bhalevy at panasas.com>> wrote:
> >
> >     On Jan. 07, 2008, 22:47 +0200, andros at umich.edu
> >     <mailto:andros at umich.edu> wrote:
> >     > From: Andy Adamson <andros at umich.edu <mailto:andros at umich.edu>>
> >     >
> >     > Cleanup the nfs page cache after a non rpc pNFS write
> >     >
> >     > The 2.6.18.3 <http://2.6.18.3> client called
> >     pnfs_writeback_done_norpc() from pnfs_writeback_done
> >     > to clean up after a pNFS write that used the page cache, but not
> >     NFS RPC.
> >     > pnfs_writeback_done_norpc() contained code copied from
> >     > the rpc_call_done routines nfs_writeback_done_full, and
> >     > nfs_writeback_done_partial. It ran all the code that did not
> >     have to do
> >     > with an rpc call, namely the NFS_PROTO(inode)->write_done and
> >     rpc_restart_call
> >     > called in nfs_writeback_done_partial. The PNFS_USE_FULL_CB was
> >     used to switch
> >     > between the nfs_writeback_done_full and nfs_writeback_done_partial
> >     > pieces of pnfs_writeback_done_norpc.
> >     >
> >     > Following the implementation for read, this patch provides a
> >     cleaner way of
> >     > providing the same functionlality.
> >     > The pNFS layout driver that uses the nfs page cache, but does
> >     not use
> >     > RPC sets the new flag PNFS_NO_RPC in nfs_write_data->pnfsflags.
> >     > This flag is checked in pnfs4_write_done, nfs_writeback_done, and
> >     > nfs_writeback_done_partial().
> >     >
> >     > This allows the non rpc pNFS I/O path to use the same cleanup
> >     routines
> >     > that NFS uses.
> >     >
> >     > The PNFS_USE_FULL_CB is no longer needed.
> >     >
> >     > NOTE: The 2.6.18.3 <http://2.6.18.3> code skipped the
> >     "need_commit" path in
> >     > nfs_writeback_done_partial yet calls it in
> nfs_writeback_done_full.
> >     > This patch does the same. This should be reviewed.
> >
> >     NAK on both the change to nfs_writeback_done_partial and
> >     nfs_writeback_done
> >     Managing the page cache has nothing to do with how the layout driver
> >     performed the I/O on the DS.  The upper layer cares only if and
> >     how much
> >     data was written and if it's committed to stable storage.
> >
> >
> > I can see your point with nfs_writeback_done_partial.
> >
> > Please explain why a non-RPC layout driver would want to resend and
> > RPC, which is what could happen without the change to
> > nfs_writeback_done.....
> When the layout driver has a partial layout covering only part of the
> written range
> the layout driver will effectively perform a short write and I want the
> upper layer to
> iterate forward, causing pnfs_update_layout for the remainder.
>
> The design principle behind this is that no matter how the layout driver
> performed
> the I/O it must still provide the same stable storage semantics to the
> upper layer
> so it is indistinguishable from a regular NFS I/O to the MDS.


ok. so we do something like the following, where pnfs_restart_call() is a
yet to be written layout type specific io operation:

int nfs_writeback_done()

    ........

        /* Is this a short write? */
        if (task->tk_status >= 0 && resp->count < argp->count) {
                static unsigned long    complain;

                nfs_inc_stats(data->inode, NFSIOS_SHORTWRITE);

                /* Has the server at least made some progress? */
                if (resp->count != 0) {

                        /* Was this an NFSv2 write or an NFSv3 stable write?
*/
                        if (resp->verf->committed != NFS_UNSTABLE) {
                                /* Resend from where the server left off */
                                argp->offset += resp->count;
                                argp->pgbase += resp->count;
                                argp->count -= resp->count;
                        } else {
                                /* Resend as a stable write in order to
avoid
                                 * headaches in the case of a server crash.
                                 */
                                argp->stable = NFS_FILE_SYNC;
                        }
#ifdef CONFIG_PNFS
                        if (data->pnfsflags & PNFS_NO_RPC)
                               pnfs_restart_call();
                        else
#endif /* CONFIG_PNFS */
                               rpc_restart_call(task);
                        return -EAGAIN;
                }
                if (time_before(complain, jiffies)) {
                        printk(KERN_WARNING
                               "NFS: Server wrote zero bytes, expected
%u.\n",
                                        argp->count);
                        complain = jiffies + 300 * HZ;
                }
                /* Can't do anything about it except throw an error. */
                task->tk_status = -EIO;
        }
        return 0;
}


>
> >
> > -->Andy
> <-- Benny :)
> >
> >     Benny
> >
> >     >
> >     > Signed-off by: Andy Adamson< andros at umich.edu
> >     <mailto:andros at umich.edu>>
> >     > ---
> >     >  fs/nfs/nfs4proc.c |    3 +++
> >     >  fs/nfs/pnfs.c     |   11 +++++++++--
> >     >  fs/nfs/pnfs.h     |    1 -
> >     >  fs/nfs/write.c    |   13 ++++++++-----
> >     >  4 files changed, 20 insertions(+), 8 deletions(-)
> >     >
> >     > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> >     > index bd32a1e..0fdd970 100644
> >     > --- a/fs/nfs/nfs4proc.c
> >     > +++ b/fs/nfs/nfs4proc.c
> >     > @@ -3022,6 +3022,9 @@ static int pnfs4_write_done(struct
> >     rpc_task *task, struct nfs_write_data *data)
> >     >       struct nfs_client *client = mds_svr->nfs_client;
> >     >       int status = task->tk_status;
> >     >
> >     > +     if (data->pnfsflags & PNFS_NO_RPC)
> >     > +             return 0;
> >     > +
> >     >       /* Is this a DS session */
> >     >       if (data->ds_nfs_client) {
> >     >               dprintk("%s DS read\n",__func__);
> >     > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> >     > index 5c3fc6b..28b3e7f 100644
> >     > --- a/fs/nfs/pnfs.c
> >     > +++ b/fs/nfs/pnfs.c
> >     > @@ -905,7 +905,12 @@ pnfs_set_ds_iosize(struct nfs_server *server)
> >     >       }
> >     >  }
> >     >
> >     > -/* Invoked by all non-NFSv4 I/O layout drivers to mark pages
> >     for commit
> >     > +/* Post-write completion function.  Invoked by non RPC layout
> >     drivers
> >     > + * to clean up write pages.
> >     > + *
> >     > + * NOTE: callers must set data->pnfsflags PNFS_NO_RPC
> >     > + * so that the NFS cleanup routines perform only the page cache
> >     > + * cleanup.
> >     >   */
> >     >  static void
> >     >  pnfs_writeback_done(struct nfs_write_data *data, ssize_t status)
> >     > @@ -920,7 +925,9 @@ pnfs_writeback_done(struct nfs_write_data
> >     *data, ssize_t status)
> >     >       /* Status is the number of bytes written or an error code */
> >     >       data->task.tk_status = status;
> >     >       data->res.count = status;
> >     > -     pnfs_writeback_done_norpc(&data->task, data);
> >     > +
> >     > +     /* call the NFS cleanup routines. */
> >     > +     data->call_ops->rpc_call_done(&data->task, data);
> >     >       data->call_ops->rpc_release(data);
> >     >  }
> >     >
> >     > diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> >     > index 070c23b..2baf9ae 100644
> >     > --- a/fs/nfs/pnfs.h
> >     > +++ b/fs/nfs/pnfs.h
> >     > @@ -50,7 +50,6 @@ void pnfs_set_ds_iosize(struct nfs_server
> >     *server);
> >     >  int pnfs_commit(struct inode *inode, struct list_head *head,
> >     int sync, struct nfs_write_data *data);
> >     >  int pnfs_try_to_commit(struct inode *, struct nfs_write_data *,
> >     struct list_head *, int);
> >     >  void pnfs_readpage_result_norpc(struct rpc_task *task, void
> >     *calldata);
> >     > -void pnfs_writeback_done_norpc(struct rpc_task *, void *);
> >     >  void pnfs_commit_done_norpc(struct rpc_task *, void *);
> >     >  void pnfs_pageio_init_read(struct nfs_pageio_descriptor *,
> >     struct inode *, struct nfs_open_context *, struct list_head *,
> >     size_t *);
> >     >  void pnfs_pageio_init_write(struct nfs_pageio_descriptor *,
> >     struct inode *);
> >     > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> >     > index e086ae4..eefc50c 100644
> >     > --- a/fs/nfs/write.c
> >     > +++ b/fs/nfs/write.c
> >     > @@ -1009,11 +1009,6 @@ pnfs_writeback_done_update(struct
> >     nfs_write_data *data)
> >     >               pnfs_need_layoutcommit(nfsi, argp->context);
> >     >  }
> >     >
> >     > -void pnfs_writeback_done_norpc(struct rpc_task *task, void
> >     *calldata)
> >     > -{
> >     > -     printk("%s XXX Need to Implement\n", __func__);
> >     > -}
> >     > -
> >     >  void pnfs_commit_done_norpc(struct rpc_task *task, void
> *calldata)
> >     >  {
> >     >       printk("%s XXX Need to Implement\n", __func__);
> >     > @@ -1045,7 +1040,11 @@ static void
> >     nfs_writeback_done_partial(struct rpc_task *task, void *calldata)
> >     >               goto out;
> >     >       }
> >     >
> >     > +#ifdef CONFIG_PNFS
> >     > +     if (!(data->pnfsflags & PNFS_NO_RPC) &&
> >     nfs_write_need_commit(data)) {
> >     > +#else
> >     >       if (nfs_write_need_commit(data)) {
> >     > +#endif /* CONFIG_PNFS */
> >     >               struct inode *inode = page->mapping->host;
> >     >
> >     >               spin_lock(&inode->i_lock);
> >     > @@ -1214,7 +1213,11 @@ int nfs_writeback_done(struct rpc_task
> >     *task, struct nfs_write_data *data)
> >     >               nfs_inc_stats(data->inode, NFSIOS_SHORTWRITE);
> >     >
> >     >               /* Has the server at least made some progress? */
> >     > +#ifdef CONFIG_PNFS
> >     > +             if (resp->count != 0 && !(data->pnfsflags &
> >     PNFS_NO_RPC)) {
> >     > +#else
> >     >               if (resp->count != 0) {
> >     > +#endif /* CONFIG_PNFS */
> >     >                       /* Was this an NFSv2 write or an NFSv3
> >     stable write? */
> >     >                       if (resp->verf->committed != NFS_UNSTABLE) {
> >     >                               /* Resend from where the server
> >     left off */
> >
> >     _______________________________________________
> >     pNFS mailing list
> >     pNFS at linux-nfs.org <mailto:pNFS at linux-nfs.org>
> >     http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
> >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://linux-nfs.org/pipermail/pnfs/attachments/20080108/1533d845/attachment-0001.htm 


More information about the pNFS mailing list