TCP and timeouts

Jijo Chacko consultant.bangalore at gmail.com
Thu Nov 29 00:41:01 EST 2007


Thanks Chuck. To me, what Oracle does to complement NFS amounts to moving
the core function of a distributed file system (cache coherence for
concurrent data access at the write system call level, transparent to
applications and providing UNIX-like local file system semantics) into the
application, which is not the best approach from an industry perspective.
(This is very much like applications doing their own congestion control on
top of UDP, as with RTP over UDP.) But, as with QuickIO offered by VxFS, or
the many file systems that let databases bypass in-kernel file system
caches, this trend of application-level cache coherence for certain
applications merits a re-look over the long term, at least for new
applications yet to emerge.

At least pNFS/v4.1 looks very promising, providing a direct data path for
data clients. I hope that new networked applications will be able to
perform concurrent I/O on top of the next-generation NFS without needing to
use POSIX-style locks. This is imperative for the special case of a load
balancer front-ending file access clients, with back-end data servers that
use a shared file system. Web switches (for example, Alteon from Nortel)
now use their own cookie-based cache persistence. We can make the design of
such devices (or the next generation of distributed databases) much simpler
if we have an NFS with a fine-grained cache consistency protocol built in.

Thanks again, Chuck, for your interest in extending our conversation, which
has been a really educational experience. Will catch you later; thanks and
bye.


On 11/28/07, Chuck Lever <chuck.lever at oracle.com> wrote:
>
> On Nov 28, 2007, at 8:23 AM, Jijo Chacko wrote:
> > One of the long-standing issues with NFS (and still unresolved;
> > others can correct me if I am wrong) that made it unsuitable for
> > Oracle-like high-transaction environments is that it doesn't support
> > cache coherence at the write system call level. It only provides
> > close-to-open semantics. Though v4 has delegation, it only lets
> > clients do aggressive caching (and if it is revoked, the client
> > falls back to v3-style behavior). In that environment, can an Oracle
> > table be modified from multiple RAC servers concurrently?
>
> NFS clients support several different types of cache coherency.  By
> default, NFS clients treat files with close-to-open cache coherency.
> However, you can use the "noac" mount option to refine the coherency
> behavior quite a bit.  NLM locking, as implemented on Linux, also
> significantly increases cache coherency by invalidating a file's read
> cache when it is locked, and flushing writes to the server when it is
> unlocked (a kind of super close-to-open coherency, where locking is
> open and unlocking is close).
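
A minimal sketch of the lock/unlock pattern described above, in Python and assuming a Linux client. The function name locked_increment and the counter-file layout are hypothetical, chosen only for illustration. The cache invalidation and write flushing happen inside the NFS client as described in the quoted paragraph; the code only marks where the lock and unlock points fall.

```python
import fcntl
import os


def locked_increment(path):
    """Increment the decimal counter stored in `path` under an exclusive lock."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Acquire a POSIX byte-range lock on the whole file. This is the
        # "open" point: per the description above, the Linux NFS client
        # invalidates its cached data for the file when the lock is taken.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            raw = os.pread(fd, 64, 0)
            value = int(raw.decode() or "0") + 1
            os.ftruncate(fd, 0)
            os.pwrite(fd, str(value).encode(), 0)
        finally:
            # Release the lock. This is the "close" point: dirty data is
            # flushed to the server when the file is unlocked.
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
    return value
```

Every cooperating client has to take the same lock for this to help; a reader that skips the lock may still see stale cached data.
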
>
> In the end, the most cache coherence you can get is no caching at
> all, and some NFS clients provide the ability to do this using
> direct, or uncached I/O.  On Solaris you can use the "forcedirectio"
> mount option; on Linux, opening a file with the O_DIRECT flag
> disables read and write caching for that file.
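
A rough sketch of the Linux O_DIRECT case mentioned above, in Python. The helper names and the one-page read size are assumptions made for illustration. O_DIRECT generally needs the buffer, offset, and length to be suitably aligned, so the sketch reads into an anonymous mmap region (which is page-aligned). Some file systems reject O_DIRECT, so it falls back to a normal cached read and reports which path was taken.

```python
import errno
import mmap
import os

BLOCK = mmap.PAGESIZE  # assumed alignment and read size; devices may differ


def _read_into(path, flags, buf):
    """Open `path` with `flags`, read once into `buf`, return the byte count."""
    fd = os.open(path, flags)
    try:
        return os.readv(fd, [buf])
    finally:
        os.close(fd)


def read_first_block(path):
    """Return (data, used_direct) for the first BLOCK bytes of `path`."""
    direct = getattr(os, "O_DIRECT", 0)  # 0 where the flag is not defined
    buf = mmap.mmap(-1, BLOCK)           # anonymous mapping, page-aligned
    try:
        used_direct = bool(direct)
        try:
            n = _read_into(path, os.O_RDONLY | direct, buf)
        except OSError as exc:
            # Re-raise anything other than "O_DIRECT not supported here".
            if not direct or exc.errno not in (errno.EINVAL, errno.EOPNOTSUPP):
                raise
            used_direct = False
            n = _read_into(path, os.O_RDONLY, buf)
        return buf[:n], used_direct
    finally:
        buf.close()
```

A database doing its own buffering would keep the descriptor open and issue aligned reads and writes through it, rather than reopening the file per call as this sketch does.
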
>
> Oracle RAC runs quite successfully on Linux (and Solaris, too, I
> believe) with NFS and direct I/O.  Oracle's OnDemand database
> business, for example, is based entirely on Linux, using NFS with
> NetApp filers.
>
> As an historical note, database on NFS was problematic not because of
> cache coherency, but write performance.  Cache coherency has become
> an issue only recently because of the advent of database clustering
> solutions such as RAC.  Write performance was an issue in NFSv2
> because every requested write had to be flushed all the way to disk
> before the server could reply that the write had completed.  This
> issue drove the invention of NVRAM write acceleration for NFS
> servers, and eventually UNSTABLE writes became a part of NFSv3 to
> help alleviate write performance problems in NFSv2.
>
> > Oracle may be solving this problem with its locks at the distributed
> > database layer. But we cannot expect NFS to do this, as file systems
> > like GPFS or IBM's SANFS do.
>
> Oracle does use a distributed locking layer for RAC, but disables
> file system data caching to achieve cache coherence.  As Oracle uses
> a shared buffer cache at the user level, disabling the file system
> level cache actually improves performance because physical memory
> that was being used by the file system cache can be repurposed.
>
> I might add that GPFS or SANFS continue to cache and use a cluster
> locking protocol in this case, which adds unnecessary overhead since
> the database itself is already providing a DLM and is caching data
> itself.
>
> > Does V4.1 solve this?
>
> It doesn't, but it isn't designed to.
>
> > Or any idea about any NFS incarnation that supports true cache
> > coherence?
>
> File system clustering came into vogue nearly 15 years after NFS was
> first designed.  Applications that demand tight cache coherence and
> don't implement it themselves are certainly better served by more
> modern solutions, but NFS still fills a significant niche by
> providing basic and standardized file sharing services that are
> available in a wide variety of operating systems.
>
> > On Nov 28, 2007 11:01 AM, Mike Eisler
> > <email2mre-linuxv4 at yahoo.com> wrote:
> >
> > > -----Original Message-----
> > > From: Jijo Chacko [mailto:consultant.bangalore at gmail.com]
> >
> > >    This means that servers drop RPC packets at the RPC layer,
> > >    and this is very likely in high work-load situations. But is
> > >    NFS used in that
> >
> > Generally servers don't drop requests received over TCP. It
> > is considered very bad form, and banned in NFSv4.
> >
> > > environment? Does Oracle run on top of NFS? While there are
> > >    distributed file systems like GPFS,
> >
> > Yes, Oracle database can and does use NFS.
> >
> > > SAN FS, does anybody use NFS for this purpose? Yes, as you
> > >    mentioned, if RPC is to drop
> >
> > Many people.
> > http://www.google.com/search?&q=nfs+site%3Aoracle.com&btnG=Search
> >
> >
> > > the packets at the application layer, there is no way to blame
> > >    TCP, and we have to use RPC timeouts. Does the v4.1 session
> > >    protocol have some sort of RPC-level flow control as well?
> > >    There are products from SUN,
> >
> > Yes the session component of the NFSv4.1 protocol has flow control.
> >
> >
> > _______________________________________________
> > NFSv4 mailing list
> > NFSv4 at linux-nfs.org
> > http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com