problem during configuring High Availability NFS Server

Sadananda Tripathy tripathy at noida.atrenta.com
Tue Mar 13 04:33:51 EDT 2007


Hi,

I ran into a problem while configuring a High Availability (DRBD + 
Heartbeat) NFS server.

*Problem in brief:*

When the active NFS server goes down, the other server becomes active 
(it is an active/passive cluster). New requests for the exported area 
work fine, but clients that had mounted it from the previous server get 
the error "Stale NFS file handle".
When I then bring down the currently active server and make the previous 
server active again, the clients that were getting "Stale NFS file 
handle" start working fine, but the clients that mounted from the other 
server start getting the "Stale NFS file handle" error.

 

*My setup is as below:*
OS: Red Hat Enterprise Linux 4 (Update 1) [kernel  2.6.9-11.ELsmp]
DRBD version:  drbd-0.7.23
Heartbeat: heartbeat-2.0.8
NFS: nfs-utils-1.0.6-46
DRBD volume: /dev/drbd0 mounted on /home
Heartbeat  configuration  file:
# cat /usr/local/etc/ha.d/haresources
drbd2 drbddisk::r0 Filesystem::/dev/drbd0::/home::xfs 192.168.2.209 nfs nfslock
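
For readers less familiar with the haresources format, here is an annotated copy of the same line. The comment lines are explanatory additions, not part of the file on the nodes. Heartbeat starts the resources from left to right and stops them from right to left.

```
# drbd2                                preferred node for this resource group
# drbddisk::r0                         make DRBD resource r0 primary on this node
# Filesystem::/dev/drbd0::/home::xfs   mount /dev/drbd0 on /home as xfs
# 192.168.2.209                        service IP (shorthand for IPaddr::192.168.2.209)
# nfs nfslock                          init scripts, started last
drbd2 drbddisk::r0 Filesystem::/dev/drbd0::/home::xfs 192.168.2.209 nfs nfslock
```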
 
DRBD with Heartbeat is working fine (i.e. when one node goes down, the 
other node becomes active: it mounts the DRBD volume on /home and sets 
the service IP address 192.168.2.209 on eth0:0).
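
The failover state described above can be checked on whichever node is currently active with something like the following. This is only a sketch: the helper names is_mounted and has_ip are hypothetical, and the optional mounts-file argument exists only so the check can be tried against a sample file.

```shell
#!/bin/sh
# Hypothetical helpers for checking the state of the active node.

# is_mounted DEVICE MOUNTPOINT [MOUNTS_FILE]
# Succeeds if DEVICE is mounted on MOUNTPOINT according to MOUNTS_FILE
# (defaults to /proc/mounts).
is_mounted() {
    awk -v d="$1" -v m="$2" \
        '$1 == d && $2 == m { found = 1 } END { exit !found }' \
        "${3:-/proc/mounts}"
}

# has_ip ADDRESS
# Reads the output of "ip -o -4 addr show" on stdin and succeeds if
# ADDRESS is configured on some interface.
has_ip() {
    grep -qF "inet $1/"
}

# Example use on the active node:
#   is_mounted /dev/drbd0 /home && echo "/home is on the DRBD volume"
#   ip -o -4 addr show | has_ip 192.168.2.209 && echo "service IP is up"
```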

*Problem description in detail:*
I have moved the nfs directory from /var/lib to /home (# cd /var/lib; mv nfs 
/home/)
and made a symlink to it at /var/lib (# cd /var/lib; ln -s 
/home/nfs nfs).

As /home is on /dev/drbd0 (i.e. it is kept in sync on both nodes), 
/var/lib/nfs is the same on both nodes.
I have two nodes, named drbd1 and drbd2. Both have the same configuration 
as above.
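
The relocation of /var/lib/nfs described above can be written as a small guarded helper. This is a sketch only: the function name relocate_state_dir is hypothetical, and it assumes the NFS services are stopped and nothing (such as rpc_pipefs) is mounted below the directory while it is moved.

```shell
#!/bin/sh
# Hypothetical helper: move a state directory onto the shared volume
# and leave a symlink at the old location.
# Assumes NFS services are stopped and nothing is mounted under SRC.
#
# relocate_state_dir SRC DEST_PARENT
#   SRC          directory to move, e.g. /var/lib/nfs
#   DEST_PARENT  directory on the shared volume, e.g. /home
relocate_state_dir() {
    src=$1
    dest="$2/$(basename "$src")"

    # Already relocated: nothing to do.
    if [ -L "$src" ]; then
        echo "$src is already a symlink"
        return 0
    fi

    # Do not overwrite an existing copy on the shared volume.
    if [ -e "$dest" ]; then
        echo "refusing: $dest already exists" >&2
        return 1
    fi

    # Same two steps as in the post: mv, then ln -s.
    mv "$src" "$dest" && ln -s "$dest" "$src"
}

# Example use:
#   relocate_state_dir /var/lib/nfs /home
```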

 
At drbd1:
[root at drbd1 ~]# df -lh
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             487M  158M  304M  35% /
none                  501M     0  501M   0% /dev/shm
/dev/sda5             2.0G   36M  1.8G   2% /tmp
/dev/sda3             8.7G  3.8G  4.5G  46% /usr
/dev/sda6             2.0G   92M  1.8G   5% /var
/dev/drbd0             60G  3.7M   60G   1% /home
[root at drbd1 ~]# cd /var/lib
[root at drbd1 lib]# ls -l nfs
lrwxrwxrwx  1 root root 9 Mar  9 14:23 nfs -> /home/nfs

[root at drbd1 lib]# ls -l nfs/
total 12
-rw-r--r--  1 root    root    139 Mar 13 12:46 etab
-rw-r--r--  1 root    root     98 Mar 13 04:03 rmtab
drwxr-xr-x  7 root    root      0 Mar 13 12:46 rpc_pipefs
-rw-r--r--  1 root    root      0 Mar  9 20:01 stat
drwx------  4 rpcuser rpcuser  40 Nov 30  2004 statd
-rw-------  1 root    root      0 Nov 30  2004 state
-rw-r--r--  1 root    root      0 Nov 30  2004 xtab

From a client, I mount the /home area of the cluster (while drbd1 is 
active).
# mount 192.168.2.209:/home /mnt
# ls -l /mnt
total 4
drwxr-xr-x  2 root root 47 Mar  7 18:14 ldap
drwxr-xr-x  4 root root 92 Mar 13 12:46 nfs
drwxrwxrwx  3 root root 25 Mar  9 20:40 test

Now I bring down the active cluster server (drbd1), and the other server 
(drbd2) becomes active.
[root at drbd2 src]# df -lh
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             587M  207M  351M  38% /
none                  501M     0  501M   0% /dev/shm
/dev/sda5             2.0G   36M  1.8G   2% /tmp
/dev/sda3             8.7G  5.9G  2.4G  72% /usr
/dev/sda6             2.0G  202M  1.7G  11% /var
/dev/drbd0             60G  3.7M   60G   1% /home

[root at drbd2 src]# cd /var/lib
[root at drbd2 lib]# ls -l nfs
lrwxrwxrwx  1 root root 9 Mar  9 13:42 nfs -> /home/nfs

[root at drbd2 lib]# ls -l nfs/
total 12
-rw-r--r--  1 root    root    139 Mar 13 13:04 etab
-rw-r--r--  1 root    root     98 Mar 13 04:03 rmtab
drwxr-xr-x  7 root    root      0 Mar  8 19:12 rpc_pipefs
-rw-r--r--  1 root    root      0 Mar  9 20:01 stat
drwx------  4 rpcuser rpcuser  40 Nov 30  2004 statd
-rw-------  1 root    root      0 Nov 30  2004 state
-rw-r--r--  1 root    root      0 Nov 30  2004 xtab

But when I try to access the mounted area from the client, I get the 
following error:

# ls -l /mnt
ls: /mnt: Stale NFS file handle

But please note that if I mount the area again (i.e. # mount 
192.168.2.209:/home /mnt), it works fine.
Or, if I bring down the drbd2 server, drbd1 becomes active again and the 
client can continue its old access as well.

# ls -l /mnt
total 4
drwxr-xr-x  2 root root 47 Mar  7 18:14 ldap
drwxr-xr-x  4 root root 92 Mar 13 12:46 nfs
drwxrwxrwx  3 root root 25 Mar  9 20:40 test


If anyone has any clue about the above problem, please help me.

Thanks and Regards,
Sadananda
