Script to fix NFS (Debian Squeeze + Backports bits)

I have a NFS server running Debian Squeeze. Additionally it’s using the 3.2.x kernel from backports, and the nfs-kernel-server from backports too.

Sometimes NFS breaks, and gives helpful messages like :

mount.nfs: connection timed out

or just:

Stale NFS handle on clients.

 

While I’m confident that my /etc/exports and other configuration files are correct, it still insists on misbehaving.

Below is a random shell script I seem to have created to fix the NFS server –

#!/bin/bash
set -e
/etc/init.d/nfs-kernel-server stop
/etc/init.d/nfs-common stop
/etc/init.d/rpcbind stop

rm -Rf /var/lib/nfs
mkdir /var/lib/nfs
mkdir /var/lib/nfs/v4recovery /var/lib/nfs/rpc_pipefs

for f in /var/lib/nfs/etab \
/var/lib/nfs/rmtab \
/var/lib/nfs/xtab; do
[ -e $f ] || touch $f
done

/etc/init.d/rpcbind start
sleep 2
/etc/init.d/nfs-common start
sleep 2
/etc/init.d/nfs-kernel-server start

echo "NFS may now work"

exportfs -f

Yes… “NFS may now work” … that sums it up about right.

netstat –tcp -lp output not showing a process id

I often use ‘netstat –tcp -lpn’ to display a list of open ports on a server – so i can check things aren’t listening where they shouldn’t be (e.g. MySQL accepting connections from the world) and so on. Obviously I firewall boxes; but I like to have a reasonable default incase the firewall decides to flush itself randomly or whatever.

Anyway, I ran ‘netstat –tcp -lpn’ and saw something like the following :

tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      3355/mysqld     
tcp        0      0 0.0.0.0:54283           0.0.0.0:*               LISTEN      -               
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1940/portmap

Now ‘mysqld’ looks OK – and portmap does (well, I need it on this box). But what on earth was listening on port 54283, and why is there no process name/pid attached to it?

After lots of rummaging, and paranoia where I thought perhaps the box had been rooted, I discovered it was from an NFS mount (which explains the lack of a pid, as it’s kernel based).

lsof -i tcp:54283

Didn’t help either. Unmounting the NFS filesystem did identify the problem – and the entry went away.