On Friday my NX desktop started playing up a little. So I rebooted. Kernel root VFS mount error! Very nice. After some testing, it was the first time I’d seen a faulty HDD bring down the Linux kernel even when booting from a Live CD rescue disk. Damn!
Luckily, I was using NFS home directories and avoided “where is that month old backup” mode.
When I first started using Xen, I thought I’d be clever and devolve all services from dom0 to the domUs. This included the NFS file service, which ran in a Base domU. The mail spool, various web roots and database instances were centralised on this Base, making backups and other management easier. By moving everything, including NFS services, out of dom0 I intended to improve the security profile, restricting dom0 purely to domU guest management and DNS caching. All the other external services would then run in separate domUs as appropriate to the desired security envelope.
On this colo machine, for instance, I have a Base domU providing NFS to a mail domU running SMTP and IMAP. For personal mail I run mutt on a shell domU. The Base also provides web roots to other domUs and runs MySQL instances. All this is driven by a single 2.8GHz Xeon. I’m not running a high-load desktop domU and I’m not pushing the hardware.
One of the first problems I noticed was that mutt would periodically stall. It would lock up, the interface acting like it was connected via an old 2400 baud link. A stalled session could be recovered quickly by suspending, killing and restarting mutt. It was annoying but I lived with it.
Later, in June, I started working on my first NX desktop experiment. Great, I thought, Xen makes it easy to build a domU snapshot, break my desktop, and then revert without touching the bare metal. Plus I can have one physical machine and not mess up my desktop OS with the databases and random libraries needed to get some interesting application running.
I pulled out an old P4 1.8GHz, added 2GB of memory and installed Xen using the same basic template as the colo machine above: a Base domU providing NFS, in particular an NFS home directory, to the desktop domU. As indicated elsewhere in this blog, the FreeNX installation from source went flawlessly. During the following week I experienced a similar annoyance, this time with the whole desktop stalling. Firefox would periodically act like Netscape waiting for a DNS request. Occasionally the whole NX desktop would halt and I’d have to wait a few moments before it started operating again.
As I was busy and didn’t have time to investigate it much, my initial thought was that it was a single-processor contention problem. So I pulled out my old desktop machine and installed Ubuntu and FreeNX on bare metal, without Xen. Then I pointed the home directory via NFS back over the network to the initial Base domU. This time there were fewer problems. So for the following two months I was happily working on Ubuntu via NX, running large OpenOffice 2.0 and Firefox processes. I would suspend my laptop in the evening, then restart in the morning right back where I was the prior evening via NX resume. Almost productivity heaven.
In the meantime a few weeks ago, I pulled the HDD from the P4 1.8GHz which was now purely a server and put it into a spare 2.4GHz Xeon SMP machine. I had the intention of finding a free weekend and testing SMP as a solution to the performance issues. Of course, being Linux I didn’t even have to bother recompiling the kernel.
Then of course on Friday the HDD failed on my desktop machine. I guess this was meant to be this weekend. Nothing like a hardware fault to kill your fun.
I quickly resurrected the desktop LVM image I’d first used. Did a quick apt-get dist-upgrade. Pointed the NFS home directory back at the base domU on the same machine and started trying out ‘xm pincpu’ in the host dom0. Same desktop stalling. Even with base and desktop domUs on completely different CPUs. Double Damn!
Later that day, while thinking about something else, I had an “I am stupid” brainwave moment. I realised I had been silly to keep forcing an over-optimised security envelope. NFS network I/O was stalling when the base domU guest ran out of CPU slice time while requesting LVM disk I/O via the dom0 block device.
So I moved the NFS file service into the dom0 host.
Without much trouble I shut down all the domains dependent on the NFS service, plus the base domU guest. I removed the LVM home and export slices from the base domU guest, edited all the fstab entries and restarted. The work of 10 minutes, including the time spent debugging a typo. Linux/Xen is great.
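For anyone repeating the fstab part of that change, here is a rough sketch of how the edit could be scripted rather than done by hand. It is not what I actually ran. The function name `repoint_nfs` is made up for illustration, and it assumes the NFS entries follow the usual `server:/path` form at the start of the line. The dom0 also has to export the same directories (normally via /etc/exports), which this sketch does not cover.

```shell
#!/bin/sh
# Sketch only: print an fstab with NFS mounts repointed from one server to another.
# The hostnames used in the usage note ("base", "dom0") are hypothetical,
# not the real names from this setup.

repoint_nfs() {
    old_server=$1   # NFS server currently named in the fstab entries
    new_server=$2   # NFS server that should replace it
    fstab_file=$3   # fstab-style file to read (it is not modified)

    # Only lines that begin with "old_server:" are rewritten; local
    # filesystems and mounts from other servers pass through unchanged.
    # Note: sed treats old_server as a regular expression, so a "."
    # in a hostname matches any character.
    sed "s|^${old_server}:|${new_server}:|" "$fstab_file"
}
```

Something like `repoint_nfs base dom0 /etc/fstab > /tmp/fstab.new` writes the result to a scratch file, so it can be compared with the original using `diff` before being copied into place. Given that my own ten minutes included debugging a typo, that review step seems worth keeping.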
Pow! Things started working as they should. The domU-based desktop started working flawlessly, backed by NFS home directories on the xen0 NFS host. For the last couple of days I’ve been back in NX thin client heaven.
Moral of this story. NFS home directories are good, but not from a domU.