Wednesday, February 16, 2011

Oracle and ZFS shenanigans

The latest Solaris 10 release and recommended patch cluster bring significant ZFS updates. By patching, or by reinstalling with Solaris 10 9/10, you can get close to the zpool version that was previously available only in Solaris Express Community Edition (now Solaris 11 Express).

But there is a potential catch. Each incremental feature change to zpool capabilities causes the zpool version number to be incremented. You can see what versions are supported on your local install by typing:

zpool upgrade -v

which lists every feature the installed ZFS software supports, along with the pool version that introduced it. After applying the most recent Solaris 10 patch cluster, you'll see the following:

cara:~> zpool upgrade -v
This system is currently running ZFS pool version 22.

The following versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  Snapshot user holds
 19  Log device removal
 20  Compression using zle (zero-length encoding)
 21  Reserved
 22  Received properties

Note that version 21 is 'Reserved'. If you run the same command on a system running the Express kernel, the corresponding entries read:

 21  Deduplication
 22  Received properties
 23  Slim ZIL
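If you want to make this comparison from a script rather than by eye, something like the following rough sketch might do. It assumes the output format shown above; the function names supported_pool_version and describe_pool_version are my own invention, not part of ZFS.

```shell
#!/bin/sh
# Sketch only: both helpers scrape the human-readable output of
# "zpool upgrade -v", so they assume the layout shown in this post.

# Print the highest pool version this system supports, taken from the
# "This system is currently running ZFS pool version N." banner.
supported_pool_version() {
    zpool upgrade -v |
        sed -n 's/^This system is currently running ZFS pool version \([0-9][0-9]*\).*$/\1/p'
}

# Print the description this system gives for pool version $1
# (hypothetical helper; $1 is expected to be a plain number).
describe_pool_version() {
    zpool upgrade -v | sed -n "s/^ *$1  *//p"
}
```

On the patched Solaris 10 box above, describe_pool_version 21 should print Reserved, while the Express listing suggests it would print Deduplication there.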


The whole point of zpool versioning is that a pool at a given version should be importable on any system whose ZFS kernel support covers at least that version. Sun went to great lengths to enable this, even making the on-disk format endian-independent: writes are done in the writer's native byte order, and reads are honored in either big- or little-endian order. You can move a pool from a SPARC to an x86 platform, and it works.
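The compatibility rule boils down to a single comparison: the pool's version must not exceed the highest version the importing system supports. Here is a minimal sketch of that check. The helper names are hypothetical, and pool_version assumes the usual four-column layout of zpool get output.

```shell
#!/bin/sh
# Print the on-disk version of an already imported pool, e.g. "22".
# Assumes "zpool get" prints NAME PROPERTY VALUE SOURCE columns.
pool_version() {
    zpool get version "$1" | awk '$2 == "version" { print $3 }'
}

# Succeed (exit 0) when a pool at version $1 is usable on a system
# whose ZFS supports pool versions up to and including $2.
pool_version_ok() {
    [ "$1" -le "$2" ]
}
```

For a pool called tank (a made-up name), pool_version_ok "$(pool_version tank)" 22 would tell you whether a system topping out at version 22 can be expected to handle it.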

This was going to be a blog post about the evils of Oracle Corporation breaking this compatibility. Version 21 is deduplication on the Express version, but reserved on the release version. I was going to rant about the dangers of creating a deduplicated filesystem on Solaris Express, then trying to import it into a release version of Solaris 10.

But I can't quite do that.

After performing an experiment, it seems that Solaris 10 can in fact correctly import pools created on Express which have deduplication enabled. However, Solaris 10 won't dedup newly written data, since that's not a supported feature there. This at least makes sense as a compromise. Compatibility is preserved across pool versions, to the extent that you won't see any nasty side effects like kernel panics if you accidentally import a deduped pool on a release version of Solaris 10. You won't get any further benefit from this unsupported feature, but it shouldn't kill you either.
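For anyone who wants to repeat this kind of test, here is one way it could be set up with a throwaway file-backed pool. This is a sketch under assumptions, not a record of the exact steps I ran: the pool name, the paths, and the choice to pin the pool at version 22 (so that the Solaris 10 side is able to import it at all) are all illustrative.

```shell
#!/bin/sh
# Hypothetical names used only for this sketch.
POOL=deduptest
VDEV_DIR=/var/tmp/deduptest
VDEV=$VDEV_DIR/vdev0

# Run on the Express system: build a small file-backed pool with
# dedup switched on, then export it so the backing file can be copied.
create_dedup_pool() {
    mkdir -p "$VDEV_DIR" &&
    mkfile 128m "$VDEV" &&
    # Assumption: pin the pool version to one the older system supports.
    zpool create -o version=22 "$POOL" "$VDEV" &&
    zfs set dedup=on "$POOL" &&
    # (copy some duplicate data into /$POOL at this point)
    zpool export "$POOL"
}

# Run on the Solaris 10 system after copying $VDEV_DIR across:
# import the pool from the directory holding the file and inspect it.
import_dedup_pool() {
    zpool import -d "$VDEV_DIR" "$POOL" &&
    zpool status "$POOL"
}
```

Using a file as the vdev keeps the experiment away from real disks, and the whole thing can be thrown away afterwards with zpool destroy.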

So my only question is this: is the dedup feature left out of these updates because Oracle wants to provide a compelling reason to move to Solaris 11 (which may also feature significantly different license terms)? Or are they leaving it out because there's a concern about bugs which impact integrity, availability, or both in the current version of the software?

Time will tell.

Saturday, February 12, 2011

Forcing a Linux Reboot

Linux zfs-fuse is an extremely useful piece of software, but this morning it crashed on me and left the kernel confused enough that even 'reboot -f' failed to reboot the server.

Fortunately, there is a way to use the Linux SysRq mechanism to force an immediate reboot. This won't sync the disks or wait for processes to terminate (which is why it works in this case), but it saved me from going to the data center to intervene manually.

To do an emergency reboot on Linux, perform the following two steps as root:

# Enable all SysRq functions
echo 1 > /proc/sys/kernel/sysrq
# Reboot immediately, without syncing or unmounting anything
echo b > /proc/sysrq-trigger

This causes an immediate reboot of the system. Of course, if the thing causing the problem was a corrupted root filesystem, the server may not boot, but that would be the case regardless :-)
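If you'd rather keep this around as a script than remember the two paths, a sketch along these lines should work. The function and variable names are made up for illustration. The procfs paths are variables only so the functions can be tried against scratch files first, and nothing runs until you call force_reboot yourself.

```shell
#!/bin/sh
# Defaults are the real procfs entries; override them to rehearse
# against ordinary files without rebooting anything.
SYSRQ_CTL=${SYSRQ_CTL:-/proc/sys/kernel/sysrq}
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

# Send one SysRq command letter (for example b) to the kernel.
sysrq_send() {
    printf '%s\n' "$1" > "$SYSRQ_TRIGGER"
}

# Enable all SysRq functions, then request an immediate reboot.
# No sync and no unmount happen first, so unwritten data is lost.
force_reboot() {
    echo 1 > "$SYSRQ_CTL" || return 1
    sysrq_send b
}
```

Run it as root. On a machine that is still partly responsive, sending s (emergency sync) and u (remount read-only) before b is the gentler sequence, though on a badly wedged kernel those may never complete.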

More details on SysRq are available in the SysRq documentation on the Linux Kernel site (kernel.org).