The following article was previously published in BSD Magazine 1/2009.
The upcoming NetBSD 5.0 release will include metadata journaling on top of FFS. This feature was developed by Darrin B. Jewell for Wasabi Systems a few years ago, and it has been included in products since 2003. The code was donated to the NetBSD community earlier this year.
Named WAPBL (Write Ahead Physical Block Logging), this feature offers several advantages. For example, after a crash a file system check at boot time won't be needed; instead, the log is simply replayed, which takes a few seconds.
We had the pleasure of talking with Simon Burge, Antti Kantee, and Greg Oster about the features and their work on WAPBL.
Could you introduce yourself?
Simon Burge: I have been working for Wasabi Systems for about eight years now, and been involved with NetBSD for about 15 years.
I originally started with NetBSD to work on the PC532, and I was doing most of the recent maintenance on this port until the start of this year, when unfortunately a lack of ELF binutils for ns32k and no ns32k support in gcc4 pretty much killed it off, and it was removed from NetBSD.
I've also done a lot of work with some of the MIPS ports, especially the pmax port earlier on. Now it is pretty much a bit of anything when I get the chance.
Antti Kantee: I've been a NetBSD developer since the last millennium and have gotten my hands dirty with all the major kernel subsystems. Currently I work on my PhD thesis and consulting jobs.
Greg Oster: I have been a NetBSD developer for 10 years and while I have poked around in many parts of the kernel my primary responsibility is RAIDframe (the software RAID driver). In my day job I'm a Laboratory Systems Analyst in the Department of Computer Science at the University of Saskatchewan.
What is WAPBL?
Simon Burge: WAPBL stands for "Write Ahead Physical Block Logging". WAPBL provides metadata journaling for file systems. In particular, it is used with the fast file system (FFS) to provide rapid file system recovery after a system outage. It also provides better general-use performance than regular FFS through fewer on-disk metadata updates: these are coalesced in the journal.
WAPBL was developed by Wasabi Systems, and recently Wasabi contributed that work back to NetBSD. Wasabi has been using WAPBL in its storage products for about four or five years now.
How did you integrate it with FFS?
Simon Burge: Darrin Jewell did the original implementation of WAPBL.
In more recent times, Antti ported the WAPBL code from NetBSD 4.0 to NetBSD-current. Andrew helped tidy a few locking issues up with that as well. I added support for an in-filesystem journal (the original implementation used a log between the end of the filesystem and the end of the partition) and Greg helped in the discussion about how that was done. Greg has also looked after the documentation and has done a lot of testing.
What features does WAPBL provide on NetBSD-current right now (August 2008)?
Simon Burge: The two main features of WAPBL are fast file system recovery and in general increased metadata performance.
The fast file system recovery works when your system panics or loses power and doesn't shut down cleanly. When your system restarts, any file systems with logging enabled will skip the potentially long fsck phase and WAPBL will replay any outstanding metadata transactions when the file system(s) are mounted. With the large disks of today an fsck can take half an hour or more - with WAPBL you skip this entirely!
The increased metadata performance comes from WAPBL aggregating metadata updates (any operations on directories and inodes, like creating or removing files) in the journal, whereas normal FFS writes each of these operations out synchronously. WAPBL is in the same ballpark as soft dependencies for most operations. The one known workload where WAPBL is slower is when the fsync(2) system call is used -- this causes the journal to be flushed to disk each time.
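The coalescing described above can be illustrated with a toy model. The following Python sketch is not WAPBL code: the class and method names (ToyJournal, log_update, commit, replay) are hypothetical, and the disk is modelled as a plain dictionary. It only shows the general write-ahead idea, in which repeated updates to the same metadata block collapse into one log record, a commit flushes the batch to the journal, and replaying the journal after a crash restores the committed state.

```python
class ToyJournal:
    """Toy write-ahead log for metadata blocks (illustration only)."""

    def __init__(self):
        self.pending = {}      # block number -> latest contents, not yet logged
        self.log = []          # committed transactions, each a dict of blocks
        self.log_writes = 0    # number of block records written to the log

    def log_update(self, block, contents):
        # A later update to the same block replaces the earlier one,
        # so only the final state is written when the transaction commits.
        self.pending[block] = contents

    def commit(self):
        # Flush the buffered updates to the log as one transaction.
        if self.pending:
            self.log.append(dict(self.pending))
            self.log_writes += len(self.pending)
            self.pending.clear()

    def replay(self, disk):
        # After a crash, apply every committed transaction in order.
        for transaction in self.log:
            disk.update(transaction)
        return disk


journal = ToyJournal()
# Creating 100 files in one directory touches the same directory block each time.
for i in range(100):
    journal.log_update(block=7, contents=f"directory with {i + 1} entries")
journal.commit()

disk = journal.replay({})
print(journal.log_writes, disk[7])
```

In this model the hundred updates cost a single log record, while a scheme that wrote every update out synchronously would have issued a hundred writes. Updates that were never committed are simply absent after replay, which is why the file system stays consistent even though the most recent changes may be lost.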
Do you have any benchmark result?
Simon Burge: A reasonably common benchmark used by NetBSD people is extracting pkgsrc.tar.gz. Here is the time to extract that with various mount options on one system:
normal   1.489u 12.201s 18:29.87
log      1.296u 10.531s  0:37.78
softdep  1.555u 10.015s  0:33.00
async    1.426u  9.273s  0:20.66
and "rm -r pkgsrc" times for removing that pkgsrc tree:
normal   0.115u 3.609s 9:46.81
log      0.075u 3.415s 0:14.70
softdep  0.084u 1.387s 0:15.32
async    0.125u 2.401s 0:12.29
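To put the elapsed times above in perspective, the last column of each row can be converted to seconds and compared against the plain FFS mount. The short Python sketch below does that arithmetic using only the figures quoted above; the helper name to_seconds is an illustrative choice.

```python
def to_seconds(elapsed):
    """Convert a 'minutes:seconds' elapsed-time string to seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + float(seconds)


# Elapsed (wall-clock) times quoted in the interview.
extract = {"normal": "18:29.87", "log": "0:37.78",
           "softdep": "0:33.00", "async": "0:20.66"}
remove = {"normal": "9:46.81", "log": "0:14.70",
          "softdep": "0:15.32", "async": "0:12.29"}

for name, times in (("extract", extract), ("rm -r", remove)):
    baseline = to_seconds(times["normal"])
    for mode, elapsed in times.items():
        speedup = baseline / to_seconds(elapsed)
        print(f"{name:8s} {mode:8s} {to_seconds(elapsed):8.2f}s  {speedup:5.1f}x")
```

On these figures, a mount with logging extracts the tree roughly 29 times faster than a plain mount and removes it roughly 40 times faster, which is in the same range as soft dependencies.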
In which contexts should WAPBL fit better? And when should we avoid it?
Simon Burge: Currently, file system snapshots (fss(4)) do not work with WAPBL. This is being addressed and should be fixed before the next release.
In general, WAPBL should be roughly equivalent to soft dependencies for most workloads.
The one known area where it isn't is when the fsync(2) system call is involved. Most databases use this, as well as the CVS server (but not the client). Some mailers might use this as well, so WAPBL might not suit a high-volume mail server.
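The access pattern in question looks roughly like the following Python sketch: an application that, like a mail server delivering messages, calls fsync after every file it writes. The function name deliver_messages and the file naming scheme are hypothetical; os.fsync is Python's wrapper around the fsync(2) system call. On a file system mounted with logging, each such call would force the journal to be flushed.

```python
import os
import tempfile


def deliver_messages(directory, messages):
    """Write each message to its own file and fsync it before continuing.

    Returns the number of fsync calls made (one per message).
    """
    fsync_calls = 0
    for index, body in enumerate(messages):
        path = os.path.join(directory, f"msg-{index}.txt")
        with open(path, "w") as handle:
            handle.write(body)
            handle.flush()               # push Python's buffer to the kernel
            os.fsync(handle.fileno())    # fsync(2): wait until the data is on disk
            fsync_calls += 1
    return fsync_calls


with tempfile.TemporaryDirectory() as spool:
    count = deliver_messages(spool, ["hello", "world", "again"])
    print(count, sorted(os.listdir(spool)))
```

An application that batches its writes and syncs once per batch would trigger far fewer journal flushes than this one-sync-per-file loop.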
How can we use it?
Simon Burge: Currently WAPBL isn't available for NetBSD 4.0 but I'm hoping to make this available "soon".
Using WAPBL is as simple as making sure you have "options WAPBL" in your kernel config file (this is the default on most architectures now), and either using "mount -o log ..." or using "rw,log" in the mount options field of your /etc/fstab.
Can we use some partitions with softupdates and some others with WAPBL?
Simon Burge: The only restriction is that you can't use both WAPBL and soft dependencies on any single file system. You are certainly free to have both active on different filesystems on one machine.
I saw on the mailing list that you are thinking about how to deal with the fact that fsck is not aware of WAPBL and might create problems. How do you plan to solve this?
Simon Burge: This is still under debate right now, so I don't have a simple answer. Note that -current fsck is aware of WAPBL. The situation we're trying to guard against is when you take a file system that has had WAPBL active on it and put it on an older system - think of say an external USB disk. We're trying to guard against people unknowingly shooting themselves in the foot, but it is not a problem that you'll run into in day-to-day running.
How much space does WAPBL require to work?
Simon Burge: An in-filesystem log is sized according to the total file system size: 1MB of log is allocated per 1GB of disk space, up to a maximum of 64MB. This is the same method that Solaris uses to determine the log size. You can use a larger log than this either by specifying a log size with tunefs before you mount the file system, or by using an end-of-partition log after the filesystem. As for limits, there might possibly be some 32-bit limits in the log size...
Does WAPBL interfere with backup software such as dump(8)?
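The sizing rule above is simple enough to express directly. The following Python sketch assumes binary units (1 MB = 2^20 bytes, 1 GB = 2^30 bytes) and rounds down to a whole megabyte. The function name and the rounding behaviour are assumptions, and the real implementation may apply additional limits, such as a minimum log size, that are not described here.

```python
MB = 1 << 20
GB = 1 << 30
MAX_LOG_BYTES = 64 * MB   # upper limit quoted in the interview


def default_log_size(fs_bytes):
    """Return the default in-filesystem log size in bytes.

    1 MB of log per 1 GB of file system, capped at 64 MB.
    Rounding down to a whole MB is an assumption of this sketch.
    """
    whole_gigabytes = fs_bytes // GB
    return min(whole_gigabytes * MB, MAX_LOG_BYTES)


for size_gb in (4, 40, 64, 500):
    log_mb = default_log_size(size_gb * GB) // MB
    print(f"{size_gb:4d} GB file system -> {log_mb} MB log")
```

Under this rule every file system of 64 GB or more gets the same 64 MB default log, which is why a larger log has to be requested explicitly.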
Simon Burge: WAPBL should be no different to soft dependencies in this respect -- they both can delay writing out metadata so there is potential for dump(8) to not catch some files if they have been recently modified in some way.
Once file system snapshots work with WAPBL (and I saw a commit go by today that should enable this but haven't looked at the details), you will be able to use "dump -X" to make a consistent backup.
How did you test and debug WAPBL?
Simon Burge: Ah, that reminds me of RUMP. It allows you to run unmodified kernel code in userspace. RUMP was really quite handy when writing the code that handles in-filesystem logs with WAPBL. Instead of rebooting with a new kernel to test new code, I was just able to run a simple program, and debug any issues with gdb. It was also a lot safer working on a simple file system image in a file. I could have done this with a small file system on partitions or vnd vnode disks, but again this was much simpler with RUMP.
Greg Oster: The final stress testing had a couple of phases. The first was to run multiple, n-way simultaneous extracts of src.tar.gz, with a spacing of 10 seconds between the start of one and the start of the next. So this started with a single src.tar.gz extract, followed by two src.tar.gz extracts, all the way up through 10 simultaneous src.tar.gz extracts. This phase was repeated a few times.
The second phase of final stress testing consisted of doing continuous "./build.sh -j 8 ..." builds on freshly extracted src.tar.gz source trees (extract, build, delete, repeat). I don't recall how many build cycles were done, but the machine spent about 56 hours in this phase, all without a single issue. At that point I felt we were ready to merge WAPBL into -current.
What steps are needed to set up WAPBL via RUMP?
Antti Kantee: It requires a kernel with puffs support, either directly compiled into the kernel with "file-system PUFFS" or loaded as a module. Also, a system built with MKPUFFS=yes in /etc/mk.conf is required.
Then simply run "rump_ffs -o log device mountpoint" instead of "mount_ffs -o log device mountpoint".
Is there any plan to port WAPBL and/or RUMP to NetBSD 4.x?
Antti Kantee: The original patches supplied by Wasabi Systems were against NetBSD 4.0, so in theory WAPBL for NetBSD 4.0 exists already, although it lacks features such as in-fs log support. It took considerable effort to get WAPBL running on NetBSD-current because of the vast amount of SMP architectural changes done by Andrew Doran since NetBSD 4.0.
There are no plans to port RUMP to NetBSD 4.0. However, it should be noted that since RUMP runs completely in userspace, it should be possible to compile the RUMP code from NetBSD-current and run that on NetBSD 4.0. There may be some pitfalls, such as libpthread on NetBSD 4.0 not supporting all the necessary routines. Most of them should be related to diagnostics and should therefore not be difficult to work around for anyone interested in the task.
What license covers the WAPBL code provided by Wasabi?
Antti Kantee: It is available under the standard 2-clause BSD license. The copyright has been assigned to The NetBSD Foundation.
Is WAPBL suitable for small embedded systems too? Does it add too much additional work on the CPU or the disk?
Greg Oster: I haven't played with WAPBL on any embedded systems, but my understanding is that WAPBL was developed for use on Wasabi's storage products (which basically have an embedded system in them).
I've done some benchmarking, but none which would point out any overhead associated with WAPBL. I was pleasantly surprised to find that, even with the journal located after the filesystem, performance issues related to seek times were basically nonexistent (at least on non-legacy hardware).
I think the short answer to the question is: Yes, WAPBL is suited for small, embedded systems, and no, it does not add significant overhead to the CPU or disk systems (at least on non-legacy hardware).
Federico Biancuzzi is a freelance interviewer. He is co-author of "Masterminds of Programming - Conversations with the Creators of Major Programming Languages", a book published by O'Reilly.