Sunday, 29 August 2010
Noflushd Patch for 2.6.32+ Kernels
After a recent kernel upgrade (from 2.6.26 to 2.6.34), Noflushd[1] stopped working. Worse, it broke quite badly. CPU usage spiked as soon as syncs started to happen, most likely because some sort of continuous loop had kicked in. Noflushd appears to rely on writing a "0" into the /proc/sys/vm/dirty_writeback_centisecs file. With the new threaded writeback implementation that replaced pdflush, this makes the kernel write essentially continuously. The writeback daemon now sleeps for 0 centiseconds (i.e. not at all) between writes, and it spawns multiple threads, one per backing device, to do the writing. The writeback changes have a lot of side effects, and noflushd is quite severely affected by them.
Noflushd is not usually the preferred choice for spindowns. Most setups can use laptop_mode together with the drive's own ability to spin down when there is no activity (as configured via 'hdparm -S ...'). Unfortunately, Western Digital Scorpio Blue notebook drives (on Linux) are a problem: many (most? or worse, all?) of them seem to have "broken" firmware. They either spin down extremely aggressively (~8s) or not at all, and they blatantly ignore hdparm values. The load cycle ("head parking") also happens at the same rate of ~8s, so there's a very strong likelihood that the drive will die fairly quickly. If your drive behaves the same way, you could write a periodic job that issues spindown commands. However, such a job would not be aware of writes or of their effect in resetting the idle timers. Short of that, it is best to rely on Noflushd.
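Here is a back-of-the-envelope sketch in Python that puts the ~8s figure in perspective. It assumes the worst case: a drive that is idle around the clock and parks its heads once every 8 seconds. The 300,000-cycle rating used below is a hypothetical number for illustration, not a Scorpio Blue specification.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def load_cycles_per_day(park_interval_s: float) -> float:
    """Worst case: one head park (load/unload cycle) every park_interval_s seconds, all day."""
    return SECONDS_PER_DAY / park_interval_s


def days_until_rating(rated_cycles: int, park_interval_s: float) -> float:
    """Days of continuous idle parking before a given (hypothetical) rated cycle count is reached."""
    return rated_cycles / load_cycles_per_day(park_interval_s)


print(load_cycles_per_day(8))                   # 10800.0
print(round(days_until_rating(300_000, 8), 1))  # 27.8
```

Real usage is rarely idle all day, so the actual rate is lower. Even so, a fraction of 10,800 load cycles per day adds up quickly.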
After a fair bit of tweaking and debugging, I have a set of changes (described below) that makes Noflushd work as it is supposed to, just as it did with 2.6.26 (and older) kernels. These changes also depend on the old ext3 behaviour: data=ordered journalling instead of the now-default data=writeback. The default has changed in recent kernels, so make sure that your ext3 filesystems are mounted with data=ordered.
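One way to check which ext3 data mode is in effect is to look at the mount options. The Python sketch below parses text in the /proc/mounts format and lists ext3 mount points that are not explicitly using data=ordered. It assumes the kernel reports the data= option in /proc/mounts. If it does not, the mode shows up as None, and you should check your fstab options or the kernel default instead. The function names are made up for this example.

```python
def ext3_data_modes(mounts_text: str) -> dict:
    """Map each ext3 mount point to its data= journalling mode (None if not listed)."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()  # device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        mode = None
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                mode = opt[len("data="):]
        modes[fields[1]] = mode
    return modes


def non_ordered_mounts(mounts_text: str) -> list:
    """ext3 mount points whose data mode is not explicitly 'ordered'."""
    return [mp for mp, mode in ext3_data_modes(mounts_text).items() if mode != "ordered"]


sample = """/dev/sda1 / ext3 rw,errors=remount-ro,data=ordered 0 0
/dev/sda3 /home ext3 rw,data=writeback 0 0
proc /proc proc rw 0 0"""
print(non_ordered_mounts(sample))  # ['/home']
```

On a live system, pass in the contents of /proc/mounts instead of the sample text.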
First, a bit of background on the new writeback implementation that replaced pdflush. The /proc/sys/vm/dirty_writeback_centisecs file has a changed meaning (or rather behaviour) when it is set to zero. In older kernels (including 2.6.26), a zero in dirty_writeback_centisecs meant that the background flush daemon was disabled, i.e. not woken up periodically to flush writes to disk. Noflushd depended on this. It used that mechanism to disable the background writes before forcing the hard disk to sleep/standby, which prevented unnecessary spinups of the drives. In the new kernels, a "-1" rather than a "0" seems to disable the background writes completely. It also gives "correct" behaviour with noflushd: the hard disk spins down properly, and in general everything works as before. More background on the writeback changes (not necessarily related to "0" versus "-1") can be found in [2,3,4,5].
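The "0" versus "-1" semantics can be summarised as a small lookup. This Python sketch only encodes the behaviour observed in this post and is not derived from the kernel source. It assumes the cut-over point is 2.6.32, the release where per-device flusher threads replaced pdflush. On a live system, the release string would come from os.uname().release.

```python
import re


def kernel_version(release: str) -> tuple:
    """Turn a release string such as '2.6.34-1-686' into (2, 6, 34)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(g or 0) for g in m.groups())


def describe_writeback(centisecs: int, release: str) -> str:
    """Effect of a dirty_writeback_centisecs value, as observed in this post."""
    if centisecs > 0:
        return f"periodic flush every {centisecs / 100:g}s"
    if kernel_version(release) >= (2, 6, 32):
        # New flusher threads: "0" means no sleep between writes, "-1" turns them off.
        return {0: "continuous writing", -1: "disabled"}.get(centisecs, "unknown")
    # Pre-2.6.32 kernels: "0" disables the background flush daemon.
    return "disabled" if centisecs == 0 else "unknown"


for value, release in [(0, "2.6.26"), (0, "2.6.34-1-686"), (-1, "2.6.34-1-686"), (500, "2.6.34")]:
    print(release, value, "->", describe_writeback(value, release))
```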
Disabling the writeback daemon/threads with a "-1" in /proc/sys/vm/dirty_writeback_centisecs (instead of the "0" that worked before) requires a few key changes to noflushd. This is especially true if it should keep working with older kernels, where "0" is still the correct value. Laptop mode raises one more interesting issue. If /proc/sys/vm/laptop_mode has a non-zero value, the kernel forces a full sync that many seconds after any other write (including a noflushd sync). That forced write wakes up the disk if it is sleeping. This is a big deviation from the past, when a (forced) sync did not trigger a fresh write later. As a result, it is essential to disable laptop_mode to get the disk to spin down correctly with noflushd. Of course, all this is irrelevant if you have a well-behaved disk to begin with.
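Here is a minimal sketch of the compatibility rule, in Python rather than the C of the actual patch. If "-1" is already configured, it is kept; otherwise the code falls back to the old "0". Only the /proc path comes from this post. The function names are hypothetical, and writing to the real file requires root.

```python
from pathlib import Path

# Path named in the post; writing to it requires root.
WRITEBACK_SYSCTL = Path("/proc/sys/vm/dirty_writeback_centisecs")


def disable_value(current: str) -> str:
    """Pick the value that switches periodic writeback off.

    "-1" is kept when the administrator has already set it (the 2.6.32+ setup);
    anything else falls back to the old "0", which is only safe on older kernels.
    """
    return "-1" if current.strip() == "-1" else "0"


def disable_writeback(path: Path = WRITEBACK_SYSCTL) -> str:
    """Write the disabling value and return the previous contents (e.g. for restoring on exit)."""
    previous = path.read_text().strip()
    path.write_text(disable_value(previous) + "\n")
    return previous
```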
So to support both the new 2.6.32+ kernels and the older kernels, noflushd now respects a "-1" in the dirty_writeback_centisecs entry and uses it accordingly. The patch also makes some other useful changes. Noflushd now tracks how long drives have been spun down, how many times they have been spun up and down, and the average duration of a spindown. The timeout specified on the command line is now in units of 10s (so 5 means 50s) instead of the original minutes, which allows much finer-grained control of spindown times. The statistics are dumped to syslog on shutdown and whenever a SIGUSR1 signal is received. When a SIGHUP is received, noflushd switches to the next (default) timeout value and logs it to syslog. Together these make it much easier to see how noflushd behaves and to tune the timeout settings.
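The bookkeeping described above could look roughly like the following Python sketch. It is not the patch itself, and the class and method names are hypothetical. The SIGHUP handler assumes noflushd cycles through a list of configured timeouts, which is one reading of "the next (default) timeout value". The log callback (print by default) stands in for syslog.

```python
import signal

TICK_SECONDS = 10  # patched noflushd: command-line timeouts are in 10 s units


def ticks_to_seconds(ticks: int) -> int:
    """Convert a command-line timeout (10 s units) to seconds: 5 -> 50."""
    return ticks * TICK_SECONDS


class SpindownStats:
    """Hypothetical per-drive bookkeeping: spin-down/up counts and time spent spun down."""

    def __init__(self) -> None:
        self.spindowns = 0
        self.spinups = 0
        self.total_down = 0.0    # seconds spent spun down (completed periods only)
        self._down_since = None  # timestamp of the current spin-down, if any

    def spun_down(self, now: float) -> None:
        if self._down_since is None:
            self.spindowns += 1
            self._down_since = now

    def spun_up(self, now: float) -> None:
        if self._down_since is not None:
            self.spinups += 1
            self.total_down += now - self._down_since
            self._down_since = None

    def average_down(self) -> float:
        # Average over completed spin-down periods (those ended by a spin-up).
        return self.total_down / self.spinups if self.spinups else 0.0

    def summary(self) -> str:
        return (f"spindowns={self.spindowns} spinups={self.spinups} "
                f"down={self.total_down:.0f}s avg={self.average_down():.0f}s")


class Reporter:
    """SIGUSR1 logs the statistics; SIGHUP advances to the next timeout in the list."""

    def __init__(self, timeouts_ticks, stats, log=print) -> None:
        self.timeouts = list(timeouts_ticks)
        self.index = 0
        self.stats = stats
        self.log = log  # stand-in for syslog

    def timeout_seconds(self) -> int:
        return ticks_to_seconds(self.timeouts[self.index])

    def on_usr1(self, signum, frame) -> None:
        self.log(self.stats.summary())

    def on_hup(self, signum, frame) -> None:
        self.index = (self.index + 1) % len(self.timeouts)
        self.log(f"timeout is now {self.timeout_seconds()}s")

    def install(self) -> None:
        # Python only allows installing signal handlers from the main thread.
        signal.signal(signal.SIGUSR1, self.on_usr1)
        signal.signal(signal.SIGHUP, self.on_hup)
```

After install() has been called from the main thread, `kill -USR1 <pid>` logs the summary line and `kill -HUP <pid>` switches to the next timeout.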
In summary, to make noflushd work correctly on 2.6.32+ kernels, make the following changes to the system configuration:
- Disable laptop mode: /proc/sys/vm/laptop_mode should have "0"
- Disable writeback expiry: /proc/sys/vm/dirty_writeback_centisecs should have "-1". This is a change from the past, when "0" disabled writebacks and a positive value enabled periodic flushes.
- Use the patched noflushd: Noflushd has now been patched to work correctly with the new kernels, but ONLY if the writeback expiry value is "-1". Otherwise it defaults to the old behaviour of writing a "0" into /proc/sys/vm/dirty_writeback_centisecs, which breaks on the newer kernels. Caveat emptor. YMMV.
- Disable the drive's automatic spindowns/tweaks: Use 'hdparm -B 128' (or higher) to disable automatic spindowns of the hard disk. A value of 254 also disables the automatic head parking feature, so the drive's load cycle count only increases in a controlled way. Note: These hdparm values are specific to the Western Digital Scorpio Blue series notebook IDE drives.
- Other notes: All the other caveats that apply to delayed writes and laptop mode also apply to noflushd. See [6] for more information on those caveats before relying on noflushd.
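The two /proc settings in the checklist can be verified with a short Python sketch like the one below; the hdparm settings are not covered. The function name and the vm_dir parameter are hypothetical, and the parameter only exists so the check can be pointed at a test directory. Reading these files does not need root.

```python
from pathlib import Path

# Expected values from the checklist: laptop mode off, writeback expiry "-1".
EXPECTED = {
    "laptop_mode": "0",
    "dirty_writeback_centisecs": "-1",
}


def check_vm_settings(vm_dir: str = "/proc/sys/vm") -> list:
    """Return a list of human-readable problems; an empty list means the settings match."""
    problems = []
    for name, want in EXPECTED.items():
        path = Path(vm_dir) / name
        try:
            have = path.read_text().strip()
        except OSError as exc:
            problems.append(f"{name}: cannot read ({exc})")
            continue
        if have != want:
            problems.append(f"{name} is {have}, expected {want}")
    return problems
```

Calling check_vm_settings() with no arguments checks the live /proc/sys/vm values and returns an empty list once both settings are in place.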
Download the patch for noflushd here.
URL[1]: http://noflushd.sourceforge.net/
URL[2]: http://lwn.net/Articles/326552/
URL[3]: http://axboe.livejournal.com/1819.html
URL[4]: http://axboe.livejournal.com/2258.html
URL[5]: http://lwn.net/Articles/9521/
URL[6]: http://www.kernel.org/doc/Documentation/laptops/laptop-mode.txt