Linode: don’t use barriers with ext4

On my Linode running Gentoo Linux, I converted to ext4 some time ago and didn’t notice any issues until now, mostly because I don’t reboot often enough to notice. The symptoms are as follows.

On every other reboot, you will see:

EXT4-fs error (device xvda): ext4_journal_start_sb:296: Detected aborted journal
EXT4-fs (xvda): Remounting filesystem read-only

A subsequent reboot fixes this with a forced fsck run. This is an annoying one: every other reboot leaves you with a crippled system whose root filesystem is read-only, and the reboot after that runs an fsck that “fixes” it, so you have no issues until the next reboot.

After some research, I found that barriers are enabled by default on ext4 and that they don’t really make sense on a hosted VM guest. I base that statement on Google searches, since I am not an expert, but the common knowledge seems to be that disabling barriers is safe on battery-backed storage, and Linode has disabled barriers completely.

The solution to the above problem is simply to disable barriers with the barrier=0 mount option, like so:

In /etc/fstab:
/dev/xvda / ext4 noatime,barrier=0 0 1
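
To confirm after a reboot that the option actually took effect, you can look at the mount options the kernel reports. The following is a minimal Python sketch, not something from the Linode forums: the function name barrier_disabled is made up for illustration, and it assumes the kernel lists the option in /proc/mounts as either barrier=0 or nobarrier (the exact spelling may differ between kernel versions, so both are accepted).

```python
def barrier_disabled(mounts_text, mount_point="/"):
    """Check /proc/mounts-style text for an ext4 mount with barriers off.

    Returns True or False for an ext4 filesystem mounted at mount_point,
    or None if no ext4 entry for that mount point is found.
    Assumption: the option appears as "barrier=0" or "nobarrier".
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, target, fstype, options = fields[:4]
        if target != mount_point or fstype != "ext4":
            continue  # skips e.g. the "rootfs" entry that is also mounted on /
        opts = options.split(",")
        # If the mount point is listed more than once, the last entry wins.
        result = "barrier=0" in opts or "nobarrier" in opts
    return result


# Example use on a live system:
#   with open("/proc/mounts") as f:
#       print(barrier_disabled(f.read()))
```

If this reports False for / even though /etc/fstab contains barrier=0, the root filesystem was probably mounted before fstab was consulted and not remounted with the new options.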

Source: Linode Forums


12 Comments.

  1. Hmm, what kernel are you using? Something is definitely wrong. You should be able to safely use barriers with a VM. The barriers get passed down to the host and make sure data that needs to be synced (i.e. metadata) is synced as intended. Linode heavily over-subscribes on disk I/O so it might hurt certain performance (i.e. DB) but that’s another problem altogether.

    Running w/o barriers on a production machine is a decision that few people are expert enough to make. Under the right kind of system crash or a failure of the RAID card you could easily hose 100s of MB of data. The safest maxim might be something like, are you willing to risk total file system corruption at any given time? If yes, you can reap the performance benefit. Otherwise, leave it on.

    • I tried every 3.x kernel that Linode had to offer. Do you think I make this up? I know I’m not an expert, but..even others have the same problem.

      • I wasn’t being snide or saying you were making it up – that’s what the “something is definitely wrong” part is for. The other stuff is just so people don’t do something without realizing the implications. This is a legit bug that should be fixed.

        Can you send in dmesg, e2fsprogs version, etc to linux-ext4 mailing list and CC me?

        • I’m interested because I use Linode personally and Xen professionally. This could be a big problem, or it could be something bad Linode is doing like not honoring barriers at the hypervisor level to game some more performance and it doesn’t play well with the reboot. Either way, it’s worth finding out.

        • Nope, I’m not going to send that stuff to linux-ext4 mailing list – I’m not qualified to participate on that mailing list nor do I have proper access to the host system(s) details. Linode disabled barriers by default in “all” their kernels and I posted this because I know a number of Gentoo users that might want to review before rebooting to a surprise, like I did.

  2. Hm, this looks awfully familiar. I wonder if I had a related problem. Every time I booted my local Gentoo system, it would tell me that the journal wasn’t empty when it should have been (*). I then had to run fsck manually to correct the non-empty journal (the next boot would work, and after that it was the same story all over again). I thought my hard drive might be failing, so I copied all my data, but I didn’t have time to investigate the issue any further…

    (*) I believe the exact message was “needs_recovery flag is clear but journal contains data”

  3. Is this still an issue? Do I still need barrier=0 on ext4/gentoo/linode?

    My kernel is Latest 3.5 (3.5.2-linode45) currently.
