btrfs & ext4 – error handling when the hardware fails …

I have a mini PC (an old Intel NUC) I use for taking backups of my desktop. It has a single 4 TiB SSD in it.

Filesystem Type Size Used Avail Use% Mounted on
/dev/sda3 ext4 916G 80G 790G 10% /
/dev/sda4 btrfs 2.8T 106G 2.7T 4% /backup

I’ve been using btrfs for /backup for ages, as I use its snapshot functionality together with an hourly rsync job that copies changes over from my desktop.
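
For context, the backup job is nothing clever. A minimal sketch of the sort of thing cron runs hourly – the paths, hostname and snapshot naming here are illustrative, not my exact script:

#!/bin/sh
# Pull changes over from the desktop, then take a read-only snapshot of the result.
# /backup/current and /backup/snapshots are assumed to be btrfs subvolumes.
rsync -aHAX --delete desktop:/home/ /backup/current/home/
btrfs subvolume snapshot -r /backup/current \
    "/backup/snapshots/$(date +%Y-%m-%d_%H%M)"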

Recently the fan on the NUC failed, and while overheating (I think) it appears to have written garbage in various places (this was seen on the ext4 rootfs as well as the /backup btrfs volume).

BTRFS

Trying to scrub the filesystem highlights the problems –

root@nectarine:~# btrfs scrub status /backup
UUID:             36f93b26-6187-4874-8cc6-4d4bd092e7d8
Scrub resumed:    Sat Jun 17 13:48:33 2023
Status:           finished
Duration:         1:21:28
Total to scrub:   1.23TiB
Rate:             263.66MiB/s
Error summary:    csum=60
  Corrected:      0
  Uncorrectable:  60
  Unverified:     0

(As I only have one underlying block device, there’s no second copy for btrfs to repair from.)
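
Had I anticipated this, btrfs can keep two copies of data even on a single device via the ‘dup’ profile (metadata usually defaults to dup on a single disk; data does not). It halves the usable space, but it gives scrub a second copy to repair from. A sketch, not something I actually ran:

mkfs.btrfs -d dup -m dup /dev/sda4           # at creation time
btrfs balance start -dconvert=dup /backup    # or convert an existing filesystem

Though if an overheating controller writes garbage to both copies, dup won’t save you either.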

I now also see messages like this in ‘dmesg’ –

[ 3570.123946] BTRFS error (device sda4): unable to fixup (regular) error at logical 1870167986176 on dev /dev/sda4
[ 3570.128866] BTRFS error (device sda4): bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 199, gen 0
[ 3570.128862] BTRFS warning (device sda4): checksum error at logical 1870167683072 on dev /dev/sda4, physical 1477245284352, root 8890, inode 3750321, offset 384077824, length 4096, links 1 (path: .icedove/e1kre066.default-release-2/ImapMail/imap.gmail-2.com/INBOX-1)
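
The warning lines helpfully include a path, but the terser ‘unable to fixup’ errors only give a logical address. You can map one back to a filename yourself; a quick sketch, using the logical address from the log above:

btrfs inspect-internal logical-resolve 1870167986176 /backup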

Before re-initialising the checksum tree (and then just letting the corrupt files age out of the filesystem as they get rsync’ed over again), I thought I’d first try:

root@nectarine:~# btrfs check -p /dev/sda4 
Opening filesystem to check...
Checking filesystem on /dev/sda4
UUID: 36f93b26-6187-4874-8cc6-4d4bd092e7d8
[1/7] checking root items                      (0:00:10 elapsed, 6406461 items checked)
Segmentation faultents                         (0:00:02 elapsed, 7542 items checked)

So that didn’t work very well (the ‘Segmentation fault’ message has overwritten the ‘[2/7] checking extents’ progress line above).

So I thought I might as well try just re-initialising the checksum tree –

root@nectarine:~# btrfs check -p --init-csum-tree /dev/sda4 
Creating a new CRC tree
WARNING:

	Do not use --repair unless you are advised to do so by a developer
	or an experienced user, and then only after having accepted that no
	fsck can successfully repair all types of filesystem corruption. Eg.
	some software or hardware bugs can fatally damage a volume.
	The operation will start in 10 seconds.
	Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/sda4
UUID: 36f93b26-6187-4874-8cc6-4d4bd092e7d8
Reinitialize checksum tree
kernel-shared/extent_io.c:650: free_extent_buffer_internal: BUG_ON `eb->refs < 0` triggered, value 1
btrfs(+0x2b1f7)[0x5590e079d1f7]
btrfs(+0x2b381)[0x5590e079d381]
btrfs(+0x2b68e)[0x5590e079d68e]
btrfs(alloc_extent_buffer+0x77)[0x5590e079e740]
btrfs(read_tree_block+0x47)[0x5590e0796066]
btrfs(read_node_slot+0x47)[0x5590e078f7fd]
btrfs(btrfs_next_sibling_tree_block+0x95)[0x5590e0792900]
btrfs(+0x19e14)[0x5590e078be14]
btrfs(+0x1a8a8)[0x5590e078c8a8]
btrfs(iterate_extent_inodes+0x68)[0x5590e078d5dc]
btrfs(fill_csum_tree+0x46b)[0x5590e07f9440]
btrfs(+0x74bf2)[0x5590e07e6bf2]
btrfs(main+0x3d3)[0x5590e078a203]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff38d37fd0a]
btrfs(_start+0x2a)[0x5590e078a86a]
Aborted

So I don’t feel that worked all that well.

I guess I’ll copy off the data I don’t want to lose, and just reformat it. I was hoping the repair tools (btrfs-progs v6.2, kernel 6.1.34) had matured since I last broke a btrfs filesystem a few years ago. I guess not?
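
The salvage plan itself is mundane: mount read-only, copy off whatever still reads cleanly, then rebuild. Roughly this, with the destination path being illustrative:

mount -o ro /dev/sda4 /backup
rsync -aHAX /backup/ /mnt/rescue/    # corrupt files will throw I/O errors; rsync carries on

# If it ever stops mounting entirely, btrfs restore works on the raw device:
btrfs restore /dev/sda4 /mnt/rescue/

# ...and then start afresh:
mkfs.btrfs -f /dev/sda4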

I know btrfs is at least alerting me to issues with the data – which ext4 definitely isn’t (given /var/lib/dpkg/status contained a load of trash) – so I’ll give it credit for that. It’s just a shame the ‘repair’ tools aren’t working that well.

ext4

This isn’t written to much on this system – there’s a munin daemon running (so /var/lib/munin will have been written to) and a few log files.

Interestingly, when I first noticed a problem with the device, after logging in, I instinctively ran ‘apt-get update’ (I was hoping a reboot would fix it, at which point I might as well make sure any updates were installed).

Running ‘apt-get update’ resulted in /var/lib/dpkg/status being full of rubbish.
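
Fortunately dpkg keeps copies of its database, so a trashed status file is recoverable. Roughly this – pick the newest backup that isn’t itself corrupt:

cp /var/lib/dpkg/status-old /var/lib/dpkg/status    # dpkg’s own previous copy
# or restore one of the daily backups:
ls /var/backups/dpkg.status.*
zcat /var/backups/dpkg.status.1.gz > /var/lib/dpkg/status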

After the PC had been turned on for a few hours, ext4 eventually figured out there were problems too – by logging this:

[11591.230282] munin-html[22255]: segfault at a400000e ip 0000557783eaf0e9 sp 00007ffca1d969f0 error 4 in perl[557783de1000+185000] likely on CPU 3 (core 1, socket 0)
[11591.230298] Code: 4e 0c 89 56 08 83 e9 09 83 f9 01 76 14 83 fa 01 76 3f 83 ea 01 89 55 08 48 83 c4 10 5d c3 0f 1f 00 48 8b 70 08 48 85 f6 74 e3 <f6> 46 0e 10 74 dd 48 c7 40 08 00 00 00 00 8b 56 08 83 fa 01 76 22
[11591.432906] munin-graph[22257]: segfault at 55a6b77c7df0 ip 000055a64601ebc2 sp 00007ffcd88c5150 error 4 in perl[55a645fc0000+185000] likely on CPU 3 (core 1, socket 0)
[11591.432927] Code: 0f 1f 84 00 00 00 00 00 48 8b 4f 10 48 85 c9 74 5f 48 83 ec 08 48 8b 87 30 01 00 00 48 8b 50 10 48 39 d1 75 4c 48 85 f6 74 55 <48> 8b 04 f1 48 85 c0 74 20 48 8d 97 50 01 00 00 48 39 d0 74 14 8b
[12723.693630] EXT4-fs error (device sda3): htree_dirblock_to_tree:1080: inode #28706704: comm find: Directory block failed checksum
[12723.693673] Aborting journal on device sda3-8.
[12723.696920] EXT4-fs error (device sda3): ext4_journal_check_start:83: comm systemd-journal: Detected aborted journal
[12723.696945] EXT4-fs error (device sda3): ext4_journal_check_start:83: comm rs:main Q:Reg: Detected aborted journal
[12723.708257] EXT4-fs (sda3): Remounting filesystem read-only

Rebooting and running fsck -Cy /dev/sda3 MIGHT have fixed the rootfs.

fsck paranoid?

Some random hints:

  1. Ensure the final field / column in /etc/fstab is non-zero for the other filesystems you have mounted; if it’s 0 then fsck will never run on them. Example from /etc/fstab (the trailing 2 is the fsck pass number):
/dev/md0  /mount/point ext3 defaults 0 2
  2. fsck -Cccy /dev/blah1 does a read-write (non-destructive) test. Works well on SSDs 🙂

When looking at the various boxes we have in our office, I found one server had the following (run dumpe2fs /dev/whatever1):

Mount count:              62
Maximum mount count:      39
Last checked:             Wed Jul  9 16:09:17 2008
Next check after:         Mon Jan  5 15:09:17 2009
Today is the 8th of June 2012. Oops.

Interestingly, when I did run fsck on it, there were no errors. Is the default ext3 behaviour of checking every 20–30 mounts perhaps too paranoid? It’s certainly very painful running fsck on large ‘rotating’ volumes – waiting over an hour for a server to come up is not fun.
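
If the periodic checks are too aggressive for your taste (or, as above, silently overdue), tune2fs will adjust or disable them. The intervals below are just an example:

tune2fs -c 60 -i 6m /dev/whatever1    # check every 60 mounts or 6 months, whichever comes first
tune2fs -c 0 -i 0 /dev/whatever1      # or switch periodic checking off entirely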

 

fsck -y (or fsck yes…)

Tip for the day:

Edit /etc/default/rcS on Debian/Ubuntu servers and set FSCKFIX=yes (the default is no). That way, the next time your server runs fsck at startup it will fix any errors it finds automatically, rather than spending hours on the check only to moan when it finds an error and tell you to waste yet more time by re-running fsck with ‘-y’.
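
i.e. something like this (this applies to sysvinit-era boots; newer systemd-based releases ignore /etc/default/rcS and want fsck.repair=yes on the kernel command line instead):

# /etc/default/rcS
# automatically repair filesystems with inconsistencies found during fsck at boot
FSCKFIX=yes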

Quite why fsck even asks me to say ‘yes’ to its “do you want to fix…?” questions is another matter. It really annoys me: it’s not as if I have enough information to make any sort of informed choice. I don’t know the structure that’s on the disk, so I can hardly say “No! You’re wrong fsck, inode xxxxx should be blahblahblah”. I can only say ‘yes’.

If only FSCKFIX=yes were the default setting… I can’t imagine Windows or OS X ever asking someone to answer such questions.