Upgrade some things

Well, I sort of realised I had a web server or two still on Debian Buster, and it was time to move to Bullseye or Bookworm. As usual, the Debian upgrade procedure was mostly straightforward and uneventful.

Interesting findings:

  • hitch, which I use as an SSL frontend to varnish, doesn’t seem to get along all that well with systemd, and silently fails if your config has “daemon = on” set in /etc/hitch/hitch.conf (there’s a config sketch after this list). Annoyingly, when testing the configuration with “hitch -t” you’ll get an error like “No x509 certificate PEM file specified for frontend ‘default’!” – the solution is to specify the config file explicitly, i.e. hitch -t --config /etc/hitch/hitch.conf
  • hitch hasn’t had a release in its packagecloud.io repository for the last 3 years, so the Debian-packaged variant looks more appealing.
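For reference, the sort of hitch.conf shape that plays nicely with systemd might look like the below – a sketch, where the backend port and PEM path are assumptions rather than my actual config:

# /etc/hitch/hitch.conf
daemon = off                          # let systemd manage the process
frontend = "[*]:443"                  # listen on all addresses, port 443
backend = "[127.0.0.1]:6086"          # varnish listener (assumed port)
pem-file = "/etc/hitch/example.pem"   # hypothetical certificate path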

In other news, I noticed this post where someone moaned about systemd-resolved the other day – https://www.reddit.com/r/linux/comments/18kh1r5/im_shocked_that_almost_no_one_is_talking_about/ – I’ve had similar problems to the people in the thread (resolved stops working, etc.), so I thought it was time to try ‘unbound’ instead.

apt-get install unbound

and then tell /etc/resolv.conf to use 127.0.0.1 for DNS.

Annoyingly, unbound-control stats isn’t quite as pretty as resolvectl statistics, but oh well.

echo -e "nameserver 127.0.0.1\nnameserver 8.8.8.8\noptions timeout:4" > /etc/resolv.conf

and an /etc/unbound/unbound.conf file that looks perhaps like :

server:
    interface: 127.0.0.1
    access-control: 127.0.0.0/8 allow
    access-control: ::1/128 allow
    # The following line will configure unbound to perform cryptographic
    # DNSSEC validation using the root trust anchor.
    auto-trust-anchor-file: "/var/lib/unbound/root.key"
    tls-cert-bundle: "/etc/ssl/certs/ca-certificates.crt"

remote-control:
    control-enable: yes
    # by default the control interface is 127.0.0.1 and ::1 on port 8953;
    # it is possible to use a unix socket too
    control-interface: /run/unbound.ctl

forward-zone:
    name: "."
    forward-tls-upstream: yes
    forward-addr: 1.1.1.1@853#cloudflare-dns.com
    forward-addr: 1.0.0.1@853#cloudflare-dns.com

(Unfortunately my ISP is shitty, and doesn’t yet give me an IPv6 address.)
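After editing the config, it’s worth validating it and restarting before relying on it:

unbound-checkconf /etc/unbound/unbound.conf
systemctl restart unbound
unbound-control status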

Looking at https://1.1.1.1/help, I do sometimes see that ‘DNS over TLS’ is “Yes”… so I guess something is right; annoyingly, I don’t see anything useful in unbound’s stats (unbound-control stats) to show it’s done a secure query…
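One crude way to convince yourself queries really are leaving over TLS is to watch for outbound traffic on port 853 while doing some lookups (the interface name here is an assumption):

tcpdump -ni eth0 'tcp port 853'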

“unbound-host” (another Debian package) will helpfully tell you whether a lookup was done ‘securely’ or not – e.g.

$ unbound-host google.com -D -v
google.com has address 142.250.178.14 (insecure)
google.com has IPv6 address 2a00:1450:4009:815::200e (insecure)
google.com mail is handled by 10 smtp.google.com. (insecure)

which seems a little odd to me (I’d have thought Google would support DNSSEC), but some domains do work – e.g.

$ unbound-host mythic-beasts.com -D -v
mythic-beasts.com has address 93.93.130.166 (secure)
mythic-beasts.com has IPv6 address 2a00:1098:0:82:1000:0:1:2 (secure)
mythic-beasts.com mail is handled by 10 mx1.mythic-beasts.com. (secure)
mythic-beasts.com mail is handled by 10 mx2.mythic-beasts.com. (secure)

Beelink SER6 Max

“New PC Time”

I’ve had an ASUS PN50 (AMD 4800U processor) as my desktop/daily driver for some time, and it’s nice and power efficient, but increasingly I found it slow.

I eventually discovered I could turn on the CPU ‘boost’ feature (doh!) – but doing that seemed to result in it crashing within 24–48 hours… which isn’t good. I don’t know if it’s a hardware or Linux problem, but I had already sort of decided it was time to consider upgrading to something with more ‘ooomph’.

So, I came across a slightly dodgy-looking listing on Amazon for a Beelink SER6 Max (32GB RAM, 500GiB SSD). The SER6 Max is a fairly new release, and Beelink are a relatively cheap, newish hardware supplier with some past quality issues. Anyway, I thought I’d stop dithering over it, buy it, and rely on Amazon’s returns policy if there were problems with the PC/hardware.

My reason for choosing the SER6 Max was that it has enough rear ports for all three of my monitors; most other mini PC variants don’t. I did contemplate the Geekom AS6 (which is an ASUS PN53 with the same CPU as this Beelink), but it has slower RAM and I was concerned it might be noisy.

So, I “pulled the trigger” on https://www.amazon.co.uk/dp/B0C279T4P6 and, on a whim, tried installing Siduction Linux… so now I’ve got full disk encryption and what looks like a fairly up-to-date stack of stuff (with XFCE).

The SER6 has at least passed a token memory test and some system tests, so I’m fairly optimistic about it – although I did have one unexplained hard lock-up/crash yesterday.

(1 week later, and it seems well stable/reliable … )

bash – escaping variables for use within commands

Escaping quotes within variables is always (somehow) painful in bash – e.g. a value like:

foo"bar

and it’s not obvious that you’d need to write e.g.

"foo"\""bar"

(at least to me).

Thankfully a bash built-in magical thing can be used to do the escaping for you.

In my case, I need to pass a ‘PASSWORD’ variable through to run within a container. The PASSWORD variable needs escaping so it can safely contain things like ; or quote marks (" or ').

e.g. docker compose run app /bin/bash -c "echo $PASSWORD > /some/file"

or e.g. ssh user@server "echo $PASSWORD > /tmp/something"

The fix is to use the ${PASSWORD@Q} variable syntax – for example:

#!/bin/bash

FOO="bar'\"baz"

ssh user@server "echo $FOO > /tmp/something"

This will fail with something like: “bash: -c: line 1: unexpected EOF while looking for matching `'”

As the shell at the remote end is seeing echo bar'"baz and expects the quote mark to be closed.

So using the @Q magic –

ssh user@server "echo ${FOO@Q} > /tmp/something"

which will result in /tmp/something containing bar'"baz – which is correct.
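As an aside, ${FOO@Q} needs bash 4.4 or newer; on older bash, printf %q does the same escaping job – e.g.:

#!/bin/bash
FOO="bar'\"baz"
# printf %q escapes the value so the remote shell sees it as one safe word
ssh user@server "echo $(printf '%q' "$FOO") > /tmp/something"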

See also https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html#Shell-Parameter-Expansion

asus pn50 and cpufreq/boost

I’ve been using an ASUS PN50 (a mini PC with an AMD Ryzen 4800U processor – so sort of a laptop without a screen) as my desktop for ages.

Increasingly I’ve found it sluggish, and I was contemplating replacing it with something newer… and then I discovered why the CPU speed in /proc/cpuinfo was always 1400MHz…

I needed to:

echo 1 > /sys/devices/system/cpu/cpufreq/boost

Once that’s done, the CPU cores can go up to about 4.2GHz… #doh
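That echo doesn’t survive a reboot; one way to persist it is a small oneshot systemd unit – a sketch, and the unit name is made up:

# /etc/systemd/system/cpu-boost.service (hypothetical name)
[Unit]
Description=Enable CPU frequency boost

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo 1 > /sys/devices/system/cpu/cpufreq/boost'

[Install]
WantedBy=multi-user.target

followed by a systemctl daemon-reload && systemctl enable --now cpu-boost.service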

In other news – https://www.phoronix.com/news/Linux-Per-Policy-CPUFreq-Boost looks interesting.

Unfortunately my mini PC’s fan is now always speeding up/slowing down, when it used to be pretty quiet :-/

Thanks to https://www.reddit.com/r/MiniPCs/comments/16cuzd8/asus_pn50_unlock_cpu_speed_under_linux/

Resizing a VM’s disk within Azure

Random notes on resizing a disk attached to an Azure VM …

Check what you have already –

az disk list --resource-group MyResourceGroup --query '[*].{Name:name,Gb:diskSizeGb}' --output table

might output something a bit like:

Name      Gb
--------  ----
foo-os    30
bar-os    30
foo-data  512
bar-data  256

So here, we can see the ‘bar-data’ disk is only 256Gb.

Assuming you want to change it to be 512Gb (Azure doesn’t support arbitrary sizes, you need to choose a supported size…):

az disk update --resource-group MyResourceGroup --name bar-data --size-gb 512

Then wait a bit …
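You can confirm the new size has taken effect with something like:

az disk show --resource-group MyResourceGroup --name bar-data --query diskSizeGb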

In my case, the VMs are running Debian Buster, and I see this within the ‘dmesg’ output (on the server itself) after the resize has completed:

[31197927.047562] sd 1:0:0:0: [storvsc] Sense Key : Unit Attention [current]
[31197927.053777] sd 1:0:0:0: [storvsc] Add. Sense: Capacity data has changed
[31197927.058993] sd 1:0:0:0: Capacity data has changed

Unfortunately the new size doesn’t show up to the O/S straight away, so I think you either need to reboot the VM, or (what I do):

echo 1 > /sys/class/block/sda/device/rescan

at which point the new size appears within your ‘lsblk’ output – and the filesystem can be resized using e.g. resize2fs.
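If the filesystem sits on a partition rather than the bare device, the partition needs growing first – a sketch, assuming the data disk is /dev/sda with the filesystem on partition 1:

# growpart comes from the cloud-guest-utils package on Debian
growpart /dev/sda 1
# then grow the ext4 filesystem to fill the enlarged partition
resize2fs /dev/sda1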

Don’t forget to defragment /home if you’re using BTRFS

As root (as a regular user it just won’t work):

btrfs filesystem defragment /home -r

You probably want to run that weekly.
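e.g. as a root cron entry – a sketch; the file name and schedule are just examples:

# /etc/cron.d/btrfs-defrag (hypothetical) – Sundays at 03:00, as root
0 3 * * 0  root  btrfs filesystem defragment -r /home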

I eventually noticed Thunderbird and PhpStorm were being really slow and laggy… at which point I realised the cron job I had (as my non-root user) wasn’t working.

(Using filefrag /path/to/file you can see the number of extents change after defragmenting.)

Excessive uptime(!?)

Somewhere on the internet there’s a mailserver with a larger uptime, I guess?

[root@xxxxxxxx ~]# uname -a
Linux xxxxxxxxxxxxxxx 2.6.18-419.el5 #1 SMP Fri Feb 24 22:47:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@xxxxxxxx ~]# uptime
09:34:38 up 2290 days,  1:47,  ....

I don’t think anyone dares to reboot it …. (this is a server the customer was going to migrate off about 5 years ago …. somehow it’s still in use)

(2290 days is a little over 6 years)

btrfs & ext4 – error handling when the hardware fails …

I have a mini PC (an old Intel NUC) I use for taking backups of my desktop. It has a single 4TiB SSD in it.

Filesystem  Type   Size  Used  Avail  Use%  Mounted on
/dev/sda3   ext4   916G   80G   790G   10%  /
/dev/sda4   btrfs  2.8T  106G   2.7T    4%  /backup

I’ve been using btrfs for /backup for ages, as I use the snapshot functionality of btrfs with an hourly rsync job from my desktop to copy changes over (roughly sketched below).
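The hourly job is roughly this shape – a sketch, where the hostname and paths are assumptions, and /backup/current is assumed to be a btrfs subvolume:

#!/bin/bash
# pull changes over from the desktop...
rsync -a --delete desktop:/home/ /backup/current/
# ...then take a read-only, timestamped snapshot
btrfs subvolume snapshot -r /backup/current "/backup/snapshots/$(date +%Y%m%d-%H%M)"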

Recently the fan on the NUC failed, and while overheating it appears (I think) to have written garbage in various places – this was seen on the ext4 rootfs as well as the /backup btrfs volume.

BTRFS

Trying to scrub the filesystem highlights the problems –

root@nectarine:~# btrfs scrub status /backup
UUID:             36f93b26-6187-4874-8cc6-4d4bd092e7d8
Scrub resumed:    Sat Jun 17 13:48:33 2023
Status:           finished
Duration:         1:21:28
Total to scrub:   1.23TiB
Rate:             263.66MiB/s
Error summary:    csum=60
  Corrected:      0
  Uncorrectable:  60
  Unverified:     0

(As I only have one underlying block device, it’s not possible for it to repair itself).

I now also see messages like this in ‘dmesg’ –

[ 3570.123946] BTRFS error (device sda4): unable to fixup (regular) error at logical 1870167986176 on dev /dev/sda4
[ 3570.128866] BTRFS error (device sda4): bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 199, gen 0
[ 3570.128862] BTRFS warning (device sda4): checksum error at logical 1870167683072 on dev /dev/sda4, physical 1477245284352, root 8890, inode 3750321, offset 384077824, length 4096, links 1 (path: .icedove/e1kre066.default-release-2/ImapMail/imap.gmail-2.com/INBOX-1)

Before trying to re-initialise the checksum tree (and then just letting the corrupt files expire out of the filesystem over time as they get rsync’ed over), I thought I’d try:

root@nectarine:~# btrfs check -p /dev/sda4 
Opening filesystem to check...
Checking filesystem on /dev/sda4
UUID: 36f93b26-6187-4874-8cc6-4d4bd092e7d8
[1/7] checking root items                      (0:00:10 elapsed, 6406461 items checked)
Segmentation faultents                         (0:00:02 elapsed, 7542 items checked)

So that didn’t work very well.

So I thought I might as well try just re-initialising the checksum tree –

root@nectarine:~# btrfs check -p --init-csum-tree /dev/sda4 
Creating a new CRC tree
WARNING:

	Do not use --repair unless you are advised to do so by a developer
	or an experienced user, and then only after having accepted that no
	fsck can successfully repair all types of filesystem corruption. Eg.
	some software or hardware bugs can fatally damage a volume.
	The operation will start in 10 seconds.
	Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/sda4
UUID: 36f93b26-6187-4874-8cc6-4d4bd092e7d8
Reinitialize checksum tree
kernel-shared/extent_io.c:650: free_extent_buffer_internal: BUG_ON `eb->refs < 0` triggered, value 1
btrfs(+0x2b1f7)[0x5590e079d1f7]
btrfs(+0x2b381)[0x5590e079d381]
btrfs(+0x2b68e)[0x5590e079d68e]
btrfs(alloc_extent_buffer+0x77)[0x5590e079e740]
btrfs(read_tree_block+0x47)[0x5590e0796066]
btrfs(read_node_slot+0x47)[0x5590e078f7fd]
btrfs(btrfs_next_sibling_tree_block+0x95)[0x5590e0792900]
btrfs(+0x19e14)[0x5590e078be14]
btrfs(+0x1a8a8)[0x5590e078c8a8]
btrfs(iterate_extent_inodes+0x68)[0x5590e078d5dc]
btrfs(fill_csum_tree+0x46b)[0x5590e07f9440]
btrfs(+0x74bf2)[0x5590e07e6bf2]
btrfs(main+0x3d3)[0x5590e078a203]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff38d37fd0a]
btrfs(_start+0x2a)[0x5590e078a86a]
Aborted

So I don’t feel that worked all that well.

I guess I’ll copy off the data I don’t want to lose, and just reformat it. I was hoping the repair tools (btrfs-progs v6.2, kernel 6.1.34) had matured since I last broke a btrfs filesystem (a few years ago). I guess not?

I know btrfs is at least alerting me to issues with the data – which ext4 definitely isn’t (given /var/lib/dpkg/status contained a load of trash) – so I’ll give it credit for that. It’s just a shame the ‘repair’ tools aren’t working that well.

ext4

This filesystem isn’t written to much on this system – there’s a munin daemon running (so /var/lib/munin will have been written to) and a few log files.

Interestingly, when I first noticed a problem with the device, after logging in, I instinctively ran ‘apt-get update’ (I was hoping a reboot would fix it, at which point I might as well make sure any updates were installed).

Running ‘apt-get update’ resulted in /var/lib/dpkg/status being full of rubbish.

After the PC had been turned on for a few hours, ext4 eventually figured out there were problems – by logging this:

[11591.230282] munin-html[22255]: segfault at a400000e ip 0000557783eaf0e9 sp 00007ffca1d969f0 error 4 in perl[557783de1000+185000] likely on CPU 3 (core 1, socket 0)
[11591.230298] Code: 4e 0c 89 56 08 83 e9 09 83 f9 01 76 14 83 fa 01 76 3f 83 ea 01 89 55 08 48 83 c4 10 5d c3 0f 1f 00 48 8b 70 08 48 85 f6 74 e3 <f6> 46 0e 10 74 dd 48 c7 40 08 00 00 00 00 8b 56 08 83 fa 01 76 22
[11591.432906] munin-graph[22257]: segfault at 55a6b77c7df0 ip 000055a64601ebc2 sp 00007ffcd88c5150 error 4 in perl[55a645fc0000+185000] likely on CPU 3 (core 1, socket 0)
[11591.432927] Code: 0f 1f 84 00 00 00 00 00 48 8b 4f 10 48 85 c9 74 5f 48 83 ec 08 48 8b 87 30 01 00 00 48 8b 50 10 48 39 d1 75 4c 48 85 f6 74 55 <48> 8b 04 f1 48 85 c0 74 20 48 8d 97 50 01 00 00 48 39 d0 74 14 8b
[12723.693630] EXT4-fs error (device sda3): htree_dirblock_to_tree:1080: inode #28706704: comm find: Directory block failed checksum
[12723.693673] Aborting journal on device sda3-8.
[12723.696920] EXT4-fs error (device sda3): ext4_journal_check_start:83: comm systemd-journal: Detected aborted journal
[12723.696945] EXT4-fs error (device sda3): ext4_journal_check_start:83: comm rs:main Q:Reg: Detected aborted journal
[12723.708257] EXT4-fs (sda3): Remounting filesystem read-only

Rebooting and running fsck -Cy /dev/sda3 MIGHT have fixed the rootfs.

intel nuc d54250wyk (haswell) ~10 years later

This little NUC I bought ages ago is still chugging along in continual use (albeit only as a backup ‘server’ with a large 4TiB SSD in it).

It’s recently had ‘open heart’ surgery to replace a failing fan and to clean the dust out of it (for the first time in 10 years).

Wow, it’s quiet now.

In other news, I’m tempted to buy a new desktop mini-pc to replace the ASUS PN50 I have (which seems to struggle a little, perhaps due to me having 3 monitors and it having a relatively weak graphics card).

So I’m now torn between waiting a bit longer, getting a NUC 13 Pro or an ASUS PN53, or hoping Beelink/someone releases something new (I’m skeptical any of the cheaper Chinese manufacturers will produce anything that’ll last >10 years though).