Resizing a VM’s disk within Azure

Random notes on resizing a disk attached to an Azure VM …

Check what you have already –

az disk list --resource-group MyResourceGroup --query '[*].{Name:name,Gb:diskSizeGb,Tier:accountType}' --output table

might output something a bit like :

Name Gb
———————————————- —-
foo-os 30
bar-os 30
foo-data 512
bar-data 256

So here, we can see the ‘bar-data’ disk is only 256Gb.

Assuming you want to change it to be 512Gb (Azure doesn’t support an arbitary size, you need to choose a supported size…)

az disk update --resource-group MyResourceGroup --name bar-data --size-gb 512

Then wait a bit …

In my case, the VMs are running Debian Buster, and I see this within the ‘dmesg‘ output after the resize has completed (on the server itself).

[31197927.047562] sd 1:0:0:0: [storvsc] Sense Key : Unit Attention [current]
[31197927.053777] sd 1:0:0:0: [storvsc] Add. Sense: Capacity data has changed
[31197927.058993] sd 1:0:0:0: Capacity data has changed

Unfortunately the new size doesn’t show up straight away to the O/S, so I think you either need to reboot the VM or (what I do) –

echo 1 > /sys/class/block/sda/device/rescan

at which point the newer size appears within your ‘lsblk‘ output – and the filesystem can be resized using e.g. resize2fs

Don’t forget to defragment /home if you’re using BTRFS

As root: (as a regular user it just won’t work) –

btrfs filesystem defragment /home -r

You probably want to run that weekly.

I eventually noticed Thunderbird and phpStorm were being really slow and laggy … at which point I realised the cron job I had (as my non-root user) wasn’t working.

(using filefrag /path/to/file you can see the change in the number of extents change after defragmenting)

Excessive uptime(!?)

Somewhere on the internet there’s a mailserver with a larger uptime, I guess?

[root@xxxxxxxx ~]# uname -a
Linux xxxxxxxxxxxxxxx 2.6.18-419.el5 #1 SMP Fri Feb 24 22:47:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@xxxxxxxx ~]# uptime
09:34:38 up 2290 days,  1:47,  ....

I don’t think anyone dares to reboot it …. (this is a server the customer was going to migrate off about 5 years ago …. somehow it’s still in use)

(2290 days is a little over 6 years)

btrfs & ext4 – error handling when the hardware fails …

I have a mini PC (old intel NUC) I use for taking backups of my desktop. It has a single 4TiB ssd in it.

Filesystem Type Size Used Avail Use% Mounted on
/dev/sda3 ext4 916G 80G 790G 10% /
/dev/sda4 btrfs 2.8T 106G 2.7T 4% /backup

I’ve been using btrfs for ages for /backup as I use the snapshot functionality of btrfs with an hourly rsync job from my desktop to copy changes over.

Recently the fan on the NUC failed, and while overheating (I think) it appears to have written garbage in various places (this was seen on the ext4 rootfs as well as the /backup btrfs volume).

BTRFS

Trying to scrub the filesystem highlights the problems –

root@nectarine:~# btrfs scrub status /backup
UUID:             36f93b26-6187-4874-8cc6-4d4bd092e7d8
Scrub resumed:    Sat Jun 17 13:48:33 2023
Status:           finished
Duration:         1:21:28
Total to scrub:   1.23TiB
Rate:             263.66MiB/s
Error summary:    csum=60
  Corrected:      0
  Uncorrectable:  60
  Unverified:     0

(As I only have one underlying block device, it’s not possible for it to repair itself).

I now also see messages like this in ‘dmesg’ –

[ 3570.123946] BTRFS error (device sda4): unable to fixup (regular) error at logical 1870167986176 on dev /dev/sda4
[ 3570.128866] BTRFS error (device sda4): bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 199, gen 0
[ 3570.128862] BTRFS warning (device sda4): checksum error at logical 1870167683072 on dev /dev/sda4, physical 1477245284352, root 8890, inode 3750321, offset 384077824, length 4096, links 1 (path: .icedove/e1kre066.default-release-2/ImapMail/imap.gmail-2.com/INBOX-1)

Before trying to re-initialise the checksum tree (And then just let the corrupt files expire out of the filesystem with time as they get rsync’ed over) I thought I’d try :

root@nectarine:~# btrfs check -p /dev/sda4 
Opening filesystem to check...
Checking filesystem on /dev/sda4
UUID: 36f93b26-6187-4874-8cc6-4d4bd092e7d8
[1/7] checking root items                      (0:00:10 elapsed, 6406461 items checked)
Segmentation faultents                         (0:00:02 elapsed, 7542 items checked)

So that didn’t work very well.

So I thought I might as well try just re-initialising the checksum tree –

root@nectarine:~# btrfs check -p --init-csum-tree /dev/sda4 
Creating a new CRC tree
WARNING:

	Do not use --repair unless you are advised to do so by a developer
	or an experienced user, and then only after having accepted that no
	fsck can successfully repair all types of filesystem corruption. Eg.
	some software or hardware bugs can fatally damage a volume.
	The operation will start in 10 seconds.
	Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/sda4
UUID: 36f93b26-6187-4874-8cc6-4d4bd092e7d8
Reinitialize checksum tree
kernel-shared/extent_io.c:650: free_extent_buffer_internal: BUG_ON `eb->refs < 0` triggered, value 1
btrfs(+0x2b1f7)[0x5590e079d1f7]
btrfs(+0x2b381)[0x5590e079d381]
btrfs(+0x2b68e)[0x5590e079d68e]
btrfs(alloc_extent_buffer+0x77)[0x5590e079e740]
btrfs(read_tree_block+0x47)[0x5590e0796066]
btrfs(read_node_slot+0x47)[0x5590e078f7fd]
btrfs(btrfs_next_sibling_tree_block+0x95)[0x5590e0792900]
btrfs(+0x19e14)[0x5590e078be14]
btrfs(+0x1a8a8)[0x5590e078c8a8]
btrfs(iterate_extent_inodes+0x68)[0x5590e078d5dc]
btrfs(fill_csum_tree+0x46b)[0x5590e07f9440]
btrfs(+0x74bf2)[0x5590e07e6bf2]
btrfs(main+0x3d3)[0x5590e078a203]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff38d37fd0a]
btrfs(_start+0x2a)[0x5590e078a86a]
Aborted

So I don’t feel that worked all that well.

I guess I’ll copy off the data I don’t want to lose, and just reformat it. I was hoping the repair tools (btrfs-progs v6.2, kernel 6.1.34) had hopefully matured since I last broke a btrfs filesystem (a few years ago). I guess not?

I know btrfs is at least alerting me to issues with the data – which ext4 definitely isn’t (given /var/lib/dpkg/status contained a load of trash) – so I’ll give it credit for that. It’s just a shame the ‘repair’ tools aren’t working that well.

ext4

This isn’t written to much on this system – there’s a munin daemon running (so /var/lib/munin will have been written to) and a few log files.

Interestingly, when I first noticed a problem with the device, after logging in, I instinctively ran ‘apt-get update’ (I was hoping a reboot would fix it, at which point I might as well make sure any updates were installed).

Running ‘apt-get update’ resulted in /var/lib/dpkg/status being full of rubbish.

After the PC had been turned on for a few hours, ext4 eventually figured there were problems with it – by logging this :

[11591.230282] munin-html[22255]: segfault at a400000e ip 0000557783eaf0e9 sp 00007ffca1d969f0 error 4 in perl[557783de1000+185000] likely on CPU 3 (core 1, socket 0)
[11591.230298] Code: 4e 0c 89 56 08 83 e9 09 83 f9 01 76 14 83 fa 01 76 3f 83 ea 01 89 55 08 48 83 c4 10 5d c3 0f 1f 00 48 8b 70 08 48 85 f6 74 e3 <f6> 46 0e 10 74 dd 48 c7 40 08 00 00 00 00 8b 56 08 83 fa 01 76 22
[11591.432906] munin-graph[22257]: segfault at 55a6b77c7df0 ip 000055a64601ebc2 sp 00007ffcd88c5150 error 4 in perl[55a645fc0000+185000] likely on CPU 3 (core 1, socket 0)
[11591.432927] Code: 0f 1f 84 00 00 00 00 00 48 8b 4f 10 48 85 c9 74 5f 48 83 ec 08 48 8b 87 30 01 00 00 48 8b 50 10 48 39 d1 75 4c 48 85 f6 74 55 <48> 8b 04 f1 48 85 c0 74 20 48 8d 97 50 01 00 00 48 39 d0 74 14 8b
[12723.693630] EXT4-fs error (device sda3): htree_dirblock_to_tree:1080: inode #28706704: comm find: Directory block failed checksum
[12723.693673] Aborting journal on device sda3-8.
[12723.696920] EXT4-fs error (device sda3): ext4_journal_check_start:83: comm systemd-journal: Detected aborted journal
[12723.696945] EXT4-fs error (device sda3): ext4_journal_check_start:83: comm rs:main Q:Reg: Detected aborted journal
[12723.708257] EXT4-fs (sda3): Remounting filesystem read-only

Rebooting and running : fsck -Cy /dev/sda3 MIGHT have fixed the rootfs.

intel nuc d54250wyk (haswell) ~10 years later

This little NUC I bought ages ago is still chugging along, in continual use (albeit only as a backup ‘server’ with a large 4TiB ssd in it).

It’s recently had ‘open heart’ surgery to replace a failing fan and to clean the dust out of it (for the first time in 10 years).

Wow, it’s quiet now.

In other news, I’m tempted to buy a new desktop mini-pc to replace the ASUS PN50 I have (which seems to struggle a little, perhaps due to me having 3 monitors and it having a relatively weak graphics card).

So I’m now torn between waiting a bit longer, getting a NUC 13 Pro or ASUS PN53 or hoping BeeLink/someone release something. I’m skeptical any of the cheaper Chinese manufacturers will produce anything that’ll last >10 years though).

PHP psalm annotations

This is more of a note for myself, as I keep forgetting the syntax. See also https://github.com/vimeo/psalm and

List of Arrays

Given an 2d array like :

[
   [ 'name' => 'Pickle', 'age' => 4 ],
   [ 'name' => 'David', 'age' => 42 ],
   [ 'name' => 'Sooty', 'age' => 13 ],
]

PHP code that expects a data structure in that same shape can use an annotation like :

/**
 * @psalm-param list<array{name: string, age: int}>
 */

Alternatively that can be used on a @psalm-return ….

Fixed List of Possible Things

If you have a fixed list of stringy things being returned you might have :

/**
 * @psalm-return 'Foo'|'Bar'|'Baz'
 */

systemd-resolve (DNS is always to blame)

For the record, this is using systemd v247, from Debian’s buster-backports.

I think I was enticed by the cool aid, hoping to be able to have DNSSEC or DNSoverTLS …. and caching … and to be fair, it appeared to work on all the servers I’d installed it on (although they were just ‘boring’ LAMP style webservers).

Anyway, everything seemed to be going well, with the default /etc/resolv.conf like :

nameserver 127.0.0.53

options edns0

and /etc/systemd/resolved.conf looking like :

[Resolve]
DNS=8.8.8.8#dns.google 8.8.4.4#dns.google 1.1.1.1
FallbackDNS=1.1.1.1 8.8.4.4 9.9.9.9
LLMNR=no
DNSOverTLS=opportunistic
DNSSEC=no
Cache=yes

Unfortunately, on one relatively busy server which makes multiple HTTP requests out every second, I saw sporadic failures where curl would report a timeout for e.g. graph.facebook.com (>10 connect time).

The timeouts seemed to be grouped together (no timeouts for a number of hours, and then a load of requests would fail) and obviously to be annoying this only happened in production and wasn’t something I could reproduce.

As best I can tell, a failure to lookup was being cached, so all requests for a specific hostname would then fail until the cache expired (30 seconds?)

So I end up having /etc/resolv.conf looking a bit more like a traditional one with 8.8.8.8 as the first nameserver and some custom options to lower the retry time and hopefully trigger multiple DNS lookup attempts.

So, perhaps …. perhaps … systemd-resolve isn’t quite ready for production yet?

faster rsync (ssh cipher choice)

Perhaps the bottleneck isn’t always bandwidth – but does changing ssh cipher make any difference?

Using a derivative of :

rsync -W --delete --no-owner --no-group --no-perms \
    -e ssh \
    -arv /source/ remote@destination:/destination/path/

In unscientific tests, it looks like ssh parameters might do something when copying a 4GiB file between two random virtual machines in different data centres, but both in London.

SSH Variant Speed
-e “ssh” ~45MB/s
-e “ssh -x -T” ~44MB/s
-e “ssh -x -T -c chacha20-poly1305@openssh.com” ~42MB/s
-e “ssh -x -T -c aes128-ctr” ~47MB/s
-e “ssh -x -T -c aes256-gcm@openssh.com” ~50MB/s
-e “ssh -x -T -c aes128-gcm@openssh.com “ ~45MB/s

I’m not sure if these results are particularly insightful / useful.

Hello (again) world

I don’t blog very often.

I should probably stop bothering with the automated twitter compilations.

In other news, some legacy PHP code I look after had this :

<?php
// ...
require('#something.inc.php');

Yes, that’s including a file name that starts with a ‘#’ … which while it’s a clever idea, it’s also a pain in the bum to edit in Vim etc all the time.