sysadmin – David Goodwin

May 6, 2025May 6, 2025

Trying out headscale (tailscale vpn stuff)

For sometime, I’ve been using Wireguard for a VPN to use when I’m out and about etc.

As I’m fairly stupid, I used wg-quick to generate the config – however when the config looks a bit like this –

[Peer] PublicKey = cm+t2u0giNynMkcX1+afPu6SlKyLMeTe8iWKhT1FsDk= AllowedIPs = 10.0.0.13/32 Endpoint = 192.168.122.13:51820 ....

I began to find management became a problem – i.e which computer is that exactly ?

wg show does give you something a bit like this –

... peer: cm+t2u0giNynMkcX1+afPu6SlKyLMeTe8iWKhT1FsDk= endpoint: 192.168.122.13:51820 allowed ips: 10.0.0.13/32 ...

which is sort of useful, but it still doesn’t tell me a human name. I’ve tried leaving comments in the config before, but they just get wiped out.

I’ve often thought about using TailScale, but wasn’t overly happy with the idea of some third party being involved. Eventually I came across headscale – which offers a self-hosted option for the backend (so your devices use the tailscale frontend).

After a bit of poking around over the weekend I now have this: headscale nodes list

which is a bit nicer –

I’m still pretty new to using Tailscale for a VPN, but I did at least eventually get my phone to join the network, and everything seems to work.

It’s sort of interesting that tailscale doesn’t add an entry into your routing table – but instead adds a few iptables rules in (nat) to mess around with things.

April 24, 2025

It’s always DNS …. (unbound / domain signing stuff)

Yesterday, I spent most of my day wondering what was wrong with my unbound configuration…. as a TL;DR, if you’re creating a new TLD (e.g. foo.lan) you may need to disable DNSSEC checks on it within unbound’s config using the ‘domain-insecure‘ setting –

Continue reading “It’s always DNS …. (unbound / domain signing stuff)”

April 12, 2025

Initial foray into Terraform / OpenTofu

So over the last couple of weeks at work, I’ve been learning to use Terraform (well OpenTofu) to help us manage multiple deployments in Azure and AWS.

The thought being that we can have a single ‘plan’ of what a deployment should look like, and deviations will be spotted / can be alerted on.

I was tempted to try and write a contrived article showing how you could create a VM in AWS (or Azure) using Terraform, but I’m not sure I’ve got anything to add over the 101 other articles on the internet.

Vaguely useful things :

The tofu configuration is much quicker to write than e.g. trying to talk to AWS using it’s SDK (something I did do about 7-8 years ago)
You can split the config up into multiple .tf files within your working directory, the tool just merges them all together at run time
Having auto-complete in an editor is pretty much necessary (in my case, PHPStorm)
tofu is quite quick to run – it doesn’t take all that long to check the state of the known resources and the config files, which is good; unfortunately Azure often takes sometime to do something on its end…
I’ve yet to see any point in writing a module to try and encapsulate any of our configuration as I can’t see any need to re-use bits anywhere

I’m not sure how we’re going to go about reconciling our legacy (production) environment with a newer / shiny one built with tofu though.

May 8, 2024May 8, 2024

LetsEncrypt + Azure Keyvault + Application gateway

A few years ago I setup an Azure Function App to retrieve a LetsEncrypt certificate for a few $work services.

Annoyingly that silently stopped renewing stuff.

Given I’ve no idea how to update it etc or really investigate it (it’s too much of a black box) I decided to replace it with certbot etc, hopefully run through a scheduled github action.

To keep things interesting, I need to use the Route53 DNS stuff to verify domain ownership.

Random bits :

docker run --rm -e AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
-e AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
-v $(pwd)/certs:/etc/letsencrypt/ \
-u $(id -u ${USER}):$(id -g ${USER}) \
certbot/dns-route53 certonly \
--agree-tos \
--email=me@example.com \
--server https://acme-v02.api.letsencrypt.org/directory \
--dns-route53 \
--rsa-key-size 2048 \
--key-type rsa \
--keep-until-expiring \
--preferred-challenges dns \
--non-interactive \
--work-dir /etc/letsencrypt/work \
--logs-dir /etc/letsencrypt/logs \
--config-dir /etc/letsencrypt/ \
-d mydomain.example.com

Azure needs the rsa-key-size 2048 and type to be specified. I tried 4096 and it told me to f.off.

Once that’s done, the following seems to produce a certificate that keyvault will accept, and the load balancer can use, that includes an intermediate certificate / some sort of chain.

cat certs/live/mydomain.example.com/{fullchain.pem,privkey.pem} > certs/mydomain.pem 

openssl pkcs12 -in certs/mydomain.pem -keypbe NONE -cetpbe NONE -nomaciter -passout pass:something -out certs/something.pfx -export

az keyvault certificate import --vault-name my-azure-vault  -n certificate-name -f certs/something.pfx --password something

Thankfully that seems to get accepted by Azure, and when it’s applied to an application gateway listener, clients see an appropriate chain.

December 30, 2023December 30, 2023

Upgrade some things

Well, I sort of realised I had a web server or two that were still on Debian Buster, and it was time to move to Bullseye or Bookworm. As usual the Debian upgrade procedure was mostly pretty straight forward and uneventful.

Interesting findings :

“hitch“, which I use as an SSL frontend to varnish, doesn’t seem to get along all that well with systemd and silently fails if your config has “daemon = on” setting in /etc/hitch/hitch.conf. Annoyingly when trying to test the configuration with “hitch -t” you will get an error like: “No x509 certificate PEM file specified for frontend ‘default’!” – the solution to that is to specify the config file – i.e : hitch -t --config /etc/hitch/hitch.conf

hitch hasn’t had a release in it’s packagecloud.io repository for the last 3 years; so the debian supported variant looks more appealing.

In other news, I noticed this post where someone moaned about systemd-resolved the other day – https://www.reddit.com/r/linux/comments/18kh1r5/im_shocked_that_almost_no_one_is_talking_about/ – I’ve had similar problems to the people on the thread (resolved stops working etc) so thought it was time to try and use ‘unbound‘ instead.

apt-get install unbound

and then tell /etc/resolv.conf to use 127.0.0.1 for DNS.

annoyingly, unbound-control stats isn’t quite as pretty as resolvectl statistics but oh well.

echo -e “nameserver 127.0.0.1\nnameserver 8.8.8.8\noptions timeout:4” >/etc/resolv.conf

and an /etc/unbound/unbound.conf file that looks perhaps like :

server: interface: 127.0.0.1 access-control: 127.0.0.0/8 allow access-control: ::1/128 allow # The following line will configure unbound to perform cryptographic # DNSSEC validation using the root trust anchor. auto-trust-anchor-file: "/var/lib/unbound/root.key" tls-cert-bundle: "/etc/ssl/certs/ca-certificates.crt"


remote-control:
    control-enable: yes
    # by default the control interface is is 127.0.0.1 and ::1 and port 8953
    # it is possible to use a unix socket too
    control-interface: /run/unbound.ctl

forward-zone: name: "." forward-tls-upstream: yes forward-addr: 1.1.1.1@853#cloudflare-dns.com forward-addr: 1.0.0.1@853#cloudflare-dns.com

(Unfortunately my ISP is shitty, and doesn’t yet give me an ipv6 address).

Looking at https://1.1.1.1/help – I do sometimes see that ‘DNS over TLS’ is “yes”…. so I guess something is right; annoyingly I don’t see anything useful from unbound’s stats (unbound-control stats) to show it’s done a secure query…

“unbound-host” (another debian package) – will helpfully tell you whether a lookup was done ‘securely’ or not – e.g.

$ unbound-host google.com -D -v
google.com has address 142.250.178.14 (insecure)
google.com has IPv6 address 2a00:1450:4009:815::200e (insecure)
google.com mail is handled by 10 smtp.google.com. (insecure)

which seems a little odd to me (I’d have thought google would support dns sec), but some domains do work – e.g.

$ unbound-host mythic-beasts.com -D -v mythic-beasts.com has address 93.93.130.166 (secure) mythic-beasts.com has IPv6 address 2a00:1098:0:82:1000:0:1:2 (secure) mythic-beasts.com mail is handled by 10 mx1.mythic-beasts.com. (secure) mythic-beasts.com mail is handled by 10 mx2.mythic-beasts.com. (secure)

September 12, 2023September 12, 2023

bash – escaping variables for use within commands

Escaping quotes within variables is always painful in bash (somehow) – e.g.

foo”bar

and it’s not obvious that you’d need to write e.g.

“foo”\””bar”

(at least to me).

Thankfully a bash built in magical thing can be used to do the escaping for you.

In my case, I need to pass a ‘PASSWORD’ variable through to run within a container. The PASSWORD variable needs escaping so it can safely contain things like ; or quote marks (” or ‘).

e.g. docker compose run app /bin/bash "echo $PASSWORD > /some/file"

or e.g. ssh user@server “echo $PASSWORD > /tmp/something”

The fix is to use the ${PASSWORD@Q} variable syntax – for example:

#!/bin/bash

FOO=”bar’\”baz”

ssh user@server “echo $FOO > /tmp/something”

This will fail, with something like : “bash: -c: line 1: unexpected EOF while looking for matching `''“

As she shell at the remote end it seeing echo bar'"baz and expects the quote mark to be closed.

So using the @Q magic –

ssh user@server “echo ${FOO@Q} > /tmp/something”

which will result in /tmp/something containing “bar'”baz” which is correct.

August 29, 2023

Resizing a VM’s disk within Azure

Random notes on resizing a disk attached to an Azure VM …

Check what you have already –

az disk list --resource-group MyResourceGroup --query '[*].{Name:name,Gb:diskSizeGb,Tier:accountType}' --output table

might output something a bit like :

Name Gb
———————————————- —-
foo-os 30
bar-os 30
foo-data 512
bar-data 256

So here, we can see the ‘bar-data’ disk is only 256Gb.

Assuming you want to change it to be 512Gb (Azure doesn’t support an arbitary size, you need to choose a supported size…)

az disk update --resource-group MyResourceGroup --name bar-data --size-gb 512

Then wait a bit …

In my case, the VMs are running Debian Buster, and I see this within the ‘dmesg‘ output after the resize has completed (on the server itself).

[31197927.047562] sd 1:0:0:0: [storvsc] Sense Key : Unit Attention [current] [31197927.053777] sd 1:0:0:0: [storvsc] Add. Sense: Capacity data has changed [31197927.058993] sd 1:0:0:0: Capacity data has changed

Unfortunately the new size doesn’t show up straight away to the O/S, so I think you either need to reboot the VM or (what I do) –

echo 1 > /sys/class/block/sda/device/rescan

at which point the newer size appears within your ‘lsblk‘ output – and the filesystem can be resized using e.g. resize2fs

March 26, 2023

systemd-resolve (DNS is always to blame)

For the record, this is using systemd v247, from Debian’s buster-backports.

I think I was enticed by the cool aid, hoping to be able to have DNSSEC or DNSoverTLS …. and caching … and to be fair, it appeared to work on all the servers I’d installed it on (although they were just ‘boring’ LAMP style webservers).

Anyway, everything seemed to be going well, with the default /etc/resolv.conf like :

nameserver 127.0.0.53

options edns0

and /etc/systemd/resolved.conf looking like :

[Resolve]
DNS=8.8.8.8#dns.google 8.8.4.4#dns.google 1.1.1.1
FallbackDNS=1.1.1.1 8.8.4.4 9.9.9.9
LLMNR=no
DNSOverTLS=opportunistic
DNSSEC=no
Cache=yes

Unfortunately, on one relatively busy server which makes multiple HTTP requests out every second, I saw sporadic failures where curl would report a timeout for e.g. graph.facebook.com (>10 connect time).

The timeouts seemed to be grouped together (no timeouts for a number of hours, and then a load of requests would fail) and obviously to be annoying this only happened in production and wasn’t something I could reproduce.

As best I can tell, a failure to lookup was being cached, so all requests for a specific hostname would then fail until the cache expired (30 seconds?)

So I end up having /etc/resolv.conf looking a bit more like a traditional one with 8.8.8.8 as the first nameserver and some custom options to lower the retry time and hopefully trigger multiple DNS lookup attempts.

So, perhaps …. perhaps … systemd-resolve isn’t quite ready for production yet?

November 30, 2022April 15, 2023

faster rsync (ssh cipher choice)

Perhaps the bottleneck isn’t always bandwidth – but does changing ssh cipher make any difference?

Using a derivative of :

rsync -W --delete --no-owner --no-group --no-perms \
    -e ssh \
    -arv /source/ remote@destination:/destination/path/

In unscientific tests, it looks like ssh parameters might do something when copying a 4GiB file between two random virtual machines in different data centres, but both in London.

SSH Variant	Speed
-e “ssh”	~45MB/s
-e “ssh -x -T”	~44MB/s
-e “ssh -x -T -c chacha20-poly1305@openssh.com”	~42MB/s
-e “ssh -x -T -c aes128-ctr”	~47MB/s
-e “ssh -x -T -c aes256-gcm@openssh.com”	~50MB/s
-e “ssh -x -T -c aes128-gcm@openssh.com “	~45MB/s

I’m not sure if these results are particularly insightful / useful.

October 10, 2021October 10, 2021

(re)building varnish modules

I’m using Varrsh 6 LTS in some places, and need a way to rebuild dependent modules …. which seem to need recompiling even for a minor feature release (E.g. 6.0.1 to 6.0.2).

I use dynamic (DNS routing), var and vsthrottle.

Firstly, here’s a Dockerfile –

FROM debian:buster as builder

ARG VARNISH_VERSION=6.0.8-1~buster

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get -qy update && \
    apt-get -qy install eatmydata apt-transport-https lsb-release ca-certificates curl gnupg wget && \
    apt-get clean

RUN echo "\
Package: varnish\n\
Pin: version ${VARNISH_VERSION}\n\
Pin-Priority: 1001 \
\
Package: varnish-dev \n\
Pin: version ${VARNISH_VERSION} \n\
Pin-Priority: 1001 \
" >> /etc/apt/preferences.d/varnish 

RUN echo "deb https://packagecloud.io/varnishcache/varnish60lts/debian/ buster main" > /etc/apt/sources.list.d/varnish.list

RUN wget -qO /tmp/varnish.gpg https://packagecloud.io/varnishcache/varnish60lts/gpgkey && \
    apt-key add /tmp/varnish.gpg && \
    apt-get -q update && \
    eatmydata -- apt-get -qy install varnish varnish-dev automake libtool make libncurses-dev pkg-config python3-docutils unzip libgetdns10 libgetdns-dev

RUN apt-cache policy varnish

WORKDIR /tmp

RUN wget -qO /tmp/varnish.zip https://github.com/varnish/varnish-modules/archive/refs/heads/6.0.zip && \
    unzip /tmp/varnish.zip && \
    cd varnish-modules-6.0 && \
    bash bootstrap && \
    ./configure --disable-dependency-tracking && \
    make && \
    make check && \
    make install 

RUN wget -qO /tmp/dynamic.zip https://github.com/nigoroll/libvmod-dynamic/archive/refs/heads/6.0.zip && \
    unzip /tmp/dynamic.zip && \
    cd libvmod-dynamic-6.0 && \
    bash autogen.sh && \
    bash configure && \
    make && \
    make install


FROM debian:buster
    
WORKDIR /srv/export
COPY --from=builder /usr/lib/varnish/vmods/libvmod_dynamic.so /srv/export/
COPY --from=builder /usr/lib/varnish/vmods/libvmod_proxy.so /srv/export/
COPY --from=builder /usr/lib/varnish/vmods/libvmod_var.so /srv/export/
COPY --from=builder /usr/lib/varnish/vmods/libvmod_vsthrottle.so /srv/export/
COPY --from=builder /usr/lib/varnish/vmods/libvmod_header.so /srv/export/

and then, I copy the files out of that build pipeline (dare i call it that?) with this shell script

#!/bin/bash

set -eux

# Build a new set of varnish modules.

# Each version of varnish needs it's own build of some modules - moving from e.g. varnish 6.0.7~1-stretch to 6.0.8~1-stretch 
# isn't possible without these modules being rebuilt.

[ -d $(pwd)/tmp ] && rm -Rf $(pwd)/tmp

docker build --pull -f Dockerfile -t builder .

mkdir tmp

docker run -v $(pwd)/tmp:/srv/tmp -ti builder bash -c 'cp /srv/export/* /srv/tmp'

Then it’s just a case of running ‘build.sh’ and waiting …. and you’ll find the files you want in ‘tmp’.