Upgrade some things

Well, I sort of realised I had a web server or two that were still on Debian Buster, and it was time to move to Bullseye or Bookworm. As usual the Debian upgrade procedure was mostly pretty straight forward and uneventful.

Interesting findings :

  • hitch“, which I use as an SSL frontend to varnish, doesn’t seem to get along all that well with systemd and silently fails if your config has “daemon = on” setting in /etc/hitch/hitch.conf. Annoyingly when trying to test the configuration with “hitch -t” you will get an error like: “No x509 certificate PEM file specified for frontend ‘default’!” – the solution to that is to specify the config file – i.e : hitch -t --config /etc/hitch/hitch.conf
  • hitch hasn’t had a release in it’s packagecloud.io repository for the last 3 years; so the debian supported variant looks more appealing.

In other news, I noticed this post where someone moaned about systemd-resolved the other day – https://www.reddit.com/r/linux/comments/18kh1r5/im_shocked_that_almost_no_one_is_talking_about/ – I’ve had similar problems to the people on the thread (resolved stops working etc) so thought it was time to try and use ‘unbound‘ instead.

apt-get install unbound

and then tell /etc/resolv.conf to use 127.0.0.1 for DNS.

annoyingly, unbound-control stats isn’t quite as pretty as resolvectl statistics but oh well.

echo -e “nameserver 127.0.0.1\nnameserver 8.8.8.8\noptions timeout:4” >/etc/resolv.conf

and an /etc/unbound/unbound.conf file that looks perhaps like :

server:
interface: 127.0.0.1
access-control: 127.0.0.0/8 allow
access-control: ::1/128 allow
# The following line will configure unbound to perform cryptographic
# DNSSEC validation using the root trust anchor.
auto-trust-anchor-file: "/var/lib/unbound/root.key"
tls-cert-bundle: "/etc/ssl/certs/ca-certificates.crt"

remote-control:
control-enable: yes
# by default the control interface is is 127.0.0.1 and ::1 and port 8953
# it is possible to use a unix socket too
control-interface: /run/unbound.ctl

forward-zone:
name: "."
forward-tls-upstream: yes
forward-addr: 1.1.1.1@853#cloudflare-dns.com
forward-addr: 1.0.0.1@853#cloudflare-dns.com

(Unfortunately my ISP is shitty, and doesn’t yet give me an ipv6 address).

Looking at https://1.1.1.1/help – I do sometimes see that ‘DNS over TLS’ is “yes”…. so I guess something is right; annoyingly I don’t see anything useful from unbound’s stats (unbound-control stats) to show it’s done a secure query…

“unbound-host” (another debian package) – will helpfully tell you whether a lookup was done ‘securely’ or not – e.g.

$ unbound-host google.com -D -v
google.com has address 142.250.178.14 (insecure)
google.com has IPv6 address 2a00:1450:4009:815::200e (insecure)
google.com mail is handled by 10 smtp.google.com. (insecure)

which seems a little odd to me (I’d have thought google would support dns sec), but some domains do work – e.g.

$ unbound-host mythic-beasts.com -D -v
mythic-beasts.com has address 93.93.130.166 (secure)
mythic-beasts.com has IPv6 address 2a00:1098:0:82:1000:0:1:2 (secure)
mythic-beasts.com mail is handled by 10 mx1.mythic-beasts.com. (secure)
mythic-beasts.com mail is handled by 10 mx2.mythic-beasts.com. (secure)

systemd-resolve (DNS is always to blame)

For the record, this is using systemd v247, from Debian’s buster-backports.

I think I was enticed by the cool aid, hoping to be able to have DNSSEC or DNSoverTLS …. and caching … and to be fair, it appeared to work on all the servers I’d installed it on (although they were just ‘boring’ LAMP style webservers).

Anyway, everything seemed to be going well, with the default /etc/resolv.conf like :

nameserver 127.0.0.53

options edns0

and /etc/systemd/resolved.conf looking like :

[Resolve]
DNS=8.8.8.8#dns.google 8.8.4.4#dns.google 1.1.1.1
FallbackDNS=1.1.1.1 8.8.4.4 9.9.9.9
LLMNR=no
DNSOverTLS=opportunistic
DNSSEC=no
Cache=yes

Unfortunately, on one relatively busy server which makes multiple HTTP requests out every second, I saw sporadic failures where curl would report a timeout for e.g. graph.facebook.com (>10 connect time).

The timeouts seemed to be grouped together (no timeouts for a number of hours, and then a load of requests would fail) and obviously to be annoying this only happened in production and wasn’t something I could reproduce.

As best I can tell, a failure to lookup was being cached, so all requests for a specific hostname would then fail until the cache expired (30 seconds?)

So I end up having /etc/resolv.conf looking a bit more like a traditional one with 8.8.8.8 as the first nameserver and some custom options to lower the retry time and hopefully trigger multiple DNS lookup attempts.

So, perhaps …. perhaps … systemd-resolve isn’t quite ready for production yet?