30 April 2026

📌 Late Thursday in Gourock. Heating still off for the season, the desk lamp on the warm setting, Blut Aus Nord's 777 - Sect(s) in the headphones because the album's drone matches the mood of reading a kernel changelog at midnight. The kind of evening I was going to spend reading a novel and instead spent reading apt-cache policy linux-image-amd64.

The news cycle had been going since yesterday. copy.fail dropped at 09:00 BST on the 29th, hit Hacker News by lunch, and by the time I sat down at the desk the writeup was everywhere. CVE-2026-31431. A 732-byte Python script, root on every Linux distribution shipped since 2017. The kind of thing you read once, then read again, then check the date on the page because it can't be that this just exists.

It does. I tried it on the laptop. It worked. Damn.

the bug

Copy Fail is a logic flaw in algif_aead, the AEAD socket interface of the kernel's userspace crypto API. An in-place optimisation introduced in 2017 reused the destination scatterlist for source data. Most AEAD algorithms confine their writes to the legitimate output area, but authencesn - the IPsec ESN wrapper - uses four bytes past the AEAD tag as scratch space and never restores them. Chained through splice() over an AF_ALG socket, that scratch write becomes a controlled four-byte write into the page cache of any readable file on the system. Pick /usr/bin/su, modify the right four bytes, run su, get root.
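For a sense of the interface in question, here's a sketch of ordinary, legitimate AF_ALG AEAD use with gcm(aes) - the exploit drives this same socket machinery, just binding authencesn and feeding it through splice():

import socket

# Bind an AEAD transform, set a key and tag length, then run one
# encrypt request through the per-operation socket.
alg = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
alg.bind(('aead', 'gcm(aes)'))
alg.setsockopt(socket.SOL_ALG, socket.ALG_SET_KEY, b'\x00' * 16)
alg.setsockopt(socket.SOL_ALG, socket.ALG_SET_AEAD_AUTHSIZE, None, 16)

op, _ = alg.accept()
assoc, plaintext = b'A' * 8, b'hello algif_aead'
with op:
    op.sendmsg_afalg([assoc + plaintext],
                     op=socket.ALG_OP_ENCRYPT,
                     iv=b'\x00' * 12,
                     assoclen=len(assoc))
    # the kernel writes back: assoc || ciphertext || 16-byte tag
    out = op.recv(len(assoc) + len(plaintext) + 16)
print(out[len(assoc):].hex())

Note that merely running it autoloads algif_aead - more on that below.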

No race. No kernel offset. No version checks in the exploit. The same Python script gets root on Ubuntu 24.04, RHEL 10, SUSE 16, Amazon Linux 2023, and a Debian trixie box in a Linode rack somewhere. The bug has been there for nearly nine years.

What surfaced it was an AI-driven scanner from Theori called Xint Code, which found it after about an hour of scanning the kernel crypto/ subsystem. The Bugcrowd note on the disclosure puts the implication plainly: a universal Linux LPE with no race and no offsets is the kind of primitive that, on the gray market, used to sell in the seven-figure range. The fact that an AI surfaced one in an hour is, separately from the bug itself, the news.

the vps

The VPS that runs this blog is a small Debian trixie box. Single-tenant, only I have shell, but it's exposed to the internet through nginx, and any web RCE in the userspace it runs would chain through Copy Fail and become root on the host. The advisory classifies this exact scenario as Medium. I disagree, only because Medium is the kind of word that lets a system end up on next month's patch cycle. There's a public PoC, the writeup is at the top of every aggregator, and the kernel I'm running was built last November.

The first thing I checked was the obvious one:

# lsmod | grep algif
# 

Empty. For about ten seconds I thought the box might be naturally protected by not having loaded the module. It isn't. Linux loads kernel modules on demand: the moment a process opens an AF_ALG socket and binds an AEAD algorithm, the kernel autoloads algif_aead. The empty lsmod only meant nobody had triggered it yet. One socket() + bind() later:

# python3 -c "import socket; s=socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0); s.bind(('aead', 'authencesn(hmac(sha256),cbc(aes))')); print('AF_ALG reachable')"
AF_ALG reachable
# lsmod | grep algif
algif_aead             12288  0
crypto_null            16384  3 authencesn,authenc,algif_aead
af_alg                 36864  1 algif_aead

The exploit's first step would have done the same thing. Module not loaded is not the same as module not reachable. It's a distinction that matters when reading any inventory check that greps lsmod.
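The check that actually answers the reachability question is modprobe's dry run, which resolves the module without loading it - the path in the output (and any compression suffix) will vary with the kernel build:

# modprobe --dry-run --verbose algif_aead
insmod /lib/modules/6.12.57+deb13-amd64/kernel/crypto/algif_aead.ko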

The mitigation is small and zero-downtime. Blacklist the module via modprobe.d:

# echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif.conf
# rmmod algif_aead 2>/dev/null

The first line tells modprobe to run /bin/false whenever something tries to autoload algif_aead; the config file is what persists across reboots. The second unloads the module if it's currently loaded, which only matters for the running system. Both are safe to re-run.
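And a quick proof the install rule bites, without going near the crypto API - modprobe now fails because /bin/false does:

# modprobe algif_aead 2>/dev/null; echo "exit: $?"
exit: 1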

Verification by the same socket bind, after the blacklist:

# python3 -c "import socket; s=socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0); s.bind(('aead', 'authencesn(hmac(sha256),cbc(aes))'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory

That FileNotFoundError is what protection looks like at this layer. The kernel asked modprobe to load the module, modprobe ran /bin/false per the install handler, the call failed with exit 1, and the userspace bind() got ENOENT. The first step of the exploit doesn't complete; the other four are unreachable.

A list of what doesn't pass through algif_aead, and therefore is unaffected by this workaround: dm-crypt and LUKS, kTLS, IPsec/XFRM, OpenSSL/GnuTLS/NSS in their default builds, SSH, the kernel keyring crypto. The only userspace that uses algif_aead directly is software explicitly built against OpenSSL's afalg engine or some embedded crypto offload paths. Nothing on this VPS does that: the module's use count in lsmod sat at zero the whole time it was loaded, which also means no process was holding an AF_ALG AEAD socket open. The blacklist breaks nothing.
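The afalg half of that claim is cheap to audit: check whether any OpenSSL config mentions the engine, and whether the engine's shared object is even installed. A sketch with Debian's paths - the .so location varies by distribution:

# grep -ri afalg /etc/ssl/ 2>/dev/null
# ls /usr/lib/*/engines-*/afalg.so 2>/dev/null
/usr/lib/x86_64-linux-gnu/engines-3/afalg.so

The .so being installed is expected and harmless on its own; it only matters if a config or an application actually loads it, and the empty grep says nothing here does.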

the kernel that wasn't

Then I went looking for the actual fix. Mainline commit a664bf3d603d, committed on 1 April. The simple question was whether Debian Security had backported it.

# uname -r
6.12.57+deb13-amd64
# apt-cache policy linux-image-amd64
linux-image-amd64:
  Installed: 6.12.74-2
  Candidate: 6.12.74-2
  Version table:
 *** 6.12.74-2 500
        500 http://mirrors.linode.com/debian-security trixie-security/updates/main amd64 Packages
        100 /var/lib/dpkg/status
     6.12.73-1 500
        500 http://mirrors.linode.com/debian trixie/main amd64 Packages

A pattern that took me a moment to read. The linux-image-amd64 meta-package is at 6.12.74-2 from trixie-security. The uname -r says we're running 6.12.57. The newer kernel is installed in /boot but I haven't rebooted since unattended-upgrades pulled it down. The standard Debian dance: kernels accumulate in /boot, you reboot when you reboot.

Two kernels in /boot:

# ls -la /boot/vmlinuz-*
-rw-r--r-- 1 root root 12101568 Nov  5 14:56 /boot/vmlinuz-6.12.57+deb13-amd64
-rw-r--r-- 1 root root 12113856 Mar  8 19:54 /boot/vmlinuz-6.12.74+deb13+1-amd64
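That state is catchable with two lines of shell - compare uname -r against the newest image in /boot, assuming the standard vmlinuz-<version> naming:

# newest=$(ls /boot/vmlinuz-* | sed 's|.*vmlinuz-||' | sort -V | tail -n 1)
# [ "$(uname -r)" = "$newest" ] || echo "reboot pending: $(uname -r) -> $newest"
reboot pending: 6.12.57+deb13-amd64 -> 6.12.74+deb13+1-amd64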

The newer one is dated 8 March 2026. The Copy Fail fix was committed on 1 April 2026. Before even opening the changelog, the dates rule it out: a kernel built in March can't carry an April commit. But file dates aren't proof of content, and Debian Security does sometimes fold mainline patches in ahead of a stable cut, so I checked the changelog anyway:

# apt changelog linux-image-amd64 2>/dev/null | grep -iE "31431|copy.?fail|algif|authencesn"
    - crypto: authencesn - reject too-short AAD (assoclen<8) to match ESP/ESN
    - crypto: algif_hash - fix double free in hash_accept
    - device_cgroup: Roll back to original exceptions after copy failure
    - crypto: algif_aead - Do not set MAY_BACKLOG on the async path
    - crypto: algif_skcipher - EBUSY on aio should be an error
    - crypto: algif_aead - Only wake up when ctx->more is zero
    - crypto: algif_aead - fix uninitialized ctx->init
    - crypto: algif_skcipher - Use chunksize instead of blocksize

Eight matches, none of them the fix. The first one nearly looked like it for half a second - "reject too-short AAD" on authencesn is genuinely on the same code path - but Copy Fail uses an AAD of exactly 8 bytes, which is valid by ESP spec, so the rejection check would let the exploit through unchanged. The other seven are unrelated to the in-place optimisation that the upstream fix reverts. They're a pile of small concurrency and initialisation patches that Debian Security ships in batches.

So the kernel sitting in /boot since 8 March is not the patched kernel. Rebooting now would change uname -r from 6.12.57 to 6.12.74 and accomplish nothing for Copy Fail. Both kernels are vulnerable. The patched Debian kernel will arrive in a separate trixie-security push, probably within the week, and is the one I want.

closed but not patched

Which leaves the VPS in a state I'd describe as closed but not patched. The kernel is still vulnerable. The vector to reach the kernel is blocked. The blacklist is the load-bearing piece, not the kernel build. This is the kind of arrangement that shows up whenever the upstream fix lags the public PoC: the workaround does the actual protecting, and the patch arrives later to make the workaround redundant rather than essential.

The plan is to leave the box in this state. When the patched kernel ships, apt full-upgrade and reboot, and at that point the blacklist becomes belt-and-braces rather than the only thing standing between the box and the public PoC. I'll keep the file in modprobe.d regardless. There is no userspace on this host that uses algif_aead, so the blacklist costs nothing, and reducing the kernel's attack surface by one rarely-used module is a free win.

A small cron to notice when the right update lands, so I don't have to pull the changelog by hand every morning:

# cat > /usr/local/bin/check-copyfail-fix.sh <<'EOF'
#!/bin/bash
# Refresh the package lists, then grep the candidate kernel's
# changelog for any mention of the Copy Fail fix.
apt-get update -qq 2>/dev/null
# apt-get changelog fetches the changelog for the candidate version
# and writes to stdout when piped, so it scripts cleanly.
if apt-get changelog linux-image-amd64 2>/dev/null | \
   grep -qiE "31431|copy.?fail|a664bf3d603d"; then
    /usr/local/bin/notify "Debian kernel patched for CVE-2026-31431"
fi
EOF
# chmod +x /usr/local/bin/check-copyfail-fix.sh
# echo "0 9 * * * root /usr/local/bin/check-copyfail-fix.sh" \
    > /etc/cron.d/check-copyfail-fix

/usr/local/bin/notify is whatever I want it to be on a given day - sendmail, a curl to a self-hosted ntfy topic, a webhook into the homelab. The point is to not check this manually for the next two weeks.
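Today's incarnation is the ntfy flavour, for the record - host and topic below are placeholders, not a real endpoint:

# cat > /usr/local/bin/notify <<'EOF'
#!/bin/bash
# Push $1 as a notification to a self-hosted ntfy topic.
# Placeholder host and topic - point these at your own instance.
curl -fsS -d "$1" https://ntfy.example.net/kernel-watch >/dev/null
EOF
# chmod +x /usr/local/bin/notify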

and then...

The thing that stays with me, after the box is closed and the changelog has been read, is the timeline. Theori reported the bug to the Linux kernel security team on 23 March. Patches were proposed on the 25th. The fix was committed on 1 April. The CVE was assigned on the 22nd. Public disclosure on the 29th. From discovery to public PoC: thirty-seven days. From kernel commit to public PoC: twenty-eight days. The Linux distributions had four weeks to backport, build, sign, and push the kernel image. Most shipped on or near the disclosure date; Debian, on the box I'm sitting at, is still working on it. Coordinated disclosure works because the distros can keep up, and four weeks is - by historical standards - generous.

The interesting question is what happens when the discovery side speeds up. Xint Code found Copy Fail in about an hour of scanning the crypto/ subsystem, and the same scan surfaced other high-severity bugs that haven't been published yet. If the supply rate of universal LPEs goes up by an order of magnitude, the four-week window doesn't, and the assumption that holds the whole coordinated-disclosure choreography together starts looking different.

I don't have a useful prediction to make about what comes next. I have a workaround in modprobe.d, a cron job that checks for the patched kernel once a day, and a slightly higher level of background paranoia about the rest of the stack. The homelab is on the list for the weekend. Work is on the list for tomorrow. The pattern that shows up in these moments is the one I keep coming back to: the work that protects the box isn't the patch, it's the engineering that lets you not need the patch right this minute. A blacklist file. A cron job. A kernel module nobody really used anyway.

It's gone past midnight. The album finished half an hour ago and I haven't put another one on. Time to stop reading kernel commit logs and go to bed.