Categories
Computers

The curious case of binfmt for x86 emulation for ARM Docker

Seemingly identical configurations, different results. When two methods for setting up x86 emulation on ARM showed the exact same system configuration but behaved completely differently in Docker, I began questioning my system administration knowledge and my sanity – and briefly contemplated a new career as a blacksmith.

This is a debugging tale for those working with containers, a reminder that things aren’t always what they seem in Linux, and a pointed nudge to Read the Fine Manual, Always!

ARM with ArchiveTeam v2

Recently I got an email from a reader of my ARM images to help the Archive Team blog post from years ago, asking about refreshing that project for use again. Back then I was recompiling the ArchiveTeam’s Docker images to support ARM, so now I was looking at how things have changed in the intervening time. I’ve also become more pragmatic (read: lazy) since then, and wondered whether the ArchiveTeam had just made some ARM or multi-arch images themselves, as I believe(d) they should. That led me to their FAQ entry about ARM images:

Can I run the Warrior on ARM or some other unusual architecture?

Not directly. We currently do not allow ARM (used on Raspberry Pi and M1 Macs) or other non-x86 architectures. This is because we have previously discovered questionable practices in the Wget archive-creating components and are not confident that they run correctly under (among other things) different endiannesses. […]

Set up QEMU with your Docker install and add --platform linux/amd64 to your docker run command.

This actually seems like a sensible thing – if they dug deep enough to see issues in wget, I had definitely been doing things naively before.

The guidance to install QEMU seems sensible as well (we were doing a lot of that at balena), and it goes roughly like this:

  1. install binfmt
  2. install QEMU with statically compiled binaries
  3. load those binaries to emulate the platforms you want with the F / fix_binary flag

For those unfamiliar, binfmt_misc is a Linux kernel feature that allows non-native binary formats to be recognized and passed to user space applications. It’s what makes it possible to run ARM binaries on x86 systems and vice versa through emulation. The various flags (F, P, C, O) adjust the actual behaviour of binfmt.
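Registration happens by writing a single colon-separated line to /proc/sys/fs/binfmt_misc/register. As a rough sketch of how such a line is put together (using the qemu-x86_64 magic/mask values; actually writing it requires root, so this only builds the string):

```python
# Sketch: building the binfmt_misc registration line for qemu-x86_64.
# Registering it requires root (a write to the register file), so this
# only constructs and prints the string.
magic = bytes.fromhex("7f454c4602010100000000000000000002003e00")  # x86_64 ELF header bytes
mask  = bytes.fromhex("fffffffffffefe00fffffffffffffffffeffffff")  # zero bits are "don't care"

def hex_escape(data: bytes) -> str:
    # Render bytes as \xNN escapes, the format the register file expects.
    return "".join(f"\\x{b:02x}" for b in data)

# Field layout: :name:type:offset:magic:mask:interpreter:flags
line = ":qemu-x86_64:M:0:{}:{}:/usr/bin/qemu-x86_64:F".format(
    hex_escape(magic), hex_escape(mask))
print(line)
# With root, registration would then be:
#   echo '<line>' > /proc/sys/fs/binfmt_misc/register
```
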

Docker advises using an image to set things up; for example, for the x86_64/amd64 platform:

docker run --privileged --rm tonistiigi/binfmt --install amd64

My Raspberry Pi is running ArchLinuxARM, which installs systemd-binfmt to load the relevant emulation settings at boot time. That seemed handy: with the docker method I had to run that image every time before I could run an emulated container, while with systemd things would be ready by the time Docker is ready to run (i.e. keeping the ArchiveTeam containers always on and restarting after reboot). So I had a strong incentive to use the systemd-based approach instead of the docker run based one.

Now comes the kicker 🤯:

  • the docker installed binfmt setup worked and allowed running linux/amd64 containers
  • the systemd-binfmt initiated setup worked for x86_64 binaries in the file system, but not in Docker, where the binaries just failed to run
  • both setups had identical output when looking at the config in /proc/sys/fs/binfmt_misc

When Same’s Not the Same

To see whether emulation works, the tonistiigi/binfmt container can be invoked without any arguments, and it shows the status. For example, after setting things up with docker, it would show:

$ docker run --privileged --rm tonistiigi/binfmt
{
  "supported": [
    "linux/arm64",
    "linux/amd64",
    "linux/amd64/v2",
    "linux/arm/v7",
    "linux/arm/v6"
  ],
  "emulators": [
    "qemu-x86_64"
  ]
}

Here the supported section shows amd64 as it should, and their suggested test of running an amd64 image to check whether binaries actually run gives the expected output:

$ docker run --rm --platform linux/amd64 -t alpine uname -m
x86_64

Going back to the alternative: after uninstalling that emulator and starting up systemd-binfmt, I can check the status again:

$ docker run --privileged --rm tonistiigi/binfmt
{
  "supported": [
    "linux/arm64",
    "linux/arm/v7",
    "linux/arm/v6"
  ],
  "emulators": [
[...snip...]
    "qemu-x86_64",
[...snip...]
  ]
}

This shows that while the emulator is installed, Docker doesn’t report the linux/amd64 platform as supported, and this checks out when running the alpine image again as above:

$ docker run --rm --platform linux/amd64 -t alpine uname -m
exec /bin/uname: exec format error

Well, this doesn’t work.

The binfmt_misc docs in the Linux Kernel wiki do have plenty of info on the setup and use of that emulation function. For example, to check the configuration of the emulation setup, we can look at the contents of a file in the /proc filesystem:

$ cat /proc/sys/fs/binfmt_misc/qemu-x86_64
enabled
interpreter /usr/bin/qemu-x86_64
flags: POCF
offset 0
magic 7f454c4602010100000000000000000002003e00
mask fffffffffffefe00fffffffffffffffffeffffff
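What do the magic and mask lines actually do? When a binary is executed, the kernel compares its leading bytes against magic, ignoring any bit that is zero in mask. A small sketch of that matching rule (following the kernel’s binfmt_misc check, with the qemu-x86_64 values from above; the header bytes are the standard ELF identification fields):

```python
# Sketch of how the kernel matches a binary against a binfmt_misc entry:
# a file matches if, for every position, (file_byte XOR magic_byte) AND
# mask_byte == 0, i.e. the file must equal magic wherever mask bits are set.
MAGIC = bytes.fromhex("7f454c4602010100000000000000000002003e00")
MASK  = bytes.fromhex("fffffffffffefe00fffffffffffffffffeffffff")

def matches(header: bytes, magic: bytes = MAGIC, mask: bytes = MASK, offset: int = 0) -> bool:
    window = header[offset:offset + len(magic)]
    if len(window) < len(magic):
        return False
    return all((b ^ m) & k == 0 for b, m, k in zip(window, magic, mask))

# First 20 bytes of an x86_64 ELF binary: \x7fELF, 64-bit, little-endian,
# then e_type (2 = EXEC, 3 = DYN/PIE) and e_machine (0x3e = x86-64) at offset 16.
# Note the 0xfe mask on e_type: it is exactly what lets both EXEC and PIE match.
exec_hdr = bytes.fromhex("7f454c4602010100000000000000000002003e00")
pie_hdr  = bytes.fromhex("7f454c4602010100000000000000000003003e00")
arm_hdr  = bytes.fromhex("7f454c460201010000000000000000000200b700")  # e_machine = AArch64

print(matches(exec_hdr), matches(pie_hdr), matches(arm_hdr))  # prints: True True False
```
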

This was almost the same whether I used the docker based setup or systemd-binfmt, with one slight difference: the flags line reads only PF with systemd-binfmt, but POCF when setting things up with docker run. Even though the Docker docs only ask for the F flag, I wanted to make sure we were on equal footing, so I tried to modify the QEMU setup to match. This means overriding the qemu-x86_64.conf that is shipped by default:

  • Copy the config from /usr/lib/binfmt.d/qemu-x86_64.conf to /etc/binfmt.d/qemu-x86_64.conf (make sure the file has the same name to ensure this new file overrides the one from the lib folder)
  • Edit the end of the line from :FP to :FPOC
  • restart systemd-binfmt
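For reference, the resulting single line in /etc/binfmt.d/qemu-x86_64.conf would then look roughly like this (the same colon-separated register format described in the kernel docs, with the magic and mask matching the /proc output):

```
:qemu-x86_64:M:0:\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x3e\x00:\xff\xff\xff\xff\xff\xfe\xfe\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-x86_64:FPOC
```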

After this, the output of the runtime info in /proc/sys/fs/binfmt_misc/qemu-x86_64 was completely the same. So why the difference in behaviour?

More Debugging Ensued

I read through the source code of tonistiigi/binfmt on GitHub and saw that it doesn’t do anything fancy: it’s a quite clear implementation of the `binfmt_misc` usage docs, with the same magic values as the QEMU shipped on my system. Good that there were no surprises, but also no hints of any difference.

I tried to replicate that process of setting up QEMU by translating it into Python and running it: still the same.

I recompiled the binary on my system and ran it outside of Docker: it behaved the same way as the systemd-binfmt setup, x86_64 static binaries1 worked outside of Docker but not inside it.

A sort-of breakthrough came when I tried out the dbhi/qus Docker images, which promise “qemu-user-static (qus) and containers, non-invasive minimal working setups”, and can do a similar emulator & platform support setup with:

docker run --rm --privileged aptman/qus -s -- -p x86_64

It was a lot slower to run (I’ll come back to this later), but worked like a charm, just like Docker’s own recommendation. However, there was a difference in the outcome when I checked the runtime config info:

$ cat /proc/sys/fs/binfmt_misc/qemu-x86_64
enabled
interpreter /qus/bin/qemu-x86_64-static
flags: F
offset 0
magic 7f454c4602010100000000000000000002003e00
mask fffffffffffefe00fffffffffffffffffeffffff

It has just the apparently required F flag, but the interpreter points to /qus/bin/qemu-x86_64-static … which is not in the regular file system. Nevertheless alpine happily runs, and so do my local static binaries.

How does this actually work, then?

Everything’s Illuminated

With the above, and with a better understanding of what the docs say, we have everything in place to explain all the behaviours: the pointers were there throughout, we just didn’t have enough experience to put them together.

So, the F flag was required by the Docker docs; what does that actually do? The kernel documentation explains:

F – fix binary

The usual behaviour of binfmt_misc is to spawn the binary lazily when the misc format file is invoked. However, this doesn’t work very well in the face of mount namespaces and changeroots, so the F mode opens the binary as soon as the emulation is installed and uses the opened image to spawn the emulator, meaning it is always available once installed, regardless of how the environment changes.

Because of this, if F is set, the interpreter entry in the runtime settings is not the path that will be invoked later, but where the binary was when it was registered, i.e. it’s irrelevant for the actual runtime.

The tonistiigi/binfmt image ships its own statically compiled qemu-* binaries, while the aptman/qus container gets the right ones at runtime (hence the slowness), and in both cases the interpreter path is the binary’s location inside the container at the time the command is run. The binary is then kept open in memory, the container can go away, and the interpreter path no longer refers to anything that exists.
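This “opened at install time, path irrelevant afterwards” behaviour is plain Unix file-descriptor semantics, and can be demonstrated without binfmt at all: a file that is held open stays readable after it is deleted. A quick illustration (generic Python, nothing qemu-specific):

```python
import os
import tempfile

# Create a file, open it, then delete it: the open file descriptor keeps
# the contents accessible, just like the F flag keeps the emulator binary
# usable after its container (and thus the interpreter path) is gone.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here")
os.unlink(path)                      # the path no longer exists...
assert not os.path.exists(path)
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 32)               # ...but the data is still readable
print(data)                          # prints: b'still here'
os.close(fd)
```
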

Why does systemd-binfmt fail then? Well of course because it’s a dynamically linked binary:

$ file /usr/bin/qemu-x86_64
/usr/bin/qemu-x86_64: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=a4b8a93a4361be61dfa34a0eab40083325853839, for GNU/Linux 3.7.0, stripped

… and because it’s dynamically linked, even if the F flag keeps the binary itself in memory, its library dependencies aren’t, so when run in Docker (which uses mount namespaces) it doesn’t have everything it needs to run…

And of course, ArchLinux spells this out:

Note: At present, Arch does not offer a full-system mode and statically linked variant (neither officially nor via AUR), as this is usually not needed.

Yes, “as this is usually not needed”. :)

Updated Setup and Looking Forward

Short of lobbying ArchLinux to ship a static QEMU2, what options do I have?

  • set up a systemd service to run the tonistiigi/binfmt container on startup (which is possible)
  • get some static QEMU binaries and override the settings that systemd-binfmt uses
  • switch to another Linux distro that supports the Pi and the software I run, but also ships static QEMU builds

All three are suboptimal and potentially fragile, and the third is way too much work. Still, the second one was kinda fine:

cd $(mktemp -d)
docker create --name="tmp_$$" tonistiigi/binfmt
# docker export produces a plain (uncompressed) tar:
docker export tmp_$$ -o tonistiigi.tar
docker rm tmp_$$
tar -xf tonistiigi.tar --wildcards "*/qemu-x86_64"
# Copy the extracted binary into place:
sudo cp usr/bin/qemu-x86_64 /usr/bin/qemu-x86_64-static

Then, just like we overrode the upstream qemu-x86_64.conf before, we do it again:

  • Copy the config from /usr/lib/binfmt.d/qemu-x86_64.conf to /etc/binfmt.d/qemu-x86_64.conf (make sure the file has the same name to ensure this new file overrides the one from the lib folder)
  • Edit the end of the line from :/usr/bin/qemu-x86_64:FP to :/usr/bin/qemu-x86_64-static:FPOC (that is, updating the binary it points at, and the flags for good measure too)
  • As a bonus, you can update the :qemu-x86_64: in the front too, say to :qemu-x86_64-static:, to change the display name of the emulator without affecting any of the functionality; it will just rename the entry in /proc/sys/fs/binfmt_misc
  • restart systemd-binfmt

Then the check again:

$ cat /proc/sys/fs/binfmt_misc/qemu-x86_64-static
enabled
interpreter /usr/bin/qemu-x86_64-static
flags: POCF
offset 0
magic 7f454c4602010100000000000000000002003e00
mask fffffffffffefe00fffffffffffffffffeffffff

And the alpine-based checks work once more.

Lessons Learned

The details were all in plain sight, but I didn’t have enough experience to piece them together. The Docker-recommended image ships its own QEMU? What does that F flag actually do? Can you run binaries when you don’t have them anymore? Dynamic and static linking and the signs of their misbehaviour all provided hints… but they were coupled with confusion whenever expectations were broken (say, that the interpreter doesn’t have to refer to a file path that currently exists), until I started to question those expectations. Also, just being a heavy user of Docker doesn’t mean I’m knowledgeable about the relevant kernel functionality, and probably I should be more…

This whole process underlined my previous thoughts on Software Engineering when AI seems Everywhere, as I did try to debug things by rubber ducking with Claude: this time the hallucinations were through the roof (a metric tonne of non-existent systemd functionality, non-existent command line flags), and it definitely got me on a wild goose chase in a few cases. So even more care is needed; maybe a version of Hofstadter’s Law:

Imreh’s Law3: LLMs are always more wrong than you expect, even when you take into account Imreh’s Law.

In the end, Don’t Panic, make theories and try to prove them, and talk with anyone who listens, even when they are wrong, and you are more likely to get there4.

  1. I downloaded static binaries from andrew-d/static-binaries; I recommend strings as something quick and simple to test with, e.g. ./strings /bin/sh | head, allowing fast iteration. ↩︎
  2. ArchLinux is x86 by default; for them it would be about emulating linux/arm64, linux/arm/v7, linux/arm/v6 images. For ArchLinux ARM it would be different work in the other direction. If only mainline Arch supported ARM, it would be a happier world (even if even more complex). ↩︎
  3. Tongue-in-cheek, of course. ↩︎
  4. And with this we just rediscovered the Feynman Algorithm, I guess. ↩︎
Categories
Admin

There’s a war out there

Since I set up my little Virtual Private Server about two months ago, I keep reading and learning more about its administration. In particular I’m trying to make it more secure, since nobody likes data loss or their things used behind their back. I know that the Internet is a tough place. Most computer users are nicely isolated behind their routers and internal networks; nevertheless, I had a freshly installed WinXP infected in less than 5 minutes of being connected to the Net. (Well, since then I don’t install anything Microsoft, and the first thing I take care of is security, so things are much better.)

Thwarting brute force attacks

One of the first things is securing remote login access to the machine: disabling root login for SSH is always a good idea. But since I’m interested in cleverer methods, I wanted to do something more potent and general. I found this blog post about how to limit brute-force attacks with iptables, so I set out to implement it. The basic idea is that if another computer tries to connect too many times in short succession, it is likely an attack. Use the firewall to count how many connections are made to the sensitive ports in a specific time interval, and if a threshold is passed, ban that host from connecting for a while. I liked it and had to implement it.
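The linked post has the full rules; the core of the idea can be sketched with iptables’ recent match, roughly like this (port 22 and the thresholds here are illustrative, not necessarily the post’s exact values):

```
# Sketch only: record every new SSH connection per source address, and
# send a source to the ATTACKED chain (which, in this setup, logs and
# then bans the host) once it makes a 5th new connection within 60 seconds.
-A INPUT -p tcp --dport 22 -m state --state NEW -m recent --set --name SSH
-A INPUT -p tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 5 --name SSH -j ATTACKED
```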

The information on the linked page is quite detailed and very useful. Just save the current iptables rules, edit them, and then restore.

# iptables-save > myrules
.... edit them rules ....
# iptables-restore < myrules

For remote servers one thing to be extra careful about is not to block the SSH connections completely: keep the current connection open, try to make a new connection and if you can log in, then things should be fine.

The only thing I changed compared to the other site is the log level, so I can separate the messages better. In the following line there was originally --log-level 7 (debug); I’m using --log-level 4 (warning):
-A ATTACKED -m limit --limit 5/min -j LOG --log-prefix "IPTABLES (Rule ATTACKED): " --log-level 4

Then update the line in /etc/syslog.conf to:
kern.warning   /var/log/warnings

Of course this might vary somewhat from Linux distro to distro: the above is for my CentOS install with syslog.

From the logs

Well, I’m not sure whether my host was particularly busy or not – I assume it wasn’t, since I don’t rank high in Google, so fewer attackers would find my little “home”. Still, in the last month there’s a nice little collection of IP addresses which triggered that ATTACKED rule of the firewall.

Using Python I extracted the IP addresses from the logs, ran them through the GeoIP Python API to get their locations, and fed that into the Google Static Maps API to get this picture:

Location of hosts that triggered my ATTACKED iptables rules
Location of hosts that triggered my ATTACKED iptables rules. Red: once, blue: 2-9 times, yellow: 10+ times

Altogether, in about 1 month, I logged 110 ATTACKED triggers from 47 different hosts. Most of them tried only once; there was one that did 48 times. According to the GeoIP database, it is from Varna, Bulgaria. Well, if there is one good thing that came out of this, it’s that Varna actually looks quite good and I’d be interested in visiting it. :) Talk about strange reactions to things…
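The extraction and bucketing steps can be sketched roughly like this (the log lines below are made-up samples in the format the LOG rule above produces; the real input would be /var/log/warnings):

```python
import re
from collections import Counter

# Count how many times each source IP triggered the ATTACKED rule,
# then bucket the counts the way the map legend does.
LOG_LINES = """\
Jun  1 10:00:01 vps kernel: IPTABLES (Rule ATTACKED): IN=eth0 OUT= SRC=203.0.113.7 DST=192.0.2.1 PROTO=TCP DPT=22
Jun  1 10:00:09 vps kernel: IPTABLES (Rule ATTACKED): IN=eth0 OUT= SRC=203.0.113.7 DST=192.0.2.1 PROTO=TCP DPT=22
Jun  2 03:12:44 vps kernel: IPTABLES (Rule ATTACKED): IN=eth0 OUT= SRC=198.51.100.23 DST=192.0.2.1 PROTO=TCP DPT=22
""".splitlines()

pattern = re.compile(r"IPTABLES \(Rule ATTACKED\): .*?SRC=(\d+\.\d+\.\d+\.\d+)")
counts = Counter(m.group(1) for line in LOG_LINES if (m := pattern.search(line)))

def colour(n: int) -> str:
    # Legend from the map: red = once, blue = 2-9 times, yellow = 10+
    return "red" if n == 1 else "blue" if n < 10 else "yellow"

for ip, n in counts.most_common():
    print(ip, n, colour(n))
```

The resulting (ip, count, colour) tuples would then go to the GeoIP lookup and the Static Maps request.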

It seems Europe and China are up to no good. I’m not sure whether there are fewer American baddies or they just mostly target Americans. I might investigate the regional differences some time later. Though this is just for curiosity and fun; if I were serious, I could set up a proper honeypot.

Some technical notes on making this picture:

  • The GeoIP Python API looks like one of the worst-documented pieces of code I’ve ever seen. I found a tutorial that helped me get the results I wanted: cities and locations, not just countries.
  • Static maps are quick, dirty and limited. I will try to figure out how to use the Google Maps API for a proper zoomable, scrollable, annotated map. I could imagine making a heat map of threats, or better colour-coding of the number of attempts from each IP/city.

Anyways, at least there’s no sign of unauthorized entry so far, since most of these attacks are not sophisticated at all. I wonder if I’d recognize if I ever was targeted by a sophisticated attack, but that’s not something to fret over. Just keep the automated backups going and it will be all fine. :D

Update:

The Python script I used to get that map can be found over here.

Categories
Computers Life

New Laptop or You Had Me at “No OS”

I’ve been wanting to upgrade my laptop for quite a while. It was a good ol’ Acer Travelmate 4501wlmi from 2004. I’m not sure why I kept it for such a long time; maybe I liked torturing myself. In the end the screen was barely hanging on its hinges, and the video card memory was corrupt so the screen went all funky sometimes, but what finally did it was the flaky/failing wireless.

Lenovo X201i
Artificially arranged desktop:)

I had checked out beforehand what the acceptable alternatives for a new laptop were. Then last weekend I went and got myself a new Lenovo X201i. When I first went to the store, I wasn’t sure whether I’d get it, or which model to go for. I tried to get some information from the clerk about the available options, but the communication gap I usually have here in Taiwan, due to my limited Chinese, wasn’t to my advantage. In the end all I did was pretty much confirm what I already knew: the Lenovo X-series is their smallest ultraportable, they can be quite powerful, and they’re pretty popular. When he asked me what kind of operating system I wanted and I told him “none”, I got a good confirmation that I had come to the right place. At all the other stores the reactions ranged from apologetic raised eyebrows to statements that “selling laptops without Windows is illegal” (true story). Here, on the other hand, he just got out his “No OS deals” sheet, and I picked the most powerful of them: it had everything I needed and was altogether about 20% cheaper than the other model I had been considering. He said there were only 3 left, so I galloped off to the nearest ATM, and there I had it. Good times.

A few days later I went back to get a few small details sorted out: exchanged for a larger battery (6 to 9 cell), upgraded the memory (2 to 8GB) and switched the keyboard cover to the right one. This time the limited Chinese worked to my advantage. I was talking to a different person, who knew even less English than my previous clerk, so whenever he contradicted the deals I had been promised, I just had to question it and they gave me the deal, instead of going into any conversation about why I couldn’t have it. It’s all fine, I wasn’t abusing this “power”, but I wasn’t going to be taken advantage of that easily either. All in all, it was quite a good deal, even if it would have been cheaper to order it directly from America over the internet.

Experience so far (~5 days):

  • This machine does not compete for any beauty prize, so never mind that the 9-cell battery does not improve on that front. It is still okay for me. The matte finish on the cover picks up every touch, so it’s going to look pretty “used” soon. The keyboard cover is a good idea, knowing myself, but does not improve things either.
  • It is not really fair to compare it to a computer 6 years its senior, but it’s such a breath of fresh air how snappy it is. Not the most powerful computer I’m using (hard to beat the office’s quad core) but certainly a small powerhouse on the go.
  • The size is just right. I had an EeePC before, and I thought I could get really used to it, but in the end the limitations were just too much. I still have to find a good, small, laptop-enabled backpack, but with its 12″ screen it shouldn’t be a big deal.
  • With the 9-cell I got about 6-7 hours of light use out of it. This is before I did any real power optimization. Linux does have a lot of tricks, and even things like sound card power saving can go a long way. Still have to investigate.
  • Installed my usual Arch Linux, now with a fully encrypted filesystem (not that I’m planning to let it be stolen). It will take a while to get my old settings back, but at least I can organize them better.
  • That ESC key is in some weird place in the corner; I keep pressing F1 instead. Even though it’s the No OS version (and it was spared the “Windows7” sticker), I still have the Windows button. Will try to find some appropriate role for it.
  • Haven’t had a chance to try the WiMAX or the built-in camera. The first will probably stay that way; the second I should get going with Skype.
  • Keyboard lighting is ace for nighttime stuff, just like now.
  • The pointing stick does not really like the keyboard cover. It’s no big deal, I’m more of a touchpad fan. The touchpad has 5 different buttons, but none of them emulates a mouse wheel as far as I can tell. I want to find out what can emulate it; it should be very useful. The pad itself acts up sometimes, but nothing too annoying.
  • The 320GB hard drive is not bad at all, but I’ll look out for a good SSD: it should save on power and improve speed.
  • The screen is a bit picky about the angles it wants to be looked at from. I know the tablet version (X201t) is much better; this one I’ll just have to live with.
  • Built-in fingerprint reader – got to get the drivers working, but it would be awesome to use it for the constant sudo goodness that a well-secured system requires.

Now I have no excuse not to be productive anywhere and everywhere.