Computers Machine Learning Thinking

Software Engineering when AI seems Everywhere

It’s pretty much impossible to miss the big push to use AI/LLM (Large Language Model) coding assistants for software engineers. Individual engineers, small and large companies seem to be going “all in” on this. I’m generally wary of things that are this popular, as those often turn out more cargo cult than genuinely positive. So what’s a prudent thing to do as a software engineer? I believe the way ahead is a boring piece of advice that applies almost everywhere: instead of going easy, do more of the difficult stuff.

I genuinely think that putting the AI/LLM genie back into the bottle is unlikely (the same way as some people want the Internet, or smartphones, or cryptocurrencies put back into the bottle, which is also not really gonna happen). That doesn’t mean that uncritical acceptance of the coding assistant tools should be the norm. Au contraire: just like with any tool, one needs to discover when they are fit for the job, and when they are not. I have used GitHub Copilot for a while, am now digging into Cursor as it starts to conquer the workplace, and use ChatGPT & Claude for individual coding questions. I don’t think it’s controversial to say that all these tools have their “strengths and weaknesses”, and that currently the more complex and more “production” the problem is, the further away it is from a proof-of-concept, the less likely these tools are to be of any help. They do help, and they can be a large force multiplier, but the multiplier is biggest when one goes in with the least amount of input (knowledge, available time, requirements for the result…)

Computers Machine Learning Programming

Refreshing Airplane Tracking Software With and Without AI

A bit like last time this post is about a bit of programmer hubris, a bit of AI, a bit of failure… Though I also took away more lessons this time about software engineering, with or without fancy tools. This is about rabbit-holing myself into an old software project that I had very little knowhow to go on…

The story starts with me rediscovering a DVB-T receiver USB stick that I’ve had for probably close to a decade. It’s been “barnacled” by time spent in the Taiwanese climate, so I wasn’t sure if it still worked, but it’s such a versatile tool that it was worth trying to revive it.

When these receivers function, they can receive digital TV (that’s the DVB-T), but also FM radio and DAB, and they can act as a software defined radio (SDR). This last thing makes them able to receive all kinds of transmissions that are immediately quite high on the fun level, in particular airplane (ADS-B) and ship (AIS) tracking. Naturally, there are websites to do both if you just want to see it (for example Flightradar24 and MarineTraffic, respectively, are popular aggregators for that data, but there are tons), but doing your own data collection opens doors to all kinds of other use cases.
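To give a flavour of what such a receiver hands over, here is a minimal sketch, assuming a 112-bit ADS-B (Mode S extended squitter) frame that some SDR tool has already demodulated into a hex string. The function name and the sample frame are illustrative and not taken from any particular project; the field layout (5-bit downlink format, 3-bit capability, 24-bit ICAO aircraft address, then a 5-bit type code) is the standard one.

```python
def parse_adsb_header(hex_msg: str) -> dict:
    """Split the leading fields out of a 112-bit ADS-B frame given as hex."""
    raw = bytes.fromhex(hex_msg)
    if len(raw) != 14:
        raise ValueError("expected a 112-bit (14-byte) frame")
    return {
        # Top 5 bits of the first byte: downlink format (17 for ADS-B).
        "downlink_format": raw[0] >> 3,
        # Low 3 bits of the first byte: transponder capability.
        "capability": raw[0] & 0b111,
        # Next 3 bytes: the 24-bit ICAO aircraft address.
        "icao": raw[1:4].hex().upper(),
        # Top 5 bits of the message payload: type code (what kind of report).
        "type_code": raw[4] >> 3,
    }


# Illustrative sample frame (an assumption, not captured for this post).
header = parse_adsb_header("8D4840D6202CC371C32CE0576098")
print(header)
```

Everything after the type code (position, velocity, callsign) needs per-type decoding and a CRC check, which is exactly the part the existing tools take care of.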

So on I go, trying to find what software tools people use these days with these receivers. Mine is a pretty simple one (find out everything about it by following the “RTL-SDR” keywords wherever you like to do that :) and so I remembered there were many tools. However, time has passed, I forgot most of what I knew, and there have been new projects coming and going.


While I was searching, I found the adsbox project, which was interesting on two counts: it kinda worked straight out of the box for me, while it was also last updated some 9 years ago, so it’s an old code base that tickles my “let’s maintain all the things!” drive…

The GitHub repo information of ADSBox, last commits overall have been 9 years ago, and there are very few of them.

The tool is written mostly in C, while it also hosts its own server for a web interface, for listing flights, and (back in the day) supporting things like Google Maps and Google Earth.

The ADSBox interface showing a bunch of airplane information.
The adsbox plane listing interface.

Both the Google Maps and Earth parts seem completely broken: Maps has changed a lot since, as I also had to update my Taiwan WWII Map Overlays project over time (the requirement of using API keys to even load the map, changes to the JavaScript API…). Earth I haven’t tried, but I’m thinking that went the way of the dodo on the desktop?

Computers Machine Learning Programming

Adventures into Code Age with an LLM

It’s a relaxed Saturday afternoon, and I just remembered some nerdy plots I’ve seen online for various projects, depicting “code age” over time: how does your repository change over the months and years, how much code still survives from the beginning till now, etc… Something like this made by the author of curl:

Curl’s code age distribution

It looks interesting and informative. And even though I don’t have codebases that have been around this long, there are plenty of codebases around me that are fast moving, so something like a month (or in some cases week) level cohorts could be interesting.

One way to take this challenge on is to actually sit down and write the code. Another is to take a Large Language Model, say Claude and try to get that to make it. Of course the challenge is different in nature. For this case, let’s put myself in the shoes of someone who says

I am more interested in the results than the process, and want to get to the results quicker.

See how far we can get with this attitude, and where it breaks down (probably no spoiler: it breaks down very quickly).

Note on the selection of the model: I’ve chosen Claude just because generally I have good experience with it these days, and it can share generated artefacts (like the relevant Python code) which is nice. And it’s a short afternoon. :) Otherwise anything else could work as well, though surely with varying results.

Version 1

Let’s kick it off with a quick prompt.

Prompt: How would you generate a chart from a git repository to show the age of the code? That is when the code was written and how much of it survives over time?

Claude quickly picked it up and made me a Python script, which is nice (that being my day-to-day programming language). I guess that’s generally a good assumption these days if one does data analytics anyways (asking for another language is left for another experiment).
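For orientation, the core of such a script can be quite small. What follows is my own minimal sketch, not the code Claude generated: it assumes the text output of `git blame --line-porcelain <file>` as input and counts the surviving lines per month cohort. The function name `cohort_counts` and the sample input are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def cohort_counts(porcelain: str) -> Counter:
    """Count surviving lines per YYYY-MM cohort in line-porcelain blame output."""
    counts = Counter()
    for line in porcelain.splitlines():
        # With --line-porcelain every blamed line repeats its commit headers,
        # including "author-time <unix seconds>". File content lines start
        # with a tab, so they cannot be mistaken for a header.
        if line.startswith("author-time "):
            timestamp = int(line.split()[1])
            month = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m")
            counts[month] += 1
    return counts


# Hand-written stand-in for real blame output (three blamed lines).
sample = (
    "aaaa 1 1 1\nauthor A\nauthor-time 1700000000\n\tfirst line\n"
    "aaaa 2 2\nauthor A\nauthor-time 1700000000\n\tsecond line\n"
    "bbbb 3 3 1\nauthor B\nauthor-time 1600000000\n\tthird line\n"
)
print(cohort_counts(sample))
```

Running this over every tracked file at a series of commits, and stacking the counts per snapshot, would give the kind of “survival” chart above; that outer loop (and the plotting) is where most of the real work hides.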


Git login and commit signing with security

Doing software engineering (well-ish) is pretty hard to imagine without working in version control, which most of the time means git. In a practical setup of git there’s the question of how I get access to the code it stores (how do I “check things out”?) and optionally how others can verify that it was indeed me who did the changes (how do I “sign” my commits?). Recently I’ve changed my mind about what’s a good combination for these two aspects, and what tools I am using to do them.

Access Options

In broad terms git repositories can be checked out either through the HTTP protocol, or through the SSH protocol. Both have pros and cons.

Having two-factor authentication (2FA) made the HTTP access more secure but also more setup (no more direct username/password usage, rather needing to create extra access keys used in place of passwords). Credentials were still in plain text (as far as I know) on the machine in some git config files.

The SSH setup was in some sense the more practical one (creating keys on your own machine, and just passing in the public key portion), though there were still secrets in plain text on my machine (as I don’t think the majority of people used password-protected SSH keys, due to their user experience). This is what I’ve used for years: add a new SSH key for a new machine that I’m working on, check code out through ssh+git, and work away.

Recently I came across the git-credential-manager tool that is supposed to make HTTP access nicer (for various git servers and services) and get rid of plain text secrets. Of course this is not the first or only tool that handles git credentials, but being made by GitHub, it had some more clout. This made me re-evaluate what options I have for SSH as well for similar security improvements.

Thus I’ve found that both 1Password and KeePassXC (the two main password managers I use) have ssh-agent integration, and thus can store SSH keys + give access to them as needed. No more plain text (or password protected) private keys on disk with these either!

Now it seems there are two good, new options to evaluate, and for the full picture I looked at how the code signing options work in this context as well.
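As a rough sketch of the signing side, assuming git 2.34 or newer (which can sign commits with SSH keys) and a public key that an ssh-agent can serve: the helper below only builds the `git config` commands and does not run them. The function name and the placeholder key are made up for illustration; `gpg.format`, `user.signingkey` and `commit.gpgsign` are the actual git settings involved.

```python
def ssh_signing_commands(public_key: str) -> list[list[str]]:
    """Build the git config commands that switch commit signing to an SSH key."""
    return [
        # Use SSH keys instead of the default OpenPGP format for signing.
        ["git", "config", "--global", "gpg.format", "ssh"],
        # Tell git which key to sign with; the private half stays in the agent.
        ["git", "config", "--global", "user.signingkey", public_key],
        # Sign every commit by default.
        ["git", "config", "--global", "commit.gpgsign", "true"],
    ]


# Placeholder key, not a real one.
for command in ssh_signing_commands("ssh-ed25519 AAAA...placeholder"):
    print(" ".join(command))
```

Each command could be passed to `subprocess.run` as is; printing them first makes it easier to review what would change in the global git config.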

Admin Computers

ZFS on a Raspberry Pi

I have a little home server, just like many other geeks / nerds / programmers / technical people… It can be useful, a learning experience, as well as a real chore; most of the time the balance is shifting between these ends. Today I’m taking notes here on one aspect of that home server that swings widely between those ends.

When I say I have a home server, that might be too generous a description of the status quo: I have a pretty banged up Raspberry Pi 3B. It’s running ArchLinux ARM, the 64-bit, AArch64 version, looking a bit more retro on the hardware front while pushing for more modernity on the software side – a mix that I find fun.

There are a handful of services running on the device, though not that many, mostly limited by its *gulp* 1GB memory; plenty of things I’d love to run don’t co-locate well in such a tiny compartment. Besides the memory, it’s also limited by storage: the Raspberry Pi runs off an SD card, and those are both fragile and limited in size. If one wants to run a home file server, say using a handful of other SD cards lying around to expand the available storage, that will get awkward very soon. To make that task less awkward (or replace one kind of awkward with a more interesting one), I’ve set out to set up a ZFS storage pool, using OpenZFS.

The idea

Why ZFS? In big part, to be able to credibly answer that question.

But with a single, more concrete reason: being able to build a more solid and expandable storage unit. ZFS can combine different storage units

  • in a way that combats data errors, e.g. mirroring: this addresses the SD cards’ fragility
  • in a way that data can expand across all of them in a single file system: this addresses the SD cards’ size limitations

This sounds great in theory, and after a bit of trial and error I’ve made the following setup, relying on dynamic kernel modules for flexible file system support, and on a hodgepodge of drives at hand for the storage.

The file system support is provided by the zfs-dkms package as a dynamic kernel module (DKMS), which means the kernel module required for managing that file system is recompiled for each new Linux kernel version as it is updated. This is handy in theory, as I can use the main kernel packages provided by the ArchLinux ARM team.

For storage, I’ve started off with two SD cards in mirror mode (going for data integrity first). Later I’ve found (and invested in) some large capacity USB sticks that bumped the storage size quite a bit. With these, the current ZFS pool looks like this:

Terminal screenshot of the 'zpool status' command.

It already saved me — or rather my data — once where an SD card was acting up, though that’s par for the course. One very large benefit is that the main system card is being used less, so hopefully will last longer.

The complications

Of course, it’s never this easy… With non-mainline kernel modules and with DKMS, every update is a bit of a gamble that can suddenly not pay off. That’s exactly what happened last year, when suddenly the module didn’t compile anymore on a new kernel version, and thus all that storage was sitting dumb and inaccessible. After digging into the issue, it came down to:

  1. the OpenZFS project being under Common Development and Distribution License (CDDL)
  2. the Linux kernel deliberately breaking non-GPL licensed code by starting to withhold certain floating point capabilities, because “this is not expected to be disruptive to existing users”.

This wasn’t great, as it left me as a user pretty much between a rock & a hard place, even if this is a hobby and not strictly speaking a production use case on my side.

Nonetheless, it worked by downgrading to a working version and skipping updates to the kernel packages.

Then, based on a suggestion, I patched the zfs-dkms package (rewriting the license entry in the META file) to make it look like a GPL-licensed module, which is fair game for one doing it on their own machine. This is hacky, or let’s call it pragmatic.

--- META.prev   2024-02-28 08:42:21.526641154 +0800
+++ META        2024-02-28 08:42:36.435569959 +0800
@@ -4,7 +4,7 @@
 Version:       2.2.3
 Release:       1
 Release-Tags:  relext
-License:       CDDL
+License:       GPL
 Author:        OpenZFS
 Linux-Maximum: 6.7
 Linux-Minimum: 3.10
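If one wanted to script that one-line change instead of editing by hand, a minimal sketch could look like the following. This is not part of the zfs-dkms packaging: the function name is made up, and it assumes the META file has been read into a string that contains a single `License:` line as in the diff above.

```python
import re


def relicense_meta(meta_text: str, new_license: str = "GPL") -> str:
    """Replace the value of the first 'License:' line, keeping its alignment."""
    return re.sub(
        r"^(License:[ \t]*)\S.*$",
        lambda match: match.group(1) + new_license,
        meta_text,
        count=1,
        flags=re.MULTILINE,
    )


# Shortened stand-in for the real META file.
meta = "Version:       2.2.3\nLicense:       CDDL\nAuthor:        OpenZFS\n"
print(relicense_meta(meta))
```

Only the text of the field changes; the module’s actual license of course stays what it is, which is why this belongs strictly on one’s own machine.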

Now, with the current 2.2.3 version, it seems like there’s an official fix-slash-workaround for being able to get the module to compile, even if it’s not a full fix. From the linked merge request message I’m not fully convinced that this is not a fragile status quo, but it’s at least front of mind – good going for wider ARM hardware usage that brings out people’s willingness to fix things!

Future development

Some while back, while working at an IoT software deployment & management company, I had a lot of interesting hardware at hand, naturally, to build things with (or wrestle with…). Nowadays I have things I best describe as spare parts, and thus loads of things are more fragile than they need to be, as well as gosh-it-takes-a-long-time to compile things on a Raspberry Pi 3 – making every kernel update some half-an-hour longer!

Likely the best move would be to upgrade to a (much more powerful) Raspberry Pi 5 and use an external NVMe drive, where I’d have much less need for ZFS, at least for the original reasons. It would likely be still useful for other aspects (such as snapshotting, or sending/receiving the drive data, compression, deduplication, etc…), changing the learning path away from multi-device support to the file system features.

If I wanted to use more storage in the existing system, I could also get rid of the mirrored SD cards and just use 4 large USB sticks (maybe in a RAIDZ setup), a poor-man’s NAS, I guess. Though there I’d worry a bit about needing sticks of the same size for this to work (unlike pooling, which has no same-size requirements), given the differences in the supposedly same sized products from different companies (likely locking me into having the same brand and model across the board).
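The same-size worry is easy to put into numbers. Here is a back-of-the-envelope sketch, assuming that a mirror and a single-parity RAIDZ both treat every device as if it were as small as the smallest one, and ignoring metadata and padding overhead. The function names and the stick sizes are made up for illustration.

```python
def mirror_capacity(sizes_gb: list[int]) -> int:
    """A mirror holds one copy per device, so it is as big as its smallest member."""
    return min(sizes_gb)


def pooled_capacity(sizes_gb: list[int]) -> int:
    """Plain pooling (striping) across devices simply adds the sizes up."""
    return sum(sizes_gb)


def raidz1_capacity(sizes_gb: list[int]) -> int:
    """Single-parity RAIDZ: one device's worth goes to parity, all cut to the smallest."""
    return (len(sizes_gb) - 1) * min(sizes_gb)


# Four hypothetical "128 GB" sticks whose real sizes differ by brand.
sticks = [128, 128, 120, 118]
print("pooled:", pooled_capacity(sticks), "GB")
print("raidz1:", raidz1_capacity(sticks), "GB")
```

With these made-up sizes, the gap between the two results is the price of parity plus the space lost on the larger sticks, which is what pushes towards buying identical models.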

I also feel like I’m not using ZFS to its full potential. If I know just enough to be dangerous… maybe that’s the generalist’s natural habitat?