Categories
Computers Machine Learning Programming

Adventures into Code Age with an LLM

It’s a relaxed Saturday afternoon, and I just remembered some nerdy plots I’ve seen online for various projects, depicting “code age” over time: how your repository changes over the months and years, how much code survives from the beginning until now, and so on. Something like this one made by the author of curl:

Curl’s code age distribution

It looks interesting and informative. And even though I don’t have codebases that have been around this long, there are plenty of fast-moving codebases around me, so month-level (or in some cases week-level) cohorts could be interesting.

One way to take this challenge on is to actually sit down and write the code. Another is to take a Large Language Model, say Claude, and try to get it to make it. Of course the challenge is different in nature then. For this experiment, let me put myself in the shoes of someone who says

I am more interested in the results than the process, and want to get to the results quicker.

Let’s see how far we can get with this attitude, and where it breaks down (probably no spoiler: it breaks down very quickly).

A note on the selection of the model: I’ve chosen Claude just because I’ve generally had good experience with it these days, and it can share generated artefacts (like the relevant Python code), which is nice. And it’s a short afternoon. :) Otherwise anything else could work as well, though surely with varying results.

Version 1

Let’s kick it off with a quick prompt.

Prompt: How would you generate a chart from a git repository to show the age of the code? That is when the code was written and how much of it survives over time?

Claude quickly picked it up and made me a Python script, which is nice (that being my day-to-day programming language). I guess that’s generally a good assumption these days if one does data analytics anyway (asking for another language is left for another experiment).
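The gist of the approach it took: walk every file git tracks, ask git blame when each surviving line was last touched, and bucket the line counts by timestamp. The helper at the heart of it looked roughly like this (my reconstruction of the idea, not Claude’s verbatim output):

import subprocess

def get_file_blame_data(file_path):
    """Return the committer timestamp (epoch seconds) of every current line."""
    # --line-porcelain repeats the full commit header for each line,
    # so the committer-time entries are trivial to pick out
    output = subprocess.check_output(
        ["git", "blame", "--line-porcelain", "--", file_path],
        text=True,
    )
    return [
        int(line.split()[1])
        for line in output.splitlines()
        if line.startswith("committer-time ")
    ]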

The full result is this code. I skimmed it to check that it doesn’t just delete my whole repo or do something completely batshit, but otherwise saved it in a repo I have at hand. To make it easier on myself, I added some inline metadata with the dependencies:

# /// script
# dependencies = [
#   "pandas",
#   "matplotlib",
# ]
# ///

and from there I can just run the script with uv.
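For the record, that’s a single command (assuming the script was saved as code_age.py, a name of my choosing); uv reads the inline metadata and installs pandas and matplotlib into an ephemeral environment:

uv run code_age.py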

First it checked too few files (my repository is a mixture of Python and SQL scripts managed by dbt), so I had to go in and expand those filters.
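The expansion itself was a trivial edit, something along these lines (the names here are illustrative, not Claude’s actual code):

# hypothetical names: widen the extension filter to include dbt-managed SQL
SOURCE_EXTENSIONS = {".py", ".sql"}
tracked_files = [
    f for f in all_tracked_files
    if any(f.endswith(ext) for ext in SOURCE_EXTENSIONS)
]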

Then the thought struck me to remove the filter altogether (since the script already checks only files tracked by git, it should be fine) – but then it broke on a step where it reads each file as text to count the lines. There is a better way of filtering (a “do not read binary files” check, sketched after the snippet below), but I just went with catching the problems:

# ....
    for file_path in tracked_files:
        try:
            timestamps = get_file_blame_data(file_path)
            for timestamp in timestamps:
                blame_data[timestamp] += 1
                total_lines += 1
        except UnicodeDecodeError:
            print(f"Error reading file: {file_path}")
            continue
#....
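The “do not read binary files” check I alluded to does exist, by the way: the usual heuristic, the same one git itself uses, is to look for NUL bytes near the start of the file. A sketch, not something the generated script did:

def is_probably_binary(file_path, sniff_bytes=8192):
    # git's own heuristic: a NUL byte early in the file means binary
    with open(file_path, "rb") as f:
        return b"\x00" in f.read(sniff_bytes)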

Catching the errors is also how I learned that a favicon PNG was causing the UnicodeDecodeError hubbub in earlier runs. Now we are getting somewhere, and we have a graph like this:

Version 1

This is already quite fun to see. There are the sudden accelerations of development, there are the plateaus while I was working on other projects, and the general feeling is “wow, productive!” (with no facts backing that feeling 😂). Also pretty good ROI on maybe 15 minutes of effort.

Having said that, this is still far from what I wanted.

Version 2

Prompt: Could we change the code to have cohorts of time that are configurable, say monthly or yearly cohorts, and colour the chart to see how long each cohort survives?
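For what it’s worth, the cohorting I had in mind maps to pandas quite directly: bucket each line’s blame timestamp into a configurable period and count lines per bucket. A sketch of the idea, not the generated code:

import pandas as pd

# stand-in data: one epoch-second entry per surviving line,
# as collected by the blame helper
timestamps = [1672531200, 1672531200, 1704067200]
dates = pd.to_datetime(pd.Series(timestamps), unit="s")
cohorts = dates.dt.to_period("M")  # "M" for monthly cohorts, "Y" for yearly
print(cohorts.value_counts().sort_index())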

The prompt came back with another set of code. Adding the metadata, skimming it (it has the filter on the file extensions again, never mind), and running it once more to see the output, I get this:

Version 2

Because the file extension filter is back in place, the numbers obviously don’t align with the above, but it does something. What that something is, is a bit unclear, but it feels like progress, so let’s give it the benefit of the doubt and just make one more change.

Version 3

Prompt: Now change this into a cumulative graph, please.

One more time, Claude came back with this code. Adding the metadata again, same drill. Running it failed with errors from numpy, though:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Now this needed some debugging. It turns out the column the code is trying to plot holds numbers as strings rather than numbers as, you know, floats…

# my "fix"
        df['cumulative_percentage'] = df['cumulative_percentage'].astype(float)
# end

        # Plot cumulative area
        plt.fill_between(df.index, df['cumulative_percentage'],
                        alpha=0.6, color='royalblue',
                        label='Cumulative Code')
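(For the record, pandas has a purpose-built spelling of the same fix, which fails with a clearer error when the strings aren’t numeric:)

df['cumulative_percentage'] = pd.to_numeric(df['cumulative_percentage'])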

It didn’t take too many tries, but it was confusing at first – and why wouldn’t it be, when I didn’t actually read the code, just skimmed it…

The result is then like this:

Version 3

Sort of meh; it feels like it’s not going in the right direction overall.

But while debugging the above issues, I first tried to ask Claude about the error (maybe it could fix it itself), but it came back with “Your message exceeds the length limit. …” (for free users, that is). So I kind of stopped here for the time being.

Lessons learned

The first lesson is very much re-learned:

Garbage in, garbage out.

If I cannot express what I really want, it’s very difficult to make it happen. My prompts were by no means expressing my wishes precisely, so no wonder Claude wasn’t really hitting the mark. Whether or not a human engineer would have fared better, I don’t know. I do know, however, that this kind of “tell me exceedingly clearly what your idea is” conversation is an everyday one for me as an engineer (and I’ve been on both ends of it).

The code provided by the model wasn’t really far from a working solution, so that was fun! On the other hand, when it hit any issues, I really needed domain and language knowledge to fix things. This seems like an interesting place to be:

  • the results are quick and, on the surface, probably good enough for a less technical person
  • but they would also be the ones who couldn’t do anything when something goes wrong.

Even I feel that it would be hard to support code as a software engineer if it was just generated like this. But that’s also a strange thought: so many times I have to support (debug, extend, explain, refactor) code that I’d had nothing to do with before.

It seems to me that, since Claude comes across as an eager junior engineer who writes decent code that always needs some adjustments, the trade-off is really between spending time getting better at prompting versus getting better at coding.

A person with some programming skill who is mostly interested in the results rather than the process, and who doubles down on prompting, could likely get loads further than I did here. Good-quality prompts and a small amount of code adjustment would be the sweet spot for them.

For others who have more programming expertise, who are maybe more interested in the process, and who would rather spend time getting better at programming than at prompting: keeping to smaller snippets might be the sweet spot, or learning new languages… What this process can help with is a starting point for digging in, a seed.

Future

Given the above notes on how this generated code is like a new codebase that I suddenly need to support, here’s a different, fun exercise 💡 to actually improve engineering skills:

Take AI-generated code that is “good enough” for a small problem and refactor, extend, and productionise it.

I’m not sure whether this would work, or whether it would get me into wrong habits, but if I wanted some quick ways of doing deliberate practice – not Exercism, LeetCode, or similar, but rather something custom-made – then this seems like a way to get started.

Also, now that I’ve become even more interested in the problem, I’ll likely just dig into how to actually define the chart I was looking for and what kind of data I would need from git to make it happen. The example code made me pretty confident that “all I need is Python”, really. While prepping for this I also found other useful tools, like one that lets you write SQL queries against your repo; those might be a further way to expand my understanding.

Either way, it’s just fun to mess with code on a lazy Saturday.

Categories
Maker Programming

Making a USB Mute Button for Online Meetings

I use Google Meet every day for (potentially hours of) online meetings at work, so it’s very easy to notice when things change and, for example, new features become available. Recently I found a new “Call control” section in the settings that promised a lot of fun: connecting USB devices to control my calls.

Screenshot of the Google Meet Settings menu during calls, showing the call control menu and a call-out to connect my USB device.
Google Meet Settings menu during a call, with the Call control section

As someone who enjoys (or is drawn to, or is sort of obsessed with) hacking on hardware, this was a nice call to action: let’s cobble together a custom USB button that can do some kind of call control – say muting myself in the call, showing mute status, hanging up, etc.

This kicked off such a deep rabbit hole that I barely made it back up to the top, but one that seeded a crazy amount of future opportunities.

Categories
Programming

Doing the Easy Problems on Leetcode

Over the last decade I seem to have been working in environments where many engineers and engineering-minded people spend time on programming puzzles and coding challenges, be it Advent of Code, Project Euler, Exercism, TopCoder, or LeetCode. I’ve tried all of these before (and probably a few more that I no longer remember), though each with a varying amount of time spent all fired up before fizzling out. Recently I picked up LeetCode, since from the above list it’s the one I’ve spent the least time with, and others mentioned using it as a way to relax and learn on weekends (suspend judgement on the wisdom of that for now).

So for the last two weeks I have been solving problems – though not just any problems: I went in mostly for the Easy ones. A few dozen problems in a short amount of time don’t give me a deep impression, but from past experiences I can still distill some lessons that help shape future experiments.

The purpose of using the Easy problems is different from, say, going all in for puzzle-solving fun, which likely lives in the Hard ones. Rather, I think Easy problems can be used for learning new techniques, looking for common patterns, and becoming more polyglot.

Categories
Programming

Programming challenge: Protohackers 3

Protohackers is a server programming challenge where various network protocols are set as problems. It started not so long ago, and challenge No. 3 was released just yesterday, aiming at creating a simple (“Budget”) multi-user chat server. I thought I’d sacrifice a decent part of my weekend to give it an honest try. This is the short story of trying, failing, and then getting more knowledge out of it than I expected.

I definitely wanted to tackle it using Python, as that’s my current utility language, the one I want to know most about. Since the aim of Protohackers, I think, is to build from scratch, I set out to use only the standard library. With some poking around the documentation I ended up choosing socketserver as the basis of the work (a minimal sketch of the resulting shape follows the list below). It seemed suitable, but there was a severe dearth of non-dummy example code and deeper explanation. In a couple of hours I did make some progress, though, and that already felt exciting:

  • Figured out (to some extent) the purpose of the server / handler parts in practice
  • Made things multi-user with data shared across connections
  • Grokked the lifecycle of the requests a bit, but definitely not fully – especially not how disconnections happen.
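For the curious, the skeleton everything hangs off is tiny. A minimal sketch of the shape I ended up with – the handler body (plus a shared user registry) is where all the real work goes:

import socketserver

class ChatHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # rfile/wfile wrap the client socket as file-like objects,
        # turning the protocol into plain line-oriented I/O
        self.wfile.write(b"Welcome! What shall I call you?\n")
        for line in self.rfile:
            message = line.strip().decode()
            # name negotiation, join/leave announcements, and
            # broadcasting to the other connections go here
            ...

class ChatServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True  # painless restarts while developing

if __name__ == "__main__":
    with ChatServer(("0.0.0.0", 10007), ChatHandler) as server:
        server.serve_forever()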

Still, it was working to some extent: I could make a server that functioned, for a certain definition of “functioned”, as the logs attest:

Console log of server messages while trying my Budget Chat Server implementation.
Server logs from trying my Budget Chat Server

Categories
Programming Taiwan

A personal finance data pipeline project

I received a (family) project brief recently. In Taiwan many credit/debit cards have various promotions and deals, many of which depend on one’s monthly spending, for example “below X NTD spending each month, get Y% cashback”. People also hold a lot of different cards, so playing these off against each other can yield nice pocket change, but one has to keep an eye on where one’s spending stands compared to the monthly limit (X). That’s where the project comes from: easy (or at least easier) tracking of where one specific card’s spending is within the monthly period. That doesn’t sound too difficult, right? Except the options for getting at the data are:

  1. A banking website with CAPTCHAs and no programmatic access
  2. An email received each day with a password-protected PDF containing the last day’s transactions in a table

Neither of these is fully appetizing to tackle, but both are similar to bits of what I do at #dayjob, and option 2 was a bit closer to what I’ve been doing recently, so that’s where I landed. That is:

  • Forward the received email (the email provider does it)
  • Receive it in some compute environment
  • Decrypt the PDF
  • Extract the transaction data table (these two steps are sketched after the list)
  • Clean and process the tabular data
  • Put the raw data in some data warehouse
  • Transform the data to get the right aggregation
  • Literally profit?
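As for the scary-sounding decrypt-and-extract steps: with a library like pdfplumber they turn out to be a handful of lines. A sketch with a made-up file name and password (the real ones follow the bank’s scheme):

import pdfplumber

rows = []
# "statement.pdf" and the password are placeholders
with pdfplumber.open("statement.pdf", password="hunter2") as pdf:
    for page in pdf.pages:
        table = page.extract_table()  # a list of rows, or None if no table found
        if table:
            rows.extend(table)
# rows is now a list of lists, ready for cleaning and loading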

I was surprised by how quickly this actually worked out in the end (if “half a weekend” counts as quick), and indeed this can be the first piece of a “personal finance data warehouse”.