Categories
Computers

ARM images to help the Archive Team

Two weeks ago I came across a thread on Hacker News linking to an announcement that Yahoo! Answers would shut down in early May. One of the early comments pointed people to the Archive Team and their project to archive Yahoo Answers before that 4th May deadline. It looked interesting, so I gave their recommended tool, the Archive Team Warrior, a spin. It runs in VirtualBox and is super easy to set up, lightweight, and all-around good. Nevertheless, after one night of keeping my laptop running and archiving things, I was wondering if there was a less wasteful way of doing the archiving. In short order I came across the team’s notes on how to run the archiver as Docker containers. That’s more like it: I have some Raspberry Pi devices running at home anyway, and if those could be the archiving clients it would make a lot more sense!

While trying out the instructions, it quickly became clear that the images provided by the Archive Team did not work on the Pi out of the box. Could they be made to work, though? (Spoiler: they could, and you can check the results on GitHub.)

Container ARM-ification

The issue was that the Archive Team-provided project container, atdr.meo.ws/archiveteam/yahooanswers-grab, is only available for amd64 (x86-64) machines. Fortunately the source code is available on GitHub, so I could check out the relevant Dockerfile. It uses a specific base image, atdr.meo.ws/archiveteam/grab-base, whose repo reveals that internally it pulls in atdr.meo.ws/archiveteam/wget-lua, which contains a modified version of wget from yet another repo (adding Lua scripting support, plus a bunch of other stuff added by Archive Team). Fortunately this is the bottom of the stack. And this bottom of the stack is based on debian:buster-slim, which does come built as multi-architecture images, providing arm64, armv7, and armv6 versions: the architectures that are relevant for us when building for the Raspberry Pi (and other ARM boards, since why not).
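To make the architecture names above concrete, here is a minimal sketch that maps the machine name a Linux board reports to the platform string Docker uses for multi-architecture images. The helper name `docker_platform` and the mapping table are my own illustration, not part of the Archive Team tooling, and the table only covers the architectures mentioned here.

```python
import platform

# Machine names as reported by the kernel, mapped to Docker platform strings.
# Only the architectures discussed in the text are listed.
DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "aarch64": "linux/arm64",
    "arm64": "linux/arm64",
    "armv7l": "linux/arm/v7",
    "armv6l": "linux/arm/v6",
}


def docker_platform(machine=None):
    """Return the Docker platform string for a machine name, or None if unknown.

    When no name is given, the name of the current machine is used.
    """
    name = (machine or platform.machine()).lower()
    return DOCKER_PLATFORMS.get(name)


if __name__ == "__main__":
    # Prints the platform an image would need to support on this machine.
    print(docker_platform())
```

A 32-bit Raspberry Pi OS typically reports `armv7l` or `armv6l`, while a 64-bit OS reports `aarch64`, so the same board can need different image variants depending on the installed system.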

Categories
Admin Computers

Folding@Home on AWS to kick the arse of coronavirus

Folding@Home popped up on my radar due to a recent announcement that their computational research platform is adding a bunch of projects to study (and ultimately help fight) the COVID-19 virus. Previously I hadn’t had any good machine at hand to be able to help in such efforts (my 9-year-old Lenovo X201 is still cozy to work with, but doesn’t pack a computing punch). At work, however, I get to be around GPU machines much more, which gave me ideas for how to contribute a bit more.

Poking around the available GPU instance types on AWS, I saw that there are some pretty affordable ones in the G4 series, going as low as roughly $0.60/hour for a decent and recent CPU and an NVIDIA Tesla T4 GPU. This drops even further with spot instances: looking around in the different regions, I’ve seen available capacity at $0.16-0.20/hour, which really feels like a bargain. Thus I thought of spinning up a Folding@Home server in the cloud on spot instances, to help out and hopefully learn a thing or two, at the price of roughly 2 cups of gourmet London coffee (or taking the tube to work) per day.
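The arithmetic behind that daily budget can be sketched as follows. The hourly rates are the ones quoted above; running the instance around the clock (24 hours a day) is my assumption, and the function name is purely illustrative.

```python
HOURS_PER_DAY = 24  # assumption: the instance runs around the clock


def daily_cost(hourly_rate_usd):
    """Cost in USD of running one instance for a full day at the given rate."""
    return round(hourly_rate_usd * HOURS_PER_DAY, 2)


# Hourly rates quoted in the text.
on_demand = daily_cost(0.60)
spot_low = daily_cost(0.16)
spot_high = daily_cost(0.20)

if __name__ == "__main__":
    print(f"on-demand: ${on_demand:.2f}/day")
    print(f"spot:      ${spot_low:.2f} to ${spot_high:.2f}/day")
```

At these rates a spot instance comes to roughly $3.84 to $4.80 per day, against about $14.40 per day on demand.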

Categories
Computers

First impressions of Filecoin

I’m an interested user of many novel technologies, some examples being cryptocurrencies and IPFS. One technology that I was keeping an eye on sits at the intersection of those two: Filecoin (it uses a blockchain and is built on IPFS by the people who made IPFS). It aims to be a decentralized storage network, where nodes are rewarded for storing users’ data, in a programmatic and secure way. After a long wait, the Filecoin repositories opened up just a few days ago (see also the relevant Hacker News discussion). This allowed everyone to give the newly deployed development chain (devnet) a spin, and try out one possible “future-of-storage”. Since the release, I’ve spent a decent handful of hours with Filecoin, and thus gathered a few first impressions.

These are very early stages for the technology, so read all my comments with that nurturing point of view. I’m glad they released at version 0.0.2, even if a lot of things are in flux. Also, since I’ve spent a bunch of time with IPFS, many parts of the experience with Filecoin (or rather with its initial implementation, go-filecoin) are less surprising and more familiar to me than they likely would be to someone trying a project by this team for the first time. More on this later. Now, in hopefully somewhat logical order...

Getting started

The first thing is obviously getting and installing the binaries for the project. The initial implementation is go-filecoin, which is not totally surprising, as one of the two main IPFS implementations is also go-ipfs (the other, for the curious, is js-ipfs, but Filecoin does not have a JavaScript implementation just yet). As there are no binary releases for go-filecoin just yet, we’ll need to install from source. The project relies on pretty recent Go (1.11.1 or 1.11.2, it’s not clear from the docs and the code…), as well as pretty recent Rust (1.31.0, which is about 2 months old). If the combination of the two is surprising, it’s because some of the heavy-lifting libraries were implemented in Rust for performance reasons. I think these are used in proving that a node actually stores the data that it said it did, without sending the whole data for inspection (aka the secret sauce of Filecoin).
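Before building from source, it can help to check the installed toolchain against those version requirements. The sketch below is only an illustration: the helper names are mine, the sample strings merely mimic the general shape of `go version` and `rustc --version` output, and taking 1.11.1 as the Go minimum is an assumption, since the docs leave the exact requirement unclear.

```python
import re


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    # A missing patch component (as in "1.10") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(tool_output, minimum):
    """True if the version found in tool_output is at least the minimum."""
    return parse_version(tool_output) >= parse_version(minimum)


if __name__ == "__main__":
    # Illustrative strings; real output comes from `go version` and `rustc --version`.
    print(meets_minimum("go version go1.11.2 linux/amd64", "1.11.1"))  # Go minimum is assumed
    print(meets_minimum("rustc 1.30.1", "1.31.0"))
```

Comparing tuples of integers avoids the classic trap of comparing version strings lexically, where "1.9" would sort after "1.11".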

Categories
Programming

How not to start with machine learning

I’m a technical and scientific person. I’ve done some online courses on machine learning, read enough articles about different machine learning projects, gone through the discussions of those projects on Hacker News, and kept a bunch of ideas about what would be cool for the machines to actually learn. I’m in the right place to actually do some project, right? Right? 🚨 Wrong, the Universe says no…

This is the story of how I tried one particular project that seemed easy enough, but that led me to go back a few (a bunch of) steps and rethink my whole approach.

I bet almost everyone in tech (and a lot of people beyond) has heard of AlphaGo, DeepMind’s program that plays the game of Go beyond what humans can do. That has evolved, and the current state of the art is AlphaZero. It starts from scratch with just the rules of the game and, through self-play, masters games like Go to an even higher level than the previous programmatic champions after relatively brief training (beating AlphaGo and its successor AlphaGo Zero), and it also applies to other games (such as chess and shogi). AlphaZero’s self-learning (and unsupervised learning in general) fascinates me, and I was excited to see that someone published their open source AlphaZero implementation: alpha-zero-general. That project applies a smaller version of AlphaZero to a number of games, such as Othello, Tic-tac-toe, Connect4, and Gobang. My plan was to learn by adding some features and training some models for some of the games (learn by doing). That is much easier said than done, and the plan unravelled pretty quickly (but probably not as quickly as it should have).

Categories
Admin

Continuous integration testing of Arch User Repository packages

I maintain a couple of Arch Linux user-contributed packages on the Arch User Repository (AUR), and over time I’ve built out a bit of infrastructure around that to make the maintenance easier (and hopefully the results better). The core of it is automated building of packages in Continuous Integration, which catches a number of issues that would otherwise be more difficult to spot.

This write-up will go through the entire packaging process to make it easily reproducible.