Categories
Computers

ARM images to help the Archive Team

Two weeks ago I came across a thread on Hacker News, linking to an announcement of the shutdown of Yahoo! Answers by early May. One of the early comments pointed people to the Archive Team and their project to archive Yahoo Answers before that 4th May deadline. It looked interesting and I gave their recommended tool, the Archive Team Warrior a spin. It runs in VirtualBox, super easy to set up, lightweight, and all around good. Nevertheless after one night keeping my laptop running and archiving things I was wondering if there was a less wasteful way of doing the archiving. In short order I came across the team’s notes on how to run the archiver as Docker containers. That’s more like it: I have some Raspberry Pi devices running at home anyways, if those could be the archiving clients it would make a lot more sense!

While trying out the instructions, it was quickly clear that the Archive Team provided images out of the box were not working on the Pi. Could they be made to work, though? (Spoiler: it did, and you can check the results on GitHub.)

Container ARM-ification

The issue was that the Archive Team-provided project container, atdr.meo.ws/archiveteam/yahooanswers-grab is only available for amd64 (x64-64) machines. Fortunately the source code is available on GitHub and thus could check out the relevant Dockerfile. It uses a specific base image, atdr.meo.ws/archiveteam/grab-base, whose repo reveals that internally it pulls in atdr.meo.ws/archiveteam/wget-lua containing a modified version of wget from yet another repo (adding Lua scripting support, plus a bunch of other stuff added by Archive Team). But fortunately this is the bottom of the stack. And this bottom of the stack is based on debian:buster-slim which does come built as multiarchitecture images, providing arm64, armv7, armv6 versions, the architectures that are relevant for us if building for the Raspberry Pi (and other ARM boards, since why not).