Categories
Admin Computers

Folding@Home on AWS to kick the arse of coronavirus

Folding@Home popped up on my radar due to a recent announcement that their computational research platform is adding a bunch of projects to study (and ultimately help fight) the COVID-19 virus. Previously I haven’t had any good machine at hand to be able to help in such efforts (my 9 years old Lenovo X201 is still cozy to work with, but doesn’t pack a computing punch). At work, however I get to to be around GPU machines much more, and gave me ideas how to contribute a bit more.

Poking around the available GPU instance types on AWS, seen that there are some pretty affordable ones in the G4 series, going down to as low as roughly $0.60/hour to use some decent & recent CPU and an NVIDIA Tesla T4 GPU. This drops even further if I use spot instances, and looking around in the different regions, I’ve seen available capacity at $0.16-0.20/hour, which feels really in the bargain category. Thus I thought spinning up a Folding@Home server in the cloud on spot instances, to help out and hopefully learning a thing or two, at the price of roughly 2 cups of gourmet London coffee (or taking the tube to work) per day.

Looking at the instance types, there are a few others than the mentioned g4dn.xlarge to choose from, but going to stick with that for the time being:

  • larger g4dn instances don’t really worth it, since the GPU will do the heavy lifting, and it’s the same size until going up to 12xlarge that comes with 4 GPUs, but that’s more than 4x as expensive, so would be rather wasted.
  • Compute optimised p3 instances also don’t seem to particularly worth it, as the difference between its NVIDIA V100 and the T4 is much smaller multiplier than the price difference (based on a quick search for benchmarks: performance is roughly x2, while price of the smallest machine, that’s 2xlarge is x5-6).

Software setup

I’ve spun up an instance simply enough, and with a bit of trial & error got the setup sorted.

Using an Ubunutu system, the required fahclient installed just fine as per the documentation, but the GPU side needed some extra poking, things were unblocked by the NVIDIA drivers and OpenGL packages (thanks to the F@H forums), in my case:

sudo apt install -qy nvidia-headless-435 ocl-icd-opencl-dev

The next was adding a good Folding@Home config, again a bit of trial and error. The docs say lots of the pieces can be left to self-configure (the folding slots in particular), but I’ve found that explicitly setting it works better overall. Thus my /etc/fahclient/config.xml file looks something like this:

<config>
  <!-- Client Control -->
  <fold-anon v='true'/>

  <!-- Folding Slot Configuration -->
  <gpu v='true'/>

  <!-- Slot Control -->
  <power v='full'/>

  <!-- User Information -->
  <passkey v='111111111111111111111'/>
  <team v='xxxxx'/>
  <user v='YYYYYYYYYYYY'/>

  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='GPU'/>

  <allow>127.0.0.1 A.B.C.D</allow>
  <web-allow>127.0.0.1 A.B.C.D</web-allow>

  <!-- Remote Command Server -->
  <password v='zzzzzzzzz'/>
</config>

Here I omitted my user name and passkey (naturally), so fill others can fill in their own. I’ve also joined the ArchLinux team (number 45032 ;), but to each of their own. The last part in allow/web-allow section is that I’ve added my VPN’s IP address, so I can connect to the server remotely, without opening it up to the rest of the world. That part (A.B.C.D) can be removed, and could, for example, use SSH port forwarding to connect to the server (forwarding the required port 7396). Finally, the password section allows the remote FAHControl graphical interface to connect to the folding service remotely (without port forwarding).

This setup then got to fold. To ensure that things were running on the GPU fine, I’ve also built nvtop on the machine and checked that the unit is maxed out

nvtop when folding happily

Launch Template

So far it’s fine, but let’s make things more automatic. Spot instances can be killed, or I might want to spin up some extra instances, and would rather have as little manual work to do as possible. What I converged on then is having a Launch Template which sets up all the things needed and I could start a new folder with a couple of clicks. In there I’ve set:

  • the instance type, g4dn.xlarge
  • an Ubuntu 18.04 system
  • the security group, that allows all traffic to my VPN (otherwise port 22 for ssh would be enough with the mentioned ssh tunneling above)
  • that these are spot requests
  • my default AWS keypair for ssh access
  • some tags for housekeeping (definitely optional)
  • user data that does the whole setup on on system start
Launch templating, took 5 versions to converge

Of the parts above, I guess naturally the user data took the most to figure out, because of some peculiarities of the setup.

First, FAHClient keeps wanting to interactively set things up when it is installed, so had to get around that. If I pre-create the correct config.xml file before the install, fortunately only a single question remains (whether it should start the service on automatically) and that one thing is taken care buy a bit of expect scripting.

#!/bin/bash

export DEBIAN_FRONTEND=noninteractive
sudo apt update
sudo apt install -qy nvidia-headless-435 ocl-icd-opencl-dev expect

wget https://download.foldingathome.org/releases/public/release/fahclient/debian-testing-64bit/v7.4/fahclient_7.4.4_amd64.deb
sudo mkdir /etc/fahclient/ || true
sudo chmod 777 /etc/fahclient
cat  <<EOF > "/etc/fahclient/config.xml"
<config>
  <!-- Client Control -->
  <fold-anon v='true'/>

  <!-- Folding Slot Configuration -->
  <gpu v='true'/>

  <!-- Slot Control -->
  <power v='full'/>

  <!-- User Information -->
  <passkey v='111111111111111111111'/>
  <team v='xxxxx'/>
  <user v='YYYYYYYYYYYY'/>

  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='GPU'/>

  <allow>127.0.0.1 A.B.C.D</allow>
  <web-allow>127.0.0.1 A.B.C.D</web-allow>

  <!-- Remote Command Server -->
  <password v='zzzzzzzzz'/>
</config>
EOF

cat <<EOF > "/home/ubuntu/install.sh"
#!/usr/bin/expect
spawn dpkg -i --force-confdef --force-depends fahclient_7.4.4_amd64.deb
expect "Should FAHClient be automatically started?"
send "\r"
# done
expect eof
EOF

chmod +x /home/ubuntu/install.sh

sudo /home/ubuntu/install.sh

With this script passed to the instance as user data now it all falls into place, and can spin up new folding any time.

Then there are two ways to connect to the server and monitor it remotely:

  • the web client, on port 7396, with an interface like at the top of this post, or
  • using FAHClient desktop client, that can monitor and control multiple folding instances, and I feel has better control over & more information about what’s being done. This is by default on port 36330 and to work remotely, have to have a “password” set in the configuration.

Using these settings, the remote workload (both the CPU and GPU pops up, and possible to monitor & control:

And this should be done for now…

Notes & Future

Thus far I’ve learned:

  • A bit about spot instances. There are a lot more options which I haven’t touched and might be useful in general, such as targets & instance pools, using the time-limited spot instances, etc, but those are more in general, not in this particular case)
  • A lot about launch templates. They seem handy, though one request would be being able to edit the description of them, or when starting from a previous version, that description is pre-filled (which is currently not, unlike all the other settings).
  • Some apt/dpkg coercion tricks for non-interactive setup, though there seems to be a more to know. How nice it is on ArchLinux that non-interactive mode is basically a single -y flag away in pacman.
  • How to use user data, though that’s definitely just scratching the surface. What would be much better is to learn cloud-init instead, which seems much more like the proper way to supply files to install and scripts to run on these virtual machines.

I’ve also experienced that Folding@Home might be struggling a bit with the current load, earlier today the work servers (but now they seem to be okay), but also the statistics servers, so I’m guessing the whole infrastructure is under load. I wonder how are they set up, and where their bottlenecks are…

But now this is done, the ball is in the court of the research, keep them computational biochemistry research coming. In the meantime keep safe, everyone. Wash hands, not touch faces, and take good care of people around you.

Edit 2020/03/14: Looking at their server stats and connecting up the dots with their project stats, they might have run out of relevant work items for the time being. That’s kinda both good (likely large response to their shout out) and a bummer (resources just idling).

Categories
Life

Five weeks of no-caffeine

Five weeks ago, I’ve started on a “no-caffine month“, well, it wasn’t a month indeed after all. Yesterday was the end of my target, thus let’s see how things worked out!

I’m glad I have managed go all five weeks without caffeine – no coffee, no tea, no dark chocolate to be on the “safe” side… That’s all nice, but looking at the purpose of the decaf period, did it make any material difference? And was it any different compared to the previous two times when I did this? The result is not totally clear cut to me, it definitely was much clearer in previous times. No experiment running for five long weeks in real life can really just change one parameter (me with/without caffeine in this case), while keeping everything else the same. Thus changes or lack of change is harder to interpret.

Categories
Computers

First impressions of Filecoin

I’m an interested user of many novel technologies, some examples being cryptocurrencies and IPFS. One technology that I was keeping an eye on was at the intersection of that two: Filecoin (it’s using blockchain and built on IPFS by the people who made IPFS). It aims to be a decentralized storage network, where nodes are rewarded by storing users’ data, in a programmatic and secure way. After a long wait, the Filecoin repositories just opened up a few days ago (see also the relevant Hacker News discussion). This allowed everyone to give the newly deployed development chain (devnet) a spin, and try out one possible “future-of-storage”. Since the release, I’ve spent a decent handful of hours with Filecoin, and thus gathered a few first impressions.

These are very early stages for the technology, so take all my comments with that nurturing point of view. I’m glad they release stuff at their version 0.0.2 as it happened, even if a lot of things are in flux. Also, I’ve spent a bunch of time with IPFS, a lot of parts of the experience with Filecoin (or rather with the initial implementation of go-filecoin is not as surprising (more familar) to me than likely to someone for the first time a project made by this team. More on this later. Now, in hopefully somewhat logical order...

Getting started

The first thing is obviously getting and installing the binaries for the project. The initial implementation is go-filecoin, which is not totally surprising, one of the two main IPFS implementations is also go-ipfs (the other, for the curious, is js-ipfs, but filecoin does not have a Javascript implementation just yet). As there are no binary releases for go-filecoin just yet, we’ll need to install from source. The project relies on pretty recent Go (1.11.1 or 1.11.2, it’s not clear from the docs and the code…), as well as pretty recent Rust (1.31.0, which is about 2 months old). If the combination of the two is surprising, it’s because some of the heavy lifting libraries was implemented in Rust, for performance reasons (that I think used in the proving that a node actually stores the data that it said it did, without sending the whole data for inspection – aka, the secret sauce of Filecoin).

Categories
Programming

How not to start with machine learning

I’m a technical and scientific person. I’ve done some online courses on machine learning, read enough articles about different machine learning projects, I go through the discussions of those projects on Hacker News, and kept a bunch of ideas what would be cool for the machines to actually learn. I’m in the right place to actually do some project, right? Right? 🚨 Wrong, the Universe says no…

This is the story of how I’ve tried one particular project that seemed easy enough, but leading me to go back a few (a bunch of) steps, and rethink my whole approach.

I bet almost everyone in tech (and a lot of people beyond) heard of AlphaGo, Deepmind’s program to play the game of Go beyond what humans can do. That has evolved, and the current state of the art is Alpha Zero, which takes the approach of starting from scratch, just the rules of the game, and applying self-play, can master games like Go to an even higher level than the previous programmatic champion after relatively brief training (and beating AlphaGo and it’s successor AlphaGo Zero), but also apply to other games (such as chess and shogi). AlphaZero’s self-learning (and unsupervised learning in general) fascinates me, and I was excited to see that someone published their open source AlphaZero implementation: alpha-zero-general. That project applies a smaller version of AlphaZero to a number of games, such as Othello, Tic-tac-toe, Connect4, Gobang. My plan was to learn by adding some features and training some models for some of the games (learn by doing). That sounds much easier to say than to do, and unravelled pretty quickly (but probably not as quickly as it should have been).

Categories
Life

Starting on a no-caffeine month

This is the third time I’m embarking on a quest to decaf. It is usually triggered by observing the effect of all the coffee and tea that I’m drinking: jumpiness, difficulty getting up in the morning, generally being an arse… The signs are pretty unmistakable, that I’ve had a bit too much…

The first time it was a very special experience, many aspects really stayed with me. The most important memory I carry over is the expected way the caffeine withdrawal should play out, based on this first time. The first week is kinda easy. The second is harder. The third was really rough, I reverted to pretty much to be a manchild, (figuratively) banging on the table and shouting “I am a grownup, I can have coffee whenever I want!” Then the last week was very good, much better mood, much more balanced physical functions too. It was bliss. Was almost strange to stay in that state for only a week, and resuming having coffee .