Admin Archives - ClickedyClick

How to Think Differently When Internet Searches Are Metered

Gergely Imreh — Sat, 07 Feb 2026 08:38:17 +0000

A few months ago I tried to sign up to Kagi, a subscription-based search engine that I’d heard a lot of good things about. Since I was in a country where their payment system didn’t work yet, I couldn’t actually complete the signup. They generously gave me a Starter plan for free while their system was being sorted out (nice, thank you!). That plan, however, comes with a “300 searches per month” limit, which I quickly found really matters.

It was around the time when I had a new laptop and was trying to get an ArchLinux install right, where both the process and the hardware had changed a lot since I’d last done it in… 2011-ish. That involved many queries to the Internet. So many that my 300 searches were used up by the 3rd day or so.

So 3 days into the month I had already used up my quota, and couldn’t just upgrade to unlimited (yet), since the plan was a gift from Kagi. So what could I do differently in the future for a better experience?

Habit Changes

What does that better experience really mean? Doing a retrospective of how I was searching, it seemed to be a really mindless, throwaway process:

plopping in keywords, scrolling quickly around, not really clicking on any link necessarily, but changing the word and re-running the search (total shotgun approach)
running the same query again and again across days
relying on the search engine to get slowly changing or unchanging information

… and more. I feel like these are habits picked up from other search engines, where I often didn’t find things, where most results are “sponsored”, and where I didn’t put any effort into checking my queries before sending them off. I could get away with that because it was “all you can eat”, and that made me not really pay attention to the “taste & flavour” of the results…

Bookmark more & better

One of the most obvious ideas I had was that since I go to the same pages all the time, why not save those links as bookmarks?

I used to bookmark a lot, and organise those bookmarks, etc… Then I gave it all up, because … I could just search things? And because organisation wasn’t that simple either. Which folder does this link go to? Is it under Programming > Languages > Python? Or under Professional Development? Or under Useful Libraries?…

The same perfectionist “it doesn’t just need to be bookmarked, it has to be filed away correctly too!” attitude was really not doing me any favours… It took a little while to work through this, and to settle on putting every bookmark in a flat hierarchy and using tags (rather than folders) to help me find them. The difference is that a single link can have multiple tags, but can exist in only one folder. I do believe that’s the only scalable way, having tried to do it the other way around (and failing)¹.
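To make the tags-versus-folders difference concrete, here is a minimal sketch in Python. The bookmark entries, the tag names and the helper names (`build_tag_index`, `find`) are all made up for illustration; this is not how any browser stores bookmarks internally.

```python
from collections import defaultdict

# Illustrative data only: a flat list where each link carries any number of tags.
bookmarks = [
    {"url": "https://wiki.archlinux.org/", "tags": {"linux", "docs"}},
    {"url": "https://docs.python.org/3/", "tags": {"python", "docs", "programming"}},
    {"url": "https://help.obsidian.md/", "tags": {"notes", "docs"}},
]


def build_tag_index(items):
    """Map every tag to the list of URLs that carry it."""
    index = defaultdict(list)
    for item in items:
        for tag in item["tags"]:
            index[tag].append(item["url"])
    return index


def find(items, *tags):
    """Return the URLs that carry all of the given tags."""
    wanted = set(tags)
    return [item["url"] for item in items if wanted <= item["tags"]]


index = build_tag_index(bookmarks)
print(sorted(index["docs"]))              # one link can show up under several tags
print(find(bookmarks, "docs", "python"))  # narrowing down by combining tags
```

With folders, the Python docs link would have to live in exactly one of “docs”, “python” or “programming”; with tags it can be found through any of them.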

So bookmark:

the (book) library catalogues that I borrow books from
the forums and docs pages of projects I’m using²
the useful tools available online, from timezone wrangling to currency conversion…
blogs, publications, web comics that I frequent…

and so on… Now whenever I visit a page, I do stop for a moment and think: could I imagine myself wanting to come back here in the future? If yes, let’s bookmark.

It’s not all perfect, but that’s more about the tools than the process: Firefox on Android doesn’t seem to handle the same “search in bookmarks” shortcuts, or at least makes them more difficult to use. Oh, well, eventually…

Add custom search engines

Sometimes the page I want to go to is within a large collection, such as a wiki or a forum. I know it’s there, but I’m unsure where exactly. Bookmarks take me to the site, but then I have to use its search as a second step. This can be made a lot more ergonomic in pretty much every current browser by adding custom search engines³.

Adding Wikipedia’s search? It might already be there. Adding ArchLinux Wiki? It takes 15 seconds to do, and I have a shortcut to so much Linux system admin knowledge that it’s ridiculous. Any site that has a search can be added just as simply.

Here the kicker is to remember that shortcut (and the fact that I’ve added that shortcut), but after that, it’s off to the races.
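Under the hood, a custom search engine is little more than a keyword mapped to a URL template. Here is a rough sketch of that expansion, assuming the common `%s` placeholder convention. The keywords `aw` and `wp`, the two search URLs and the function name `expand` are illustrative assumptions, so check what each site actually uses.

```python
from urllib.parse import quote_plus

# Hypothetical keyword -> URL template table; "%s" marks where the query goes.
ENGINES = {
    "aw": "https://wiki.archlinux.org/index.php?search=%s",
    "wp": "https://en.wikipedia.org/wiki/Special:Search?search=%s",
}


def expand(typed):
    """Turn 'keyword some query' into a search URL, or None if nothing matches."""
    keyword, _, query = typed.partition(" ")
    template = ENGINES.get(keyword)
    if template is None or not query:
        return None
    return template.replace("%s", quote_plus(query))


print(expand("aw zfs dkms"))
```

Typing the keyword followed by the query in the address bar then goes straight to that site’s own search results, without spending a metered search.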

Change how I search

While I was auditing what sort of searches I tend to do, one type that stood out was when I was asking for something like “what’s the website of this-or-that company or project or organisation?” More often than not, these are companies, projects, and organisations that are notable enough to be in Wikipedia. Here’s the new process: search Wikipedia for the company/project/organisation and use the link from their page.

This also feels more to the point than searching for “this-or-that website”, which is just “close enough” in meaning and will still get me many results, even though the answer should be a single value.

The caveat is that for programming / open source projects the better place to search is probably GitHub / GitLab / Codeberg, where they are likely hosted (in decreasing order of probability, currently), switching to a search engine only when that fails.

This is along the lines that if I already know an authoritative source for the information, I should probably go there directly.

Misc

Bookmarks & custom searches brought down my search count already. One more change, more of a housekeeping one, is that my browser was reopening pages from my previous session whenever I started it. If I had any Kagi search results open, each of them used up another search from the quota, and often there was more than one open… Setting my browser to start afresh each time I open it helped with that, and also helped with me not being distracted every time I open my browser by whatever I was doing last time, as opposed to what I wanted to do now.

What did I Learn?

Now that I’m on a proper paid plan, I will up it to the Professional plan, where searches are not metered. It doesn’t feel like just a lazy release valve⁴; rather, I don’t believe this sort of limit on my access to information is productive. “Limited” limitations, when there’s a purpose, can indeed be “creative limitations”.

If I believe that Kagi does a good job, then there’s no point sticking to the quota; if I don’t, then why am I using it in the first place, instead of any of the alternatives?

And if I want to use the power of creative limitations, I can always set my own quest with rules like “no search”; it’s within my power.

I do feel that the changes to my thinking due to this experience are ones I want to keep and even cultivate: being more deliberate about what I am looking for and thoughtful about where I might find it; choosing and rewarding sources I find useful and reliable; using the little gray cells more. These changes also brought back a more old-school internet vibe (old as in when I bought printed magazines that came with collected links of what you can find on the World Wide Web, something more tangible and purposeful). I guess I’m getting old as well. :)

¹ Tagging also got a big push from the book Everything Is Miscellaneous. It’s also why e.g. Gmail was awesome for having tags while other clients were still just doing folders. Fastmail goes even further and lets you choose labels (tags) or folders, which is pretty awesome of them.
² Almost every good project has a good forum/docs, or maybe a good forum/docs contributes to the project being good? Here’s looking at you, ArchWiki, Obsidian Help,…
³ For example in Firefox.
⁴ As in “if I’m unlimited, I don’t have to care about all the effort I’ve described so far, I can go back to my old habits”.

The post How to Think Differently When Internet Searches Are Metered appeared first on ClickedyClick.

ZFS on a Raspberry Pi

Gergely Imreh — Wed, 28 Feb 2024 11:48:50 +0000

I have a little home server, just like many other geeks / nerds / programmers / technical people… It can be useful, a learning experience, and a real chore; most of the time the balance is shifting between these ends. Today I’m taking notes here on one aspect of that home server that swings widely between them.

When I say I have a home server, that might be too generous a description of the status quo: I have a pretty banged up Raspberry Pi 3B. It’s running ArchLinux ARM, the 64-bit, AArch64 version, looking a bit more retro on the hardware front while pushing for more modernity on the software side, a mix that I find fun.

There are a handful of services running on the device, not that many, mostly limited by its *gulp* 1GB memory; plenty of things I’d love to run don’t co-locate well in such a tiny compartment. Besides the memory, it’s also limited by storage: the Raspberry Pi runs off an SD card, and those are both fragile and limited in size. If one wants to run a home file server, say using a handful of other SD cards lying around to expand the available storage, that will get awkward very soon. To make that task less awkward (or replace one kind of awkward with a more interesting one), I’ve set out to set up a ZFS storage pool, using OpenZFS.

The idea

Why ZFS? In big part, to be able to credibly answer that question.

But with a single, more concrete reason: being able to build a more solid and expandable storage unit. ZFS can combine different storage units

in a way that combats data errors, e.g. mirroring: this addresses the SD cards’ fragility
in a way that data can expand across all of them in a single file system: this addresses the SD cards’ size limitations

This sounds great in theory, and after a bit of trial-and-error I’ve arrived at the following setup: relying on a dynamic kernel module for file system support (for flexibility), and a hodgepodge of drives at hand for the storage.

File system support is provided by the zfs-dkms package, a dynamic kernel module (DKMS), which means the kernel module required to manage that file system is recompiled for each new Linux kernel version as it is updated. This is handy in theory, as I can use the main kernel packages provided by the ArchLinux ARM team.

For storage, I started off with two SD cards in mirror mode (going for data integrity first). Later I found, and invested in, some large capacity USB sticks that bumped the storage size quite a bit. With these, the current ZFS pool looks like this:

It already saved me — or rather my data — once where an SD card was acting up, though that’s par for the course. One very large benefit is that the main system card is being used less, so hopefully will last longer.

The complications

Of course, it’s never this easy… With non-mainline kernel modules and with DKMS, every update is a bit of a gamble that can suddenly not pay off. That’s exactly what happened last year, when suddenly the module didn’t compile anymore on a new kernel version, and thus all that storage was sitting dumb and inaccessible. After digging into the issue, it came down to:

the OpenZFS project being under Common Development and Distribution License (CDDL)
the Linux kernel deliberately breaking non-GPL licensed code by starting to withhold certain floating point capabilities, because “this is not expected to be disruptive to existing users”.

This wasn’t great: as a user I was stuck between pretty much a rock & a hard place, even if this is a hobby and not strictly speaking a production use case on my side.

Nonetheless, I got it working by downgrading to a working version and skipping updates to the kernel packages.

Then, based on a suggestion, I patched the zfs-dkms package (rewriting the license entry in the META file) to make it look like a GPL-licensed module, which is fair game for one doing it on their own machine. This is hacky, or let’s call it pragmatic.

--- META.prev   2024-02-28 08:42:21.526641154 +0800
+++ META        2024-02-28 08:42:36.435569959 +0800
@@ -4,7 +4,7 @@
 Version:       2.2.3
 Release:       1
 Release-Tags:  relext
-License:       CDDL
+License:       GPL
 Author:        OpenZFS
 Linux-Maximum: 6.7
 Linux-Minimum: 3.10
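
The same one-line change could be scripted rather than edited by hand. Here is a minimal sketch in Python, assuming the META file keeps the `License:` line format shown in the diff. The function name `relicense_meta` is hypothetical and not part of zfs-dkms or any OpenZFS tooling.

```python
import re


def relicense_meta(text, new_license="GPL"):
    """Rewrite the value of the License: line, keeping the column alignment."""
    return re.sub(
        r"^(License:[ \t]*)\S.*$",
        lambda match: match.group(1) + new_license,
        text,
        flags=re.MULTILINE,
    )


# A few lines in the same layout as the META excerpt in the diff.
sample = (
    "Release-Tags:  relext\n"
    "License:       CDDL\n"
    "Author:        OpenZFS\n"
)
print(relicense_meta(sample))
```

Only the licence value changes; every other line, and the spacing of the `License:` line itself, is left as it was.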

Now, with the current 2.2.3 version, it seems like there’s an official fix-slash-workaround for being able to get the module to compile, even if it’s not a full fix. From the linked merge request message I’m not fully convinced that this is not a fragile status quo, but it’s at least front of mind – good going for wider ARM hardware usage that brings out people’s willingness to fix things!

Future development

Some while back, while working at an IoT software deployment & management company, I had a lot of interesting hardware at hand, naturally, to build things with (or wrestle with…). Nowadays I have things I’d best describe as spare parts, and thus loads of things are more fragile than they need to be, and gosh, it takes a long time to compile things on a Raspberry Pi 3, making every kernel update some half an hour longer!

Likely the best move would be to upgrade to a (much more powerful) Raspberry Pi 5 and use an external NVMe drive, where I’d have much less need for ZFS, at least for the original reasons. It would likely be still useful for other aspects (such as snapshotting, or sending/receiving the drive data, compression, deduplication, etc…), changing the learning path away from multi-device support to the file system features.

If I wanted to use more storage in the existing system, I could also get rid of the mirrored SD cards and just use 4 large USB sticks (maybe in a RAIDZ setup), a poor man’s NAS, I guess. Though there I’d worry a bit about needing sticks of the same size for this to work (unlike pooling, which has no same-size requirements), given the differences between supposedly same-sized products from different companies (likely locking me into having the same brand and model across the board).
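
The same-size worry can be put into rough numbers. This is a back-of-the-envelope sketch, assuming the usual rule of thumb that a mirror or RAIDZ vdev treats every member as if it were the size of its smallest one, and ignoring metadata and padding overhead. The stick sizes and function names are invented for illustration.

```python
def mirror_capacity(sizes_gb):
    """A mirror stores the same data on every member: usable space is the smallest member."""
    return min(sizes_gb)


def raidz1_capacity(sizes_gb):
    """RAIDZ1 spends roughly one member on parity; every member counts as the smallest."""
    return (len(sizes_gb) - 1) * min(sizes_gb)


def striped_capacity(sizes_gb):
    """Plain pooling without redundancy simply adds the members up."""
    return sum(sizes_gb)


# Four nominally "128 GB" sticks from different vendors (made-up real sizes).
sticks = [115, 117, 119, 123]

print(striped_capacity(sticks))     # 474
print(raidz1_capacity(sticks))      # 345
print(mirror_capacity(sticks[:2]))  # 115
```

In this made-up case the mismatched sticks would leave 14 GB unused in the RAIDZ1 layout, on top of the space that goes to parity.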

I also feel like I’m not using ZFS to its full potential. I know just enough to be dangerous… maybe that’s the generalist’s natural habitat?


Mixing GitLab personal and work accounts: Enterprise Users

Gergely Imreh — Fri, 14 Jul 2023 15:30:23 +0000

TL;DR: if you are about to become a GitLab enterprise user, time to split your work from passion.

I’m often asked by other team members just starting on their version control journey, when using the likes of GitHub and GitLab, whether to have separate accounts for work and personal projects, or have a single one for both?

So far my advice has been pretty much along the lines of “use a single one”, for many reasons: every service seems to handle email aliases, git+ssh is pain enough with a single account, never mind multiple, and people generally seem to build their professional and open source contributions under a single persona anyway.

This advice no longer stands, at least for GitLab. I received this email recently about how their use of Enterprise Users (and SSO Login + domain verification) makes it absolutely necessary to separate work and personal accounts:

I’ve been looking for a blogpost or other announcement, but couldn’t find one, hence the reposting of it here. I’m definitely gonna scramble a bit to create some new accounts (and keep my preferred username for the personal one).

Enterprise Users and the extra bit affecting them sorta make sense for “Enterprises”, such that:

This means when requested by an Owner in the top-level of a paid group, information can be shared about, and actions can be made on behalf of an enterprise user.
GitLab Docs on Enterprise Users

Enterprises need different things, that’s for sure.

On the other hand, on GitHub I mostly read up to the point about Personal accounts:

Most people will use one personal account for all their work on GitHub.com, including both open source projects and paid employment.
GitHub Docs on Personal Users

So for one moment I thought that “yeah, they do it better”… except reading on to Enterprise Managed Users, it’s pretty much like GitLab, with one difference: they seem to indicate that the given enterprise will create new accounts for those EMUs, rather than take over their existing accounts. That might make all the difference in how much of a faff it is.

Either way, this is just a shout-out, and then I scurry off to get some account splitting sorted. The main benefit is really to separate work from non-work, which occasionally does need a bit of a forcing function, even if force feels bad for a short time. It’s just like having separate work and personal laptops: I consider that the right thing, while a lot of people do find it annoying. Let’s keep learning the “proper”-ish ways of doing this stuff.


When WordPress caching is not what it seems

Gergely Imreh — Sun, 12 Feb 2023 04:20:21 +0000

When parts of a system are strongly interconnected, one can discover latent issues while debugging something completely different. This is what happened with this blog’s caching and integrating with the Fediverse.

Fediverse adventures

I was part of The Great Twitter Exodus of 2022, and like many I’ve landed on Mastodon (hey, hello, https://fosstodon.org/@imrehg). Mastodon and the whole Fediverse, built around the ActivityPub protocol, are technically very interesting and bring back a bit of retro-joy to me (which needs some reflection on why and how retro is joyful, but another time). This current blog is running WordPress, and I soon found that there’s a plugin to turn a WordPress blog into my own ActivityPub node. That seemed like an excellent way to connect up tools and make a more inter-connected Internet (besides nerding out, if I’m fully honest).

ActivityPub plugin

It was super easy to set up, and seemed to work well: take my author URL, put it into your Mastodon instance’s search, and voila, there’s a compatible profile which one can follow and interact with (to an extent, but still):

How my author profile looks when searched on a Mastodon instance, this time on Fosstodon. One can use either the author URL or the shortened address shown under the profile picture to find a person.

It all seemed to work, but coming back after a while, Site Health popped up a critical issue.

Critical problem shown in the WordPress Site Health page.

Digging into what the endpoint actually returned when setting the response format that the client wants to accept:

curl --silent -H "Accept: application/activity+json" https://gergely.imreh.net/blog/author/gergely/

I can see that it indeed doesn’t look like a JSON file, instead the cached web content:

Terminal view of the response to a request to the JSON author page, showing cached HTML instead.
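
The same check can be done programmatically on whatever the server sends back. This is a small sketch; the function name `is_activity_json` and the list of accepted media types are assumptions for illustration, not something taken from the ActivityPub plugin.

```python
import json

# Media types that would plausibly carry an ActivityPub document (assumed list).
JSON_TYPES = {"application/activity+json", "application/ld+json", "application/json"}


def is_activity_json(content_type, body):
    """True if the response declares a JSON media type and the body parses as JSON."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in JSON_TYPES:
        return False
    try:
        json.loads(body)
    except ValueError:
        return False
    return True


# A cached HTML page served in place of the profile document:
print(is_activity_json("text/html; charset=UTF-8", "<!DOCTYPE html><html></html>"))  # False
# What a working endpoint would be expected to look like:
print(is_activity_json("application/activity+json", '{"type": "Person"}'))  # True
```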

Hot on your trail, W3 Total Cache (W3TC)! I use this plugin to make this site (hopefully) more performant, but it’s not looking great in this instance. Fortunately, I got some helpful pointers by asking none other than the Fediverse about this issue.

W3 Total Cache

The way I understood how the W3TC plugin and my configuration worked was the following:

1. The plugin does some internal caching (opcode, objects,…) using PHP, which is something I don’t worry about much.
2. The main performance benefit comes from generating page caches on disk that can be loaded quicker than regenerating the page.
3. Routing to those cached files is mostly through my nginx web server’s configuration: the caching plugin creates an nginx configuration file with the relevant logic and redirects, so on the “happy path” when the target page is cached, the request doesn’t even touch WordPress’ backend at all.

Disabling Caching for an Endpoint

Based on the above (especially point 2), my train of thought started at “Can I tell W3TC not to cache that author endpoint if a specific header is received?” Looking at the W3TC FAQs, there’s indeed a way for a plugin (ActivityPub in this case) to signal that, disabling e.g. page caching by setting this inside the code path of the relevant page:

define('DONOTCACHEPAGE', true);

Looking at the source code of the ActivityPub plugin, I could find where in the data flow one would set this, before the author template is returned. I tried it out and it seemed to work. I’ve even opened a GitHub issue so that hopefully a fix can be developed for everyone!

Looking further in the code (to do that fix), though, it’s not only the “author” page: silently, all the posts and the front page have the same issue too (getting cached HTML when asking for JSON). If one disables caching on author + blogposts + frontpage, what else of note is left cached? Nothing really. And the plugin owner agrees.

Route Request Based on Headers

Let’s try instead routing the request based on headers: if a compatible “Accept” header is received, bypass the cache, and use what the endpoint returns. Here comes the issue with the 3rd point above: the nginx configuration and its use.

Ideally I would be able to add a check for this header to the config, roughly along these lines in the nginx config:

if ( $http_accept = 'application/activity+json' ) {
 # switch vars not to cache
}
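
One thing to keep in mind with such a check is that an Accept header may list several media types, each with optional parameters, so an exact string comparison could miss some requests. The decision logic can be sketched in Python as follows; the function name and the set of media types are assumptions for illustration, and this is not code from W3TC or from the ActivityPub plugin.

```python
# Media types treated as "the client wants ActivityPub JSON" (assumed set).
ACTIVITY_TYPES = {"application/activity+json", "application/ld+json"}


def wants_activity_json(accept_header):
    """True if any media type listed in the Accept header asks for ActivityPub JSON."""
    for part in accept_header.split(","):
        # Drop parameters such as ";q=0.8" or ";profile=..." before comparing.
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type in ACTIVITY_TYPES:
            return True
    return False


print(wants_activity_json("application/activity+json"))                  # True
print(wants_activity_json("text/html,application/xhtml+xml,*/*;q=0.8"))  # False
```

A request for which this returns true would skip the page cache and go to the WordPress backend; everything else could still be served from the cached files.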

Following the installation steps, W3TC generates an extra nginx.conf file when Page Caching is enabled, and that file governs what happens. There are a lot of various checks (e.g. do not cache on POST requests, do not rewrite paths if the predicted cache file location is empty…). I’ve been following the generated file, and tried to adjust the caching behaviour. To debug, I turned to adding various headers to the response, as it was easier than messing with the nginx logging rules. For example, adding this just before the last rewrite rule kicks in would show the value of the variable that decides whether a rewrite happens:

add_header X-gergely-debug "rewrite-$w3tc_rewrite";

Looking at this, in every situation I was getting “no rewrite to do” outcomes, and in the end it wasn’t too much of a surprise, as the logic in that file seems to be flawed: it generates the on-disk file locations from the request incorrectly and thus never finds anything (and/or is it some misconfiguration of mine? But there are definitely things which look plain wrong).

But while the nginx rules shrugged, at the same time my web requests were returning the files from disk! If I rewrote the generated files, I got the modified version back. Then, frustrated, I even emptied out nginx.conf to try to “break” caching, and it continued to work!

So I guess the actual behaviour was different from point 3 above, and rather:

W3TC generates nginx rules and hopes that they work and take load off the WordPress backend
If that doesn’t fly, it generates the caches internally in the plugin (of course, it has to be able to do that on first requests/preloading anyway)
The plugin still checks internally (in code) whether the cached file exists where it expects one on the disk and loads that, bypassing the point above!

I haven’t verified this yet by looking at the code, but this explains all the behaviour I’ve seen while trying things (serving files that I’ve manually changed, and working with an empty nginx rewrite config).

So after all, these checks for the Accept header would need to be both in the nginx config (less important for me as it was already broken), and also in the code of W3TC (which feels currently less tractable).

Current and Future Fixes

What I’ve ended up with is the simple and dumb way for now: disable Page Caching altogether.

For my site and level of traffic that should definitely not be a serious issue, though I did check with a site speed test just in case, and I don’t see much difference for my otherwise semi-broken setup.

Still, it’s definitely a step forward not to be assuming & hoping things are working (i.e. nginx-based caching) when they are not.

What would be good for the future, though, is a pattern where cache plugins only cache what they can, and otherwise pass the request on to the rest of the processing (i.e. check if the client “Accept” header is empty or defaults to some browser-y value). I bet this is a naive view, and there are more complications. There likely should be more complications if WordPress is more of a “platform”, given that it will then have to support a lot more different use cases and behaviours.

Future Rabbitholes

I’m definitely not alone in this quest, there are others who hit various ActivityPub + caching issues (e.g. using CloudFlare CDN). With more Mastodon usage there might be some more satisfying solutions than “disable most caching”.

I’ve definitely learned more about WordPress Plugin internals by looking at the ActivityPub plugin’s repo. I’m sure I could pick up some stuff for the 100 Days to Offload plugin in the future.

During debugging I was also looking at my web server logs, which I haven’t done for years, but which I know would be a “proper” sysadmin thing to do. There were a lot of interesting queries that I want to follow up on (bots, sites, tools scraping the blog and interacting with various bits). It’s the Internet, after all, so let’s look at the connections made!


Folding@Home on AWS to kick the arse of coronavirus

Gergely Imreh — Fri, 13 Mar 2020 22:13:53 +0000

Folding@Home popped up on my radar due to a recent announcement that their computational research platform is adding a bunch of projects to study (and ultimately help fight) the COVID-19 virus. Previously I haven’t had any good machine at hand to be able to help in such efforts (my 9-year-old Lenovo X201 is still cozy to work with, but doesn’t pack a computing punch). At work, however, I get to be around GPU machines much more, and that gave me ideas how to contribute a bit more.

Poking around the available GPU instance types on AWS, I’ve seen that there are some pretty affordable ones in the G4 series, going down to as low as roughly $0.60/hour for a decent & recent CPU and an NVIDIA Tesla T4 GPU. This drops even further if I use spot instances, and looking around in the different regions, I’ve seen available capacity at $0.16-0.20/hour, which feels really in the bargain category. Thus I thought of spinning up a Folding@Home server in the cloud on spot instances, to help out and hopefully learn a thing or two, at the price of roughly 2 cups of gourmet London coffee (or taking the tube to work) per day.
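
To see where the “per day” figure comes from, here is the arithmetic as a tiny sketch. It only uses the hourly prices quoted above and assumes the instance runs around the clock; the function name is made up.

```python
HOURS_PER_DAY = 24


def daily_cost(hourly_usd):
    """Cost in USD of running one instance around the clock for a day."""
    return round(hourly_usd * HOURS_PER_DAY, 2)


on_demand = daily_cost(0.60)
spot_low, spot_high = daily_cost(0.16), daily_cost(0.20)

print(on_demand)            # 14.4
print(spot_low, spot_high)  # 3.84 4.8
```

So the spot price works out to somewhere under $5 a day, against roughly $14 a day at the on-demand rate.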

Looking at the instance types, there are a few others besides the mentioned g4dn.xlarge to choose from, but I’m going to stick with that for the time being:

larger g4dn instances aren’t really worth it, since the GPU will do the heavy lifting, and it’s the same single GPU until going up to 12xlarge, which comes with 4 GPUs but is more than 4x as expensive, so the extra would be rather wasted.
Compute optimised p3 instances also don’t seem particularly worth it, as the difference between their NVIDIA V100 and the T4 is a much smaller multiplier than the price difference (based on a quick search for benchmarks: performance is roughly x2, while the price of the smallest machine, a 2xlarge, is x5-6).
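
The p3 comparison boils down to performance per dollar. Here is a quick sketch using only the rough multipliers quoted above (about 2x the performance at 5-6x the price); the function name is invented for illustration.

```python
def relative_value(speedup, price_ratio):
    """Performance per dollar relative to the baseline instance (1.0 means equal value)."""
    return round(speedup / price_ratio, 2)


# Smallest p3 (2xlarge) versus g4dn.xlarge, with the rough numbers from the text.
print(relative_value(2, 5))  # 0.4
print(relative_value(2, 6))  # 0.33
```

In other words, each dollar spent on the p3 would buy only about a third to two fifths of the folding work that the same dollar buys on the g4dn.xlarge.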

Software setup

I’ve spun up an instance simply enough, and with a bit of trial & error got the setup sorted.

Using an Ubuntu system, the required fahclient installed just fine as per the documentation, but the GPU side needed some extra poking. Things were unblocked by installing the NVIDIA drivers and OpenCL packages (thanks to the F@H forums), in my case:

sudo apt install -qy nvidia-headless-435 ocl-icd-opencl-dev

The next was adding a good Folding@Home config, again a bit of trial and error. The docs say lots of the pieces can be left to self-configure (the folding slots in particular), but I’ve found that explicitly setting it works better overall. Thus my /etc/fahclient/config.xml file looks something like this:

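<config>
  <!-- user, passkey, team, folding slot and password settings -->
  <allow>127.0.0.1 A.B.C.D</allow>
  <web-allow>127.0.0.1 A.B.C.D</web-allow>
</config>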

Here I omitted my user name and passkey (naturally), so others can fill in their own. I’ve also joined the ArchLinux team (number 45032 ;), but to each their own. The last part, in the allow/web-allow section, is that I’ve added my VPN’s IP address, so I can connect to the server remotely without opening it up to the rest of the world. That part (A.B.C.D) can be removed, and one could, for example, use SSH port forwarding to connect to the server instead (forwarding the required port 7396). Finally, the password section allows the remote FAHControl graphical interface to connect to the folding service remotely (without port forwarding).
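
For the SSH port forwarding alternative, here is a sketch of how such a command could be assembled. It assumes the default `ubuntu` login user of Ubuntu cloud images and uses a documentation-only IP address in place of the instance; the helper name `ssh_forward_command` is hypothetical.

```python
import shlex


def ssh_forward_command(host, ports=(7396,), user="ubuntu"):
    """Build an ssh command that forwards local ports to the same ports on the server."""
    cmd = ["ssh", "-N"]  # -N: only forward ports, do not run a remote command
    for port in ports:
        cmd += ["-L", f"{port}:127.0.0.1:{port}"]
    cmd.append(f"{user}@{host}")
    return cmd


# 203.0.113.10 is a placeholder address standing in for the instance.
print(shlex.join(ssh_forward_command("203.0.113.10")))
```

With such a tunnel up, the web client would be reachable on the local machine at http://127.0.0.1:7396, and only 127.0.0.1 would need to be in the allow lists.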

This setup then got to fold. To ensure that things were running fine on the GPU, I’ve also built nvtop on the machine and checked that the unit is maxed out:

nvtop when folding happily

Launch Template

So far it’s fine, but let’s make things more automatic. Spot instances can be killed, or I might want to spin up some extra instances, and would rather have as little manual work to do as possible. What I converged on then is having a Launch Template which sets up all the things needed and I could start a new folder with a couple of clicks. In there I’ve set:

the instance type, g4dn.xlarge
an Ubuntu 18.04 system
the security group, that allows all traffic to my VPN (otherwise port 22 for ssh would be enough with the mentioned ssh tunneling above)
that these are spot requests
my default AWS keypair for ssh access
some tags for housekeeping (definitely optional)
user data that does the whole setup on system start

Launch templating, took 5 versions to converge

Of the parts above, I guess naturally the user data took the most to figure out, because of some peculiarities of the setup.

First, FAHClient keeps wanting to interactively set things up when it is installed, so I had to get around that. If I pre-create the correct config.xml file before the install, fortunately only a single question remains (whether it should start the service automatically), and that one thing is taken care of by a bit of expect scripting.

#!/bin/bash

export DEBIAN_FRONTEND=noninteractive
sudo apt update
sudo apt install -qy nvidia-headless-435 ocl-icd-opencl-dev expect

wget https://download.foldingathome.org/releases/public/release/fahclient/debian-stable-64bit/v7.6/fahclient_7.6.9_amd64.deb
sudo mkdir /etc/fahclient/ || true
sudo chmod 777 /etc/fahclient
cat <<EOF > "/etc/fahclient/config.xml"
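<config>
  <!-- user, passkey, team, folding slot and password settings -->
  <allow>127.0.0.1 62.212.77.217</allow>
  <web-allow>127.0.0.1 62.212.77.217</web-allow>
</config>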
EOF

# This new FAHClient version might not get GPUs.txt properly, load it
curl https://apps.foldingathome.org/GPUs.txt  --create-dirs -o /var/lib/fahclient/GPUs.txt
sudo chmod -R 755 /var/lib/fahclient

cat <<EOF > "/home/ubuntu/install.sh"
#!/usr/bin/expect
spawn dpkg -i --force-confdef --force-depends fahclient_7.6.9_amd64.deb
expect "Should FAHClient be automatically started?"
send "\r"
# done
expect eof
EOF

chmod +x /home/ubuntu/install.sh

sudo /home/ubuntu/install.sh

With this script passed to the instance as user data, it all falls into place now, and I can spin up a new folding instance any time.

Then there are two ways to connect to the server and monitor it remotely:

the web client, on port 7396, with an interface like at the top of this post, or
using the FAHControl desktop client, which can monitor and control multiple folding instances, and which I feel has better control over & more information about what’s being done. This is by default on port 36330, and to work remotely it needs a “password” set in the configuration.

Using these settings, the remote workload (both the CPU and GPU) pops up, and it is possible to monitor & control:

And this should be done for now…

Notes & Future

Thus far I’ve learned:

A bit about spot instances. There are a lot more options which I haven’t touched and might be useful in general, such as targets & instance pools, using the time-limited spot instances, etc., but those are more for the general case, not this particular one.
A lot about launch templates. They seem handy, though one request would be being able to edit their description, or to have that description pre-filled when starting from a previous version (which currently it is not, unlike all the other settings).
Some apt/dpkg coercion tricks for non-interactive setup, though there seems to be a lot more to know. How nice it is on ArchLinux that non-interactive mode is basically a single --noconfirm flag away in pacman.
How to use user data, though that’s definitely just scratching the surface. What would be much better is to learn cloud-init instead, which seems much more like the proper way to supply files to install and scripts to run on these virtual machines.

I’ve also experienced that Folding@Home might be struggling a bit with the current load: earlier today the work servers had trouble (though now they seem to be okay), and so did the statistics servers, so I’m guessing the whole infrastructure is under load. I wonder how they are set up, and where their bottlenecks are…

But now this is done, the ball is in the court of the research, keep them computational biochemistry research coming. In the meantime keep safe, everyone. Wash hands, don’t touch faces, and take good care of people around you.

Edit 2020/03/14: Looking at their server stats and connecting up the dots with their project stats, they might have run out of relevant work items for the time being. That’s kinda both good (likely large response to their shout out) and a bummer (resources just idling).

Edit 2020/04/19: Updated the template user data script to install the newer 7.6.9 version of FAHClient (instead of 7.4.4), which also needs the GPUs.txt file loaded manually, because it doesn’t seem to do that by itself…

The post Folding@Home on AWS to kick the arse of coronavirus appeared first on ClickedyClick.

Continuous integration testing of Arch User Repository packages

Gergely Imreh — Thu, 12 Apr 2018 11:56:40 +0000

I maintain a couple of ArchLinux user-contributed packages on the Arch User Repository (AUR), and over time I’ve built out a bit of infrastructure around that to make that maintenance easier (and hopefully the results better). The core of it is automated building of packages in Continuous Integration, which catches a number of issues which otherwise would be more difficult.

This write-up will go through the entire packaging process to make it easily reproducible.

Contributing a package

AUR is a great resource for Arch Linux users, and it is pretty easy to create and contribute new packages.

Packages are created by cloning an empty git repository with the desired package name. I do it in a slightly different setup compared to the wiki that’s linked just above, as:

git clone ssh+git://aur@aur.archlinux.org/<package-name>.git

Add your PKGBUILD and any other required files, run mksrcinfo, and git commit, and push… If everything went well, your package is now visible in the AUR search.

Next time that repository is cloned, it will contain the code, and changes (i.e. package updates) can be pushed just as well too.

Keeping track of packages

As more packages are contributed, it is increasingly hard to keep track of them as separate repositories. One way to improve on this, is creating a “meta” repository (or repo), where all the contributed packages are linked as git submodules.

This organization is achieved by creating your meta-repo, and adding your package as a submodule:

git submodule add ssh+git://aur@aur.archlinux.org/<package-name>.git

Then you’d make package updates in that submodule, and the meta repo would contain all your packages as a collection.

My packages’ meta repo that shows this arrangement is on GitHub at imrehg/aur.

Continuous integration testing

What we can do with this setup now is to automatically check out, build, analyze, and test (including installation) all the packages. I’ve set that up as CircleCI build jobs for each of the packages: each of them is built and installed in a clean Arch Linux environment.

The clean Arch Linux environment is provided by a Docker image, that I’ve created for this purpose, archlinux-makepkg-docker. That image builds on an upstream Arch Linux image, and sets a few things up:

updates the image with the latest base build system
creates a “builder” user that can run sudo
installs two packages from scratch that are sometimes needed for working with AUR packages: “package-query” and “yaourt”
installs “namcap” to analyze the package

Each AUR package is set up with its own CircleCI build job as part of a workflow.

Since most of the work for each package is pretty much the same, we can simplify things with templates, such as this:

# Common sections
defaults: &defaults
  working_directory: ~/aur
  docker:
    - image: imrehg/archlinux-makepkg

updatepackage: &updatepackage
  name: Update packages
  command: sudo pacman -Syu --noconfirm

gitupdate: &gitupdate
  name: Git repo updates
  command: |
    sed -i "s#ssh+git://aur@aur.archlinux.org#https://aur.archlinux.org#" .gitmodules
    git submodule update --init
pkgbuildtest: &pkgbuildtest
  name: Testing PKGBUILD
  command: |
    cd ~/aur/${CIRCLE_JOB}
    namcap PKGBUILD
buildtest: &buildtest
  name: Building package
  command: |
    cd ~/aur/${CIRCLE_JOB}
    makepkg -sci --noconfirm

# Main
version: 2
jobs:
  my-package:
    <<: *defaults
    steps:
      - run:
          <<: *updatepackage
      - checkout
      - run:
          <<: *gitupdate
      - run:
          <<: *pkgbuildtest
      - run:
          <<: *buildtest

workflows:
  version: 2
  build:
    jobs:
      - my-package

The sample CircleCI “config.yml” here is set up to build an AUR package called “my-package”:

it pulls the Arch Linux Docker image mentioned earlier
updates any outdated OS package
checks out meta repo that we are working from
updates the submodule configuration to be able to pull the required submodule without authentication. The “ssh+git://” setup requires the maintainer’s SSH credentials, while after switching to “https://” the CI environment is allowed to check the package’s code out (and won’t be able to push back upstream, which is safer)
runs “namcap” on the PKGBUILD to catch any obvious issues
builds and installs the package (including dependencies)
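
To make the submodule URL rewrite in the list above concrete, here is a minimal sketch that applies the same sed expression to a single sample line instead of the real “.gitmodules”. The helper name “rewrite_aur_urls” is made up for this illustration; only the sed expression comes from the config.

```shell
# Hypothetical helper: applies the same substitution as the "gitupdate" step,
# but on stdin, so it can be tried without touching a real .gitmodules file.
rewrite_aur_urls() {
  sed "s#ssh+git://aur@aur.archlinux.org#https://aur.archlinux.org#"
}

# A sample line as it would appear in .gitmodules for "my-package"
echo "url = ssh+git://aur@aur.archlinux.org/my-package.git" | rewrite_aur_urls
```

Only the scheme and user part of the URL changes, the package path stays the same, so the rewritten submodule can be cloned anonymously.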

As “my-package” is set up above, it does not have any line in the build steps that is specific to that package. The specifics are set up using CircleCI variables (CIRCLE_JOB) and YAML anchors with merge keys (the “foo: &foo” and “<<: *foo” sections). Thus if there’s “another-package”, it’s easy to clone the “my-package” section as it is, name the copy “another-package”, and add a new build job called “another-package” to the workflow at the end of the file. With this “templating”, when the build steps need to be modified, they can be updated in the header, and all the packages will pick that up.
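
Since every job section is identical apart from its name, cloning one can even be scripted. The following is only a sketch: the function name “circleci_job” is invented here, and it simply prints a job stanza matching the “my-package” one above, to be pasted under “jobs:”.

```shell
# Hypothetical generator: prints a job section for the given package name,
# reusing the anchors (*defaults, *updatepackage, ...) defined in the header.
circleci_job() {
  local name="$1"
  cat <<EOF
  ${name}:
    <<: *defaults
    steps:
      - run:
          <<: *updatepackage
      - checkout
      - run:
          <<: *gitupdate
      - run:
          <<: *pkgbuildtest
      - run:
          <<: *buildtest
EOF
}

circleci_job another-package
```

The new job still has to be listed under the workflow’s “jobs:” by hand, as with “my-package”.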

Workflows are also useful, as jobs can be made dependent on each other, if they are related, such as my “gnushogi” and “xshogi” packages, or likely any AUR package that requires other AUR packages that need to be built.

...
workflows:
  version: 2
  build:
    jobs:
      
      - gnushogi
      - xshogi:
          requires:
            - gnushogi

This would result in a dependency in the jobs as:

Jobs in the CircleCI workflow

The workflows also allow jobs to pass files to each other. E.g. as above, “xshogi” depends on “gnushogi” being installed. I could build all the required dependencies again in “xshogi”, but since “gnushogi” was already built, I can just pass the created package from the earlier job on to the next, using CircleCI workspaces.

  gnushogi:
    <<: *defaults
    steps:
      - run:
          <<: *updatepackage
      - checkout
      - run:
          <<: *gitupdate
      - run:
          <<: *pkgbuildtest
      - run:
          <<: *buildtest
      - persist_to_workspace:
          root: gnushogi
          paths:
            - gnushogi-*.pkg.tar.xz

  xshogi:
    <<: *defaults
    steps:
      - run:
          <<: *updatepackage
      - checkout
      - run:
          <<: *gitupdate
      - run:
          <<: *pkgbuildtest
      - attach_workspace:
          at: /tmp/workspace
      - run:
          name: Installing gnushogi
          command: sudo pacman -U --noconfirm /tmp/workspace/gnushogi*.pkg.*
      - run:
          <<: *buildtest

The meta repo is now ready to go with such “.circleci/config.yml”, and on each push, it will build all the packages defined in the job list. You can check how the results look for my AUR packages in CircleCI’s build job view (one entry by build job, ie. package-per-push) or workflow view (one entry per push, aggregating all jobs).

Last build workflows

One of the advantages of this setup is that if a build fails on any of the packages (e.g. a source file is no longer available) it’s easy to see, and I can catch a number of out-of-date packages sooner than someone reports them on AUR.

Keeping the build image up to date

The Arch Linux Docker image is automatically built on Docker Hub (and can be found at imrehg/archlinux-makepkg). It is kept fresh by an If This Then That applet, which triggers the build every morning.

IFTTT Applet to trigger Docker Hub automated builds

That applet just uses the Date & Time and Webhooks recipes. The webhook points to the Trigger URL provided by the “Build Settings / Build Triggers” section on Docker Hub for the image, and it’s a POST request with payload of:

{"docker_tag": "latest"}
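
The same request can be sent by hand to test the trigger before wiring it into IFTTT. This is a sketch under assumptions: the URL below is a placeholder, not a real trigger URL, the function name is made up, and the Content-Type header is my assumption about what the endpoint expects. With DRY_RUN=1 the command is only printed.

```shell
# Hypothetical helper: POSTs the payload above to a Docker Hub trigger URL.
# The URL is passed in; the one used below is a placeholder only.
trigger_docker_build() {
  local url="$1"
  local payload='{"docker_tag": "latest"}'
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only show what would be sent
    printf "curl -X POST -H 'Content-Type: application/json' --data '%s' %s\n" "$payload" "$url"
  else
    curl --fail --silent --show-error -X POST \
      -H 'Content-Type: application/json' \
      --data "$payload" "$url"
  fi
}

DRY_RUN=1 trigger_docker_build "https://example.invalid/trigger-url"
```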

Docker HUB Build Settings / Build Triggers settings

Keeping the image fresh like this shortens the build time when running the jobs on CircleCI (fewer packages need to be updated), which is especially important as free users have limited CPU time available each month.

Not many of my packages use other AUR packages; those that do would likely need more setup here.

Update workflow

As an aside, the process to update any given package with this setup is as follows:

Update the “PKGBUILD” for the package, quite often it’s just the version number
Update the checksums easily with “updpkgsums” (part of “pacman” so it should be always available)
Build the package
If everything goes well, update the required “.SRCINFO” with “mksrcinfo” (part of “pkgbuild-introspection”)
git add, commit (signed if you can:), and push to AUR
Clean up the package directory (“git clean -d -f && rm -rf src”)
Going back up in the folder hierarchy to the meta repo, git add and commit the changes to the submodules
Push to github, and enjoy the build!
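
The steps above can be sketched as a small shell helper. This is an illustration, not the script I actually use: the function names and the commit message format are made up, and it assumes it is run from the meta repo after the “PKGBUILD” was edited by hand. With DRY_RUN=1 it only prints the commands.

```shell
# Runs a command, or just prints it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Usage: update_aur_package <package-dir> <commit-message>
update_aur_package() {
  local pkg="$1" msg="$2"
  run cd "$pkg" &&
  run updpkgsums &&                 # refresh the checksums
  run makepkg -sci --noconfirm &&   # build and install, same flags as the CI job
  run mksrcinfo &&                  # regenerate .SRCINFO
  run git add PKGBUILD .SRCINFO &&
  run git commit -S -m "$msg" &&    # -S signs the commit, drop it without a key
  run git push &&                   # push to AUR
  run git clean -d -f &&
  run rm -rf src &&
  run cd .. &&
  run git add "$pkg" &&             # record the new submodule commit
  run git commit -m "$pkg: $msg" &&
  run git push                      # push the meta repo, triggering the build
}

DRY_RUN=1 update_aur_package my-package "Update to new upstream version"
```

The chain stops at the first failing step, so a broken build never gets as far as the push to AUR.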

Future

Many things can be improved in this setup (one day), here are some ideas:

It should be possible to publish the build artifacts somewhere (say S3) and set that up as a custom Arch Linux package repository, so the packages can be reused without everyone needing to build from scratch every time.

If that publishing would happen, I’m guessing it would be good to also sign the built packages, which might be a bit trickier to set up safely, but would make package distribution nicer and more robust.

In my list of packages there are not that many that depend on other AUR packages. Other packages with more AUR dependencies might need even more custom setup than shown above, besides the templated sections, to make them speedy and logical.

In the package testing steps, probably should run “namcap” on the finished package too, to catch other issues (e.g. dependencies required but not included).

What’s your experience with maintaining AUR packages, or with CircleCI? Have any feedback on how to make this above even more useful?

The post Continuous integration testing of Arch User Repository packages appeared first on ClickedyClick.

Personal phone server, or Can you hear me now?

Gergely Imreh — Tue, 18 Mar 2014 09:41:02 +0000

Ever since someone donated an IP phone to the Taipei Hackerspace, I’ve been trying to find time to set up an internal phone network between the hackerspace members. It should be fun to make our own infrastructure. Recently I did some research and got started. Since when I get into something I dive deep for a while, this was an intense week. This post summarizes where I have got to in this time.

Asterisk & FreePBX

A bit of searching turned up Asterisk, a PBX (“private branch exchange” aka telephone network) software. It looked interesting because it came with a story: a guy building something awesome because he didn’t know that it was supposed to be difficult. It’s also been open source from the start, with a successful company built on top of the project.

I also found that there’s a graphical control panel called FreePBX that makes the all-command-line-and-config-files Asterisk easier to use. Both projects had a seemingly very detailed wiki, a long track record, and a strong following, which made them worth checking out.

The Server

Judging from the original install instructions on the FreePBX wiki, it looked like installing Asterisk & FreePBX is a complex (or rather many-step) process. Didn’t want to litter my own computer with broken installation artefacts, so enter VirtualBox. Using a virtual machine makes it easy to wipe and restart.

There’s a dedicated, preinstalled FreePBX distro based on CentOS, but I had enough of CentOS for a while. Instead I just took Ubuntu 12.04.3 as a base, and FreePBX 2.11 and Asterisk 11 from the wiki. The install instructions were clear enough, though occasionally there were small differences needing a fix. Nothing major, but I had to play around. After 5-6 reinstalls with increasing experience I got the basic functionality working, with calls placed between accounts and such, but the performance and sound quality weren’t really that good. After thinking about what I could improve, I decided to take the next step: get out of the virtual machine, and up the version numbers (I’m an Arch Linux user for a reason, living on the bleeding edge).

Enter DigitalOcean, a hosting provider that I used for other projects before (cheap, fast with SSD, good service). I set up a machine (aka “droplet”) in their Singaporean center (since that’s probably the closest one to Taiwan). I chose the 1GB memory instance, because from experience with VirtualBox, Asterisk+FreePBX maxed out at around that with a few test accounts.

Upping the version numbers I went with FreePBX 12.0 (from git) and Asterisk 12.1.1 (from download), both are testing versions. Asterisk had an extra dependency of libjansson-dev compared to version 11, didn’t check if any of the earlier dependencies are not required anymore.

FreePBX interface for Asterisk

Got the whole system working (after a few droplet wipes), and played with the installation with a bit more confidence. From initial experience, Asterisk is a bit like the Linux Kernel. It’s modular, complex, focuses on reliability, and the “make menuselect” is a familiar environment after years of “make menuconfig” compiling my own kernel. On the other hand, FreePBX is a bit like WordPress. It has its own auto-updater (just like updating plugins in WordPress), loads and loads of menus, focuses on configuration and tries not to let any faulty module take down the system (I found quite a few buggy behaviours, so that’s a good idea). The Kernel and WordPress are two familiar environments, so I felt at home here too somehow.

Asterisk has a bunch of vocabulary that I’m so far barely familiar with, and lots of functionality that I haven’t had a chance to test yet. FreePBX has a lot of functionality too, and still it’s a bit difficult for me to tell where does an Asterisk function (module, resource?) end and one FreePBX function (plugin?) start. The fact is that I got to feel excited about programmable phone routing (with Lua), fax-to-pdf, hotel style wake-up calls, voicemail recording, call tracing, speaking time, simple conference talks, intercom functionality, regardless from whether it’s a module or a plugin…

Some additional server notes: voicemail requires email out for notification, I set that up with Mandrill and postfix. For such testing it might not be important, but good to secure the server at least a bit with fail2ban and ufw (Uncomplicated Firewall), and probably other things I don’t do well yet. Just sayin’.

Accounts / Extensions

Accounts on the server are the extensions on which someone (or something) can be reached, each identified by a numerical value. The vocabulary and concepts are also new to me, so it took a while to understand how things are supposed to interoperate. Asterisk has a bunch of different kinds of extensions, of which I have tried two main ones: SIP and IAX.

SIP

SIP stands for Session Initiation Protocol. As far as I see it is basically a messaging protocol, to set up a connection between two parties, and also provide some other services, for example presence information (Online, Away, Busy….), messaging, and what not. The actual data of the call (voice or video) goes through RTP (Real-time Transport Protocol).

The voice data in the transmission is compressed with one of the many codecs available:

ulaw and alaw (G.711) are a pair of the standard codecs, okay quality, one of the basic one to have in any client
speex is a variable bitrate codec, haven’t used that much
gsm is lower bitrate, but lower quality too (think of crappy cell phone reception voice)
G.722 is a hi-def (HD) voice codec, really good! I think it beats Skype, and is on par with a good Google Hangout quality
G.729 is a non-free codec that shows up here and there, but I haven’t had a chance to try it. I’ve seen it recommended, though it is a low-bitrate narrowband codec rather than an HD one

In testing, this was part of the learning curve: how to set up the clients, and also the server, so that they can communicate with each other. Who chooses the codec (caller, callee, server)? How to prioritize the codecs in different clients? What does it look like (or sound like) when there’s a problem in this area? How to debug and fix?

Asterisk 12 has two different SIP channels or components: their classic library (chan_sip), and a rewritten one (chan_pjsip). The latter one is a standalone library that can be used for other purposes as well. SIP usually works on UDP, while PJSIP can do UDP/TCP/WebSockets too, and feels stable and fast. Definitely would use that if I have to choose between these two. Still, it is in test phase (both in Asterisk and FreePBX), so not without headaches.

There are a bunch of different clients that I tried:

On Linux:

Ekiga is nice, simple, can sign into multiple accounts in multiple networks. Presence information, well integrated into the desktop with notifications and such. Does not seem to be able to handle non-standard SIP ports (which will be an issue further down)
Linphone is really multiplatform (Linux, Win, OSX, smartphones…), but it was crashing on me quite a bit, doesn’t integrate into the desktop (no notification just sound on call), and can be confusing with the lot of settings (the control panel looks a mess). Can handle non-standard ports too.
SFLPhone is good, works pretty well, simple, and can do IAX communication besides SIP.

On Android:

Android actually has full SIP handling capabilities built in for a while now (under “internet phone”). That would be awesome, if there were more information how to set up and use, but in theory a SIP account can be fully integrated into the system.
CSIPSimple really impressed me, probably the best working client I found. Integrates with the system (calls are handled as ‘calls’ with all the icons, history, and so on), good sound quality (can use the G.722 codec) and so on.
SipDroid, VIMPhone, LinPhone…. these other clients, don’t even remember them, all of them fell short somehow
Zoiper stands out as well, not just because it’s multiplatform, but because it’s pretty much the only one I found that can do both SIP and IAX. The Android system integration is not as close as CSIPSimple, but quite okay.

One of the (many) good things about SIP is that it is well known and pretty well supported. If there’s a “softphone” (phone in software), it’s quite likely to have SIP communication capabilities.

The bad thing about SIP, though, is that it is well known and pretty well resented by the phone service providers. Many of those providers block SIP messages on their network, or sabotage the connection in some other way. On my own cell phone / 3G provider’s network, I couldn’t connect to Asterisk. In the forums some suggested that changing the port number that Asterisk listens on for SIP connections can solve things – and indeed after moving away from 5060/5061 to somewhere else, I could connect. The celebration was short lived, though, because even though the calls now could reach the destination, RTP communication (the part that actually transports voice) was still broken. I don’t want to use VPN all the time (though might need to soon), and want to keep moving parts and settings to the minimum if I want others in the Hackerspace to join this network as well, thus SIP looks like a no-go because of the phone companies (darn).

IAX

Looking around, I found another type of channel, using the IAX (Inter-Asterisk eXchange) protocol. The bad thing about it is that it is much less supported, but in turn it is not blocked by the phone companies either (since they don’t know about it).

Using SFLPhone and Zoiper I could successfully talk over 3G! Still it is not all good, the devil is in the details.

Zoiper incoming call on Android

It’s good that no need for custom ports
It’s bad that the IAX channel seems to be more unstable on Asterisk (or maybe I messed up my install after a while?): some extensions have trouble logging on for a while until the server is restarted; the wakeup-calls plugin misbehaved with IAX extension.
The less support also means less choice in clients. The ones I found cannot do G.722 so no HD voice anymore
Has a security setting (requirecalltoken) that not all clients support, not sure if there are any implications.

It’s also good that Asterisk can route incoming SIP calls onto IAX extensions (i.e. the caller doesn’t have to care what technology the callee is using). On the note of routing, I could set things up such that outside calls can be routed into the system. E.g. every hackerspace could have their own Asterisk server and interoperate to call members at other spaces – sounds like a lot of work and might not be worth it, but it also sounds awesome.

Summary & Future

I had a lot of fun playing with Asterisk. On the surface phone networks are familiar to everyone, but going deeper both makes things more confusing and opens my eyes how many possibilities there are for making something useful.

There are a lot of things that I thought about, but haven’t tried yet:

Programmable dialplans (“what happens when a call is received”), via Lua. Lua is an awesome language, and probably a lot of large pieces of software have it embedded (since that’s one of its strengths)
Could script a lot more too via the Asterisk Gateway Interface (AGI).
There are a bunch of other protocols and acronyms in Asterisk, for example Secure Real-time Transport Protocol (SRTP) and ZRTP, that could be worth figuring out for a deeper understanding and security
There’s an Asterisk on Raspberry Pi project that looks interesting (if nothing else then how do they lower the memory usage below the RPi’s <512MB?). Since Asterisk can be used with multiple servers in a network, the RPi can provide one kind of service (e.g. GSM gateway) while other servers with more resources do other stuff
Using physical phones in the network, for example traditional phone network to come into the server, and IP Phone to ring out. Maybe setting up fax endpoint (and sending it out as PDF or printing it). Basically anything that is working on the threshold between physical and digital.
Should check out how the likes of Line, Voxer, and Viber are doing VoIP on Android, do they have any interoperability?
How about Twilio, can their system be a similar PBX on a much larger scale?

The funny thing is that it looks like the original IP phone that started this whole adventure does not work with Asterisk. Never mind, it will be good for another project.

The post Personal phone server, or Can you hear me now? appeared first on ClickedyClick.

Switched to SPDY and now Google’s confused

Gergely Imreh — Sun, 26 May 2013 13:44:48 +0000

Out of interest, I recently switched this site to SPDY, partly because I like to try out new things, and partly because I would want to make things better and faster. So far it’s a mixed experience, with some puzzling changes that I cannot make heads or tails of.

The first step for the switch was bringing everything onto HTTPS, which I have done with a free SSL certificate from StartSSL. I redirected everything from HTTP to the secure connection with the 301 HTTP status code, so I thought Google would be able to follow it well and replace the addresses in their index. Then I enabled the SPDY module in Nginx, and checking the result, it looked like I was in business.
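
One way to double check the redirect part is to look at the raw response headers. The sketch below is only an illustration: the function name is made up, and it works on header text piped in (for example from “curl -sI”), so it can be tried without any network access. It prints the redirect target only if the response really is a 301.

```shell
# Hypothetical helper: reads raw HTTP response headers on stdin, prints the
# Location target when the status is 301, and fails for anything else.
permanent_redirect_target() {
  awk '
    { sub(/\r$/, "") }                       # strip the CR of CRLF line ends
    NR == 1 && $2 != "301" { bad = 1; exit 1 }
    tolower($1) == "location:" { print $2; found = 1 }
    END { if (bad || !found) exit 1 }
  '
}

# Sample headers; against a live site this would be: curl -sI http://... | permanent_redirect_target
printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: https://example.org/\r\n\r\n' \
  | permanent_redirect_target
```

A 302 or a missing Location header makes the function return a non-zero status, which is the case worth catching, since only a permanent redirect tells crawlers to replace the old address.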

Some time has passed, and a scary graph started to manifest itself in Google Analytics:

Google Analytics impression count, the site has changed around May 8.

Right after I have made the changes, my impression count on Google dropped like a brick, now being exactly 0. That’s not really the change I wanted to see. Digging more into it, though, it looks like I still have a constant stream of visitors from Google Search:

Visitor numbers from Google Search, same time interval as the impression count.

How can I have zero impressions, but still a half a dozen visitors from Search? The results in the Webmaster Tools mirror things: dropping impression count, no crawl errors, same or even better indexed count, and relatively good stats:

Google Crawler stats, with a big spike when switched over HTTPS/SPDY when needed to reindex everything

The crawl seemed to have gotten a bit slower (the bottom plot of the three), but more consistent.

I wonder what could be the change: does the impression count depend on the method of access (http/https)? Or did I make some breaking changes? If so, then why the conflicting information?

Being a scientist, my main concern is not actually the raw value of any visitor count, but understanding the reactions to my actions, and consistency of the “experimental results”. I wonder what kind of technique I could use to debug all this?

Update 2013/May/28:

Following some recommendations from the comments, it looks like the https:// version of my URL has to be added to the Webmaster Tools separately. Now there’s a http://gergely.imreh.net and a https://gergely.imreh.net section as well. In the latter section, I can see that there are some impressions reported. Some weird things still exist: the sum of impressions from both is less than how many visitors I reportedly get from Google Search; the crawl stats are shared between the two sections (i.e. the https version reports a lot of crawl stats even from the time when https wasn’t enabled), while most other data is separate for the two sections (e.g. impressions, search queries, sitemaps). Still, this is probably on the right path.

The impression count after adding a https version of my site’s records to the Webmaster Tools

After the Webmaster Tools changes, I have just switched the Google Analytics association from one WMT property to the other. Hopefully this will freak me out less, though it will likely take some days to see the changes in the result.

The post Switched to SPDY and now Google’s confused appeared first on ClickedyClick.

Fighting forum spam

Gergely Imreh — Sun, 27 Jan 2013 02:16:33 +0000

As one of the managers of Ignite Taipei, I’m trying to come up with new ways to let the community communicate, new ways to share information, advice and all. A while ago I have set up a forum at http://bbs.ignitetaipei.tw/ and I thought that will be an interesting experiment. Well, so far it is useless for communication, but turned out to be a very interesting experience from the sysadmin point of view.

I used FluxBB, because it looked simple enough, seemed to be quite fast (for low traffic volume at least), and well configurable. Except that within a very short time I run into a spam problem, so many fake users registered, and lots of algorithmically generated garbage text with a bit of advertisement here and there.

First I looked into FluxBB’s own solutions, and looks like it might not have been a great choice, because many of the spam-fighting plugins are out of date, or not supported anymore, or just a real pain to set up. The immediate practical step I could take was updating my security questions, roll my own version of “written with words, how much is 5 + 4?”, the regular low-tech captcha on FluxBB. Looks like the original answers are already in the database everywhere, so had to write my own set, which seemed to work for a while, cutting down on red-flagged registrations. But it’s not ideal, since I want to make this a dual-language forum (Ignite Taipei has both English & Chinese as official language).

Instead I turned on email confirmation. When someone registers, the password is sent to their email and have to use that to sign in. It was okay for a tiny bit, then crazy registration boom happened. I think I might be the only one real member of the board (I said that it is a failure so far for communication:) and there are 500 other spam members. Looking at their email addresses, it seems all of them have Hotmail. That kinda suggests a giant failure at Hotmail to restrict automatic registration, which is probably a problem overall. I cannot just throw out Hotmail addresses either, because it’s a popular mail provider here in Taiwan too (my first email was Hotmail too, but that was a looooong time ago, before it was Microsoft property).

So captcha doesn’t work, email doesn’t work. What to do instead? At the time I was playing around with Cloudflare, to act as an easy to use CDN. I tried it before for our Ignite Taipei blog, which is hosted on Tumblr, and that doesn’t play well with Cloudflare unfortunately. Couldn’t use it for this blog before because of my DNS provider, but now I switched, so started playing with it again.

Cloudflare stats snapshot (parts of it)

Instead of enabling Cloudflare for the entire ignitetaipei.tw domain, I just turned it on for the forum, since it’s hosted elsewhere. And that totally did it. Spam stopped that very moment, and hasn’t returned since. I think what happens is that Cloudflare knows globally a lot of web/forum/email spam hosts, and can challenge them or generally ignore them. I can even see where those spammers are coming from.

Cloudflare Threats Console

One weird (but actually not that surprising) thing is that the most active web crawler on the site (Cloudflare gives that info as well) was Baidu by far, so I guess more people knew about the site in China than elsewhere. Why’s that? Some forums that share vulnerable sites, or something like that? I barely had any Chinese content at that time, so it cannot be that. And since I turned on the threat control part, Baidu seem to have dropped quite a bit (submitted the site to Google so now that’s the busiest crawler).

All in all, Cloudflare is an interesting experiment. I can really mess up my DNS with it, and could block my own site for several hours, but in general it’s worth it. Just have to be careful. For example when testing, use their own name servers to check the information, and maybe instead of “automatic” time-to-live, set some very short time first. I usually use Google’s 8.8.8.8, and they pick up the first wrong setting really quickly, then it takes hours to pick up the correction I made just minutes after the first one.

After a bit of playing around, at least I have no spam anymore (keep fingers crossed). Now just have to get people to use the forums. :)

The post Fighting forum spam appeared first on ClickedyClick.

Laboratory 2.0 – a monitoring system

Gergely Imreh — Sun, 28 Oct 2012 14:03:15 +0000

Looks like one of my specialties as a physicist, and contributions to the labs where I have worked so far, is bringing different kinds of programming techniques and technologies to the table. I’m not saying I’m any better than many of the professors, post-docs, and students I’ve met so far (there are plenty of ingenious ones), it’s more like I experiment with different tools, have tried more of the cutting edge or recent technologies, did some web programming and could whip up something quick – that might not work very well at first, but does broaden the horizon for the rest of the people.

Also, I’m a lazy person, so I want to automate as much as possible. That was on my mind recently when we were preparing to do a vacuum-system bake-out. It’s essentially a procedure where a delicate experimental system, mostly made up of steel, glass, and stuff like that, is closed off from the atmosphere, has all the air pumped out, and is then heated up to a high temperature (~150-300°C). One has to be careful, because things can break: there are temperature limits for some materials, and also limits on how quickly the temperature can change, so the status of the system needs careful monitoring. And the whole thing takes something like two weeks or more. Perfect setting for automation.
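The two kinds of limits mentioned above (an absolute maximum and a maximum rate of change) can be sketched as a small check in Python. This is only an illustration: the `Limits` class, the `check_reading` function and the numbers used with them are made up for the example, and are not the values or the code from the actual bake-out.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical limits for one sensor location."""
    max_temp_c: float          # highest temperature the material tolerates
    max_rate_c_per_min: float  # fastest allowed heating or cooling


def check_reading(prev_temp_c, prev_time_s, temp_c, time_s, limits):
    """Return a list of warnings for one new reading (empty if all is fine)."""
    warnings = []
    if temp_c > limits.max_temp_c:
        warnings.append(
            f"temperature {temp_c:.1f} C is above the limit of "
            f"{limits.max_temp_c:.1f} C"
        )
    elapsed_min = (time_s - prev_time_s) / 60.0
    if elapsed_min > 0:
        # Rate of change since the previous reading, in degrees per minute.
        rate = (temp_c - prev_temp_c) / elapsed_min
        if abs(rate) > limits.max_rate_c_per_min:
            warnings.append(
                f"rate of {rate:+.1f} C/min exceeds "
                f"{limits.max_rate_c_per_min:.1f} C/min"
            )
    return warnings
```

Called once per new reading and per sensor, something like this would flag both overheating and heating up (or cooling down) too fast.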

Set up the electronics

The pressure measurements are done by some other, expensive equipment, so I didn’t have to bother with those yet, and set to work first on the temperature monitoring. Before, it was a bunch of thermocouples and multimeters, requiring manual intervention and lots of labour. Instead, I got some inspiration from Adafruit’s Thermocouple Breakout Board, which uses the MAX31855 chip, and also from the Thermocouple Multiplexer Shield. The Adafruit board can handle only one channel, but it can be combined with another chip that switches between the different thermocouples, so we can read them out one by one. The multiplexer shield, on the other hand, uses an older chip for the measurement that I could not buy anymore. In the end, I found a good analog multiplexer that is sold in the computer market here in Taipei, the CD4067B, and it works pretty well.
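To give an idea of how the channel switching works: the CD4067B picks one of its 16 channels based on four address inputs (A, B, C, D in the datasheet, with A being the least significant bit). A minimal sketch of that mapping in Python follows. The function name is made up for the illustration, and on the real board this logic lives in the Arduino sketch, which may do it differently.

```python
def select_bits(channel):
    """Return the (A, B, C, D) address levels for CD4067B channel 0..15.

    A is the least significant bit, D the most significant one.
    """
    if not 0 <= channel <= 15:
        raise ValueError("CD4067B has channels 0 to 15")
    return tuple((channel >> bit) & 1 for bit in range(4))
```

For example `select_bits(5)` gives `(1, 0, 1, 0)`, meaning A and C are driven high while B and D are low.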

Breadboard setup for temperature monitoring with Arduino

Of course, setting it all up was quite a bit of fun, as there were way too many gotchas along the way.

MAX31855 is a surface-mount component, and I haven’t worked with those before. Not too bad, and it can be much neater, it just takes some practice
MAX31855 is a 3.3V circuit, so the CMOS voltage levels used by my Arduino Mega ADK had to be level shifted
Unlike the older chip, MAX31855 really needs differential input, and it’s much more sensitive to the environment. This required different kind of analog multiplexer than that board had
The Arduino Mega is a new model for me, and had some strange behaviour in terms of the serial communication
Surprisingly there are not too many options for 3.3V voltage regulators over here, just the LM1117, which is different from what others are using elsewhere
Lots of noise and stability issues until I figured out how things should be set up. For example, under no circumstances should the thermocouple touch conducting surfaces, and ground loops have to be avoided
While MAX31855 says it’s “cold-junction compensated”, meaning that it accounts for the chip’s local temperature when measuring the thermocouple, it doesn’t appear to be completely compensated: the measurement can change unexpectedly when the chip heats up, for example by being in a closed box.
Figuring out the right amount of time to wait between switching channels (375ms seems to be good enough, 500ms is totally fine)
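Two of the points above, the chip’s own (cold-junction) temperature and the waiting time between channels, can be sketched in Python. The bit layout follows the MAX31855 datasheet: a 14-bit signed thermocouple value in 0.25°C steps, a 12-bit signed internal value in 0.0625°C steps, and a few fault flags. Everything else is an assumption for the illustration: `set_channel` and `read_raw` are hypothetical callables standing in for the multiplexer and the SPI read-out, and the real implementation is the Arduino sketch, not this code.

```python
import time


def decode_max31855(raw):
    """Decode one 32-bit MAX31855 frame into temperatures and fault flags."""
    # Bits 31..18: signed thermocouple temperature, 0.25 C per count.
    tc = (raw >> 18) & 0x3FFF
    if tc & 0x2000:
        tc -= 0x4000
    # Bits 15..4: signed internal (cold-junction) temperature, 0.0625 C per count.
    cj = (raw >> 4) & 0x0FFF
    if cj & 0x800:
        cj -= 0x1000
    return {
        "thermocouple_c": tc * 0.25,
        "internal_c": cj * 0.0625,
        "fault": bool(raw & 0x10000),   # bit 16: any fault
        "open_circuit": bool(raw & 0x1),
        "short_to_gnd": bool(raw & 0x2),
        "short_to_vcc": bool(raw & 0x4),
    }


def scan_channels(set_channel, read_raw, channels=range(16),
                  settle_s=0.5, sleep=time.sleep):
    """Switch through the channels, wait for things to settle, read each one."""
    readings = {}
    for channel in channels:
        set_channel(channel)
        sleep(settle_s)  # 0.375 s seemed enough in practice, 0.5 s is safe
        readings[channel] = decode_max31855(read_raw())
    return readings
```

Logging `internal_c` next to every thermocouple value makes it easier to spot when a reading drifts because the chip itself warmed up.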

In the end, though, we did have a nice 16 channel thermocouple multiplexer, sending off the measurements onto an LCD screen and to the computer over a USB cable.

Temperature monitoring board in its lab setting with 16 thermocouple channels

This is then saved in a database, and can be accessed from elsewhere.

Visualize!

The thing that my co-workers were most amazed by wasn’t the electronics. Sure, they haven’t worked with Arduinos, but they did do similar stuff. Instead they liked the monitoring interface much more, which is the one in this picture:

Bakeout Monitor interface

It’s the schematic layout of our equipment, with the temperatures positioned where the actual sensors are. The change of the measured values over time is also displayed, with live scrolling.

I’m not saying it’s great. Thinking about it, the major insight that made it good for the rest of the people is that I realized how much more people understand visual data: the placement of the values to the corresponding locations on the schematics. That’s the only thing.

So inside it’s a MongoDB database (learned from previous mistakes, using a replica-set at least), with Python scripts talking to the sensors and saving the data, NodeJS / Smoothie Charts for visualization (and plain old CSS positioning of tags for the reading display), and nginx’s upstream module for running two monitoring servers just in case. It’s mostly in the GitHub repo of the monitoring code, as well as the Arduino sketch for talking to the electronics.
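As a rough idea of what the Python side does, here is a minimal sketch that turns one line coming from the Arduino into a record ready to be stored. The line format (`channel,temperature`) and the field names are assumptions made up for this illustration, the real ones are in the repo and may well differ.

```python
from datetime import datetime, timezone


def parse_line(line, now=None):
    """Turn a hypothetical 'channel,temperature' serial line into a document."""
    channel_text, temp_text = line.strip().split(",")
    return {
        "channel": int(channel_text),
        "temperature_c": float(temp_text),
        # Timestamp on the computer side, since the line itself carries none.
        "time": now or datetime.now(timezone.utc),
    }


# With pymongo, a document like this could then be stored with something like
# collection.insert_one(parse_line(line)), where `collection` is a collection
# on the replica set (not run here).
```

Keeping the parsing separate from the database call makes it easy to test without the hardware or a running database.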

It was actually quite fun to write it all, with the gradual improvements, trying the new tech, and trying not to lose too much data, and I’m amazed how well it works. I especially had a good time learning about the database, scaling, fault tolerance, performance…

Of course there could be room for a lot more improvements.

My failover-restart bash scripts are awful, though they do seem to work more or less and counteract the USB unreliabilities
There were some changes to Smoothie Charts that I could improve on: logarithmic plotting, some display enhancements, wonder if it can be more optimized for performance
More efficient data loading. 12h of data is about 30 MB in JSON format. I send it compressed, which apparently gets it down to ~5% in size, but it still takes quite a bit of time to process on the frontend
The layout now can be changed from config files if the sensors change, so co-workers can do that without programming knowledge. I wonder if that can be simplified even more
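On the data loading point, here is a quick way to check how well a JSON payload compresses, as a minimal Python sketch using only the standard library. The function name is made up for the illustration, the actual server is NodeJS, and the ratio will depend on the real data.

```python
import gzip
import json


def compressed_ratio(records):
    """Return compressed size divided by raw size for a JSON payload."""
    raw = json.dumps(records).encode("utf-8")
    packed = gzip.compress(raw)
    return len(packed) / len(raw)
```

JSON repeats the same key names in every record, which is why it tends to compress well. The compression only helps the transfer though, the browser still has to parse the full uncompressed data.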

Of course, I’m a person who generally overengineers stuff, so maybe it’s good to stop somewhere. And that somewhere might be the point where I started using my Kindle for monitoring (it craps out on 1h of data already, but some real time things are good enough).

Bakeout Monitor running on Kindle 3, not perfect but does work

Get on with it

I did learn a lot along the way, and I’m sure that with this experience I will be allowed to do a little bit more in the lab in terms of programming ideas. I don’t like that the rest of the system is currently forced to be LabView, but that’s for another post, and there are so many things that can be improved in general as well. Let’s just go and do that.

The post Laboratory 2.0 – a monitoring system appeared first on ClickedyClick.