Categories
Admin Computers

Folding@Home on AWS to kick the arse of coronavirus

Folding@Home popped up on my radar due to a recent announcement that their computational research platform is adding a bunch of projects to study (and ultimately help fight) the COVID-19 virus. Previously I haven’t had any good machine at hand to be able to help in such efforts (my 9 years old Lenovo X201 is still cozy to work with, but doesn’t pack a computing punch). At work, however I get to to be around GPU machines much more, and gave me ideas how to contribute a bit more.

Poking around the available GPU instance types on AWS, seen that there are some pretty affordable ones in the G4 series, going down to as low as roughly $0.60/hour to use some decent & recent CPU and an NVIDIA Tesla T4 GPU. This drops even further if I use spot instances, and looking around in the different regions, I’ve seen available capacity at $0.16-0.20/hour, which feels really in the bargain category. Thus I thought spinning up a Folding@Home server in the cloud on spot instances, to help out and hopefully learning a thing or two, at the price of roughly 2 cups of gourmet London coffee (or taking the tube to work) per day.

Categories
Admin

Continuous integration testing of Arch User Repository packages

I maintain a couple of ArchLinux user-contributed packages on the Arch User Repository (AUR), and over time I’ve built out a bit of infrastructure around that to make that maintenance easier (and hopefully the results better). The core of it is automated building of packages in Continuous Integration, which catches a number of issues which otherwise would be more difficult.

This write-up will go through the entire packaging process to make it easily reproducible.

Categories
Admin Maker

Personal phone server, or Can you hear me now?

Ever since someone donated an IP phone to the Taipei Hackerspace, I’m trying to find time to set up an internal phone network between the hackerspace members. It should be fun to make our own infrastructure. Recently did some research, and started with it. Since if I get into something then I dive deep for a while, this was an intense week. This post is to summarize where I have got in this time

Asterisk & FreePBX

A bit of searching turned up Asterisk, a PBX (“private branch exchange” aka telephone network) software. It looked interesting because it came with a story: a guy building something awesome because he doesn’t know that it was supposed to be difficult. It’s also open source from the start, with a successful company build on top of the project.

Also found, that there’s a graphical control panel called FreePBX that makes using the all-command-line-and-config-files Asterisk easier to use. Both projects had a seemingly very detailed wiki, long track record, and strong following that made it worth checking them out.

The Server

Judging from the original install instructions on the FreePBX wiki, it looked like installing Asterisk & FreePBX is a complex (or rather many-step) process. Didn’t want to litter my own computer with broken installation artefacts, so enter VirtualBox. Using a virtual machine makes it easy to wipe and restart.

There’s a dedicated, preinstalled FreePBX distro based on CentOS, but had enough of CentOS for a while. Instead I just took Ubuntu 12.04.3 as a base, and FreePBX 2.11 and Asterisk 11 from the wiki. The install instructions were clear enough, though occasionally there were small differences needing a fix. Nothing major, but had to play around. After 5-6 reinstalls with increasing experience I got the basic functionality working, calls placed between and such, but the performance and sound quality wasn’t really that good. After thinking what could I improve, decided to take the next step: get out of the virtual machine, and up the version numbers (I’m Arch Linux user with a reason, living on the bleeding edge).

Enter DigitalOcean, a hosting provider that I used for other projects before (cheap, fast with SSD, good service). Set up a machine (aka “droplet”) in their Singaporean center (since that’s probably the closest one to Taiwan). I chose the 1Gb memory instance, because from experience with VirtualBox Asterisk+FreePBX maxed out at around that with a few test accounts.

Upping the version numbers I went with FreePBX 12.0 (from git) and Asterisk 12.1.1 (from download), both are testing versions. Asterisk had an extra dependency of libjansson-dev compared to version 11, didn’t check if any of the earlier dependencies are not required anymore.

The information control panel of FreePBX.
FreePBX interface for Asterisk

Got the whole system working (after a few droplet wipes), and played with the installation with a bit more confidence. From initial experience, Asterisk is a bit like the Linux Kernel. It’s modular, complex, focus on reliability, and the “make menuselect” is a familiar environment after years of “make menuconfig” compiling my own kernel. On the other hand, FreePBX is a bit like WordPress. It has its own auto-updater (just like updating plugins in WordPress), loads and loads of menus, focuses on configuration and tries not to let any faulty module take down the system (found quite a few buggy behaviour, so that’s a good idea). The Kernel and WordPress are two familiar environments, so felt home here too somehow.

Asterisk has a bunch of vocabulary that I’m so far barely familiar with, and lots of functionality that I haven’t had a chance to test yet. FreePBX has a lot of functionality too, and still it’s a bit difficult for me to tell where does an Asterisk function (module, resource?) end and one FreePBX function (plugin?) start. The fact is that I got to feel excited about programmable phone routing (with Lua), fax-to-pdf, hotel style wake-up calls, voicemail recording, call tracing, speaking time, simple conference talks, intercom functionality, regardless from whether it’s a module or a plugin…

Some additional server notes: voicemail requires email out for notification, I set that up with Mandrill and postfix. For such testing it might not be important, but good to secure the server at least a bit with fail2ban and ufw (Uncomplicated Firewall), and probably other things I don’t do well yet. Just sayin’.

Accounts / Extensions

Accounts on  the server are the extensions on which someone (or something), a numerical value. The vocabulary and concepts are also new to me, so it took a while to understand how things supposed to interoperate. Asterisk has a bunch of different kinds of extensions, of which I have tried two main ones: SIP and IAX.

SIP

SIP stands for Session Initiation Protocol. As far as I see it is basically a messaging protocol, to set up a connection between two parties, and also provide some other services, for example presence information (Online, Away, Busy….), messaging, and what not. The actual data of the call (voice or video) is trhough RTP (Real-time Transport Protocol).

The voice data in the transmission is compressed with one of the many codecs available:

  • ulaw and alaw (G.711) are a pair of the standard codecs, okay quality, one of the basic one to have in any client
  • speex is a variable bitrate codec, haven’t used that much
  • gsm is lower bitrate, but lower quality too (think of crappy cell phone reception voice)
  • G.722 is a hi-def (HD) voice codec, really good! I think beats Skype, and on par with a good Google Hangout quality,
  • G.729 is a non-free codec, shows up here and there, but haven’t had a chance to try it, this is the other HD codec that I’ve seen recommended

In testing, this was some of the learning curve, how to set up clients, and also the server that they can communicate with each other. Who choses the codec (caller, callee, server)? How to prioritize the codecs in different clients? What does it look like (or sounds like) when there’s a problem in this area? How to debug and fix?

Asterisk 12 has two different SIP channels or components: their classic library (chan_sip), and a rewritten one (chan_pjsip). The latter one is a standalone library that can be used for other purposes as well. SIP usually works on UDP, while PJSIP can do UDP/TCP/WebSockets too, and feels stable and fast. Definitely would use that if I have to choose between these two. Still, it is in test phase (both in Asterisk and FreePBX), so not without headaches.

There are bunch of different clients that I tried:

On Linux:

  • Ekiga is nice, simple, can sign into multiple accounts in multiple networks. Presence information, well integrated into the desktop with notifications and such. Does not seem to be able to handle non-standard SIP ports (which will be an issue further down)
  • Linphone is really multiplatform (Linux, Win, OSX, smartphones…), but it was crashing on me quite a bit, doesn’t integrate into the desktop (no notification just sound on call), and can be confusing with the lot of settings (the control panel looks a mess). Can handle non-standard ports too.
  • SFLPhone is good, works pretty well, simple, and can do IAX communication besides SIP.

On Android:

  • Android actually has full SIP handling capabilities built in for a while now (under “internet phone”). That would be awesome, if there were more information how to set up and use, but in theory a SIP account can be fully integrated into the system.
  • CSIPSimple really impressed me, probably the best working client I found. Integrates with the system (calls are handled as ‘calls’ with all the icons, history, and so on), good sound quality (can use the G.722 codec) and so on.
  • SipDroid, VIMPhone, LinPhone…. these other clients, don’t even remember them, all of them fell short somehow
  • Zoiper stands out as well, not just because it’s multiplatform, but because it’s pretty much the only one I found that can do both SIP and IAX. The Android system integration is not as close as CSIPSimple, but quite okay.

One of (the many) good thing about SIP that it is well known and pretty well supported. If there’s a “softphone” (phone in software), it’s quite likely to have SIP communication capabilities.

The bad thing about SIP though that it is well known and pretty well resented by the phone service providers. Many of those providers block SIP messages on their network, or sabotage the connection in some other way. On my own cell phone / 3G provider’s network, I couldn’t connect to Asterisk. In the forums some suggested that changing the port number that Asterisk listens on for SIP connections can solve things – and indeed after moving away from 5060/5061 to somewhere else, I could connect. The celebration was short lived, though because even though the calls now can reach the destination, RTP communication (the part that actually transports voice) was still broken. I don’t want to use VPN all the time (though might need to soon), and want to keep moving parts and settings to the minimum if I want others in the Hackerspace to join this network as well, thus SIP looks like a no-go because of the phone companies (darn).

IAX

Looking around, I found another type of channel, using the IAX (Inter-Asterisk eXchange) protocol. The bad thing about it that it is much less supported, but in turn it is not blocked by the phone companies either (since they don’t know about it).

Using SFLPhone and Zoiper I could successfully talk over 3G! Still it is not all good, the devil is in the details.

Phone interface showing incoming call from IAX test user
Zoiper incoming call on Android

  • It’s good that no need for custom ports
  • It’s bad that the IAX channel seems to be more unstable on Asterisk (or maybe I messed up my install after a while?): some extensions have trouble logging on for a while until the server is restarted; the wakeup-calls plugin misbehaved with IAX extension.
  • The less support also means less choice in clients. The ones I found cannot do G.722 so no HD voice anymore
  • Has a security setting (requirecalltoken) that not all clients support, not sure if there are any implications.

It’s also good, that Asterisk can route incoming SIP calls onto IAX extensions (i.e. the caller doesn’t have to care what technology the callee is using). On the note of routing, I could set things up such that outside calls can be routed into the system. E.g. every hackerspace could have their own Asterisk server and interoperate to call members at other spaces – sounds like a lot of work and might not worth it, but it also sounds awesome.

Summary & Future

I had a lot of fun playing with Asterisk. On the surface phone networks are familiar to everyone, but going deeper both makes things more confusing and opens my eyes how many possibilities there are for making something useful. 

There are a lot of things that I thought about, but haven’t tried yet:

  • Programmable dialplans (“what happens when a call is received”), via Lua. Lua is an awesome language and probably a lot more large piece of software has it embedded (since that’s one of its strength)
  • Could script a lot more too via the Asterisk Gateway Interface (AGI).
  • There are a bunch of other protocols and acronyms in Asterisk, for example Secure Real-Time Protocol (SRTP) and ZRTP, that could worth figuring out for a deeper understanding and security
  • There’s an Asterisk on Raspberry Pi project that looks interesting (if nothing else then how do they lower the memory usage below the RPi’s <512MB?). Since Asterisk can be used with multiple servers in a network, the RPi can provide one kind of service (e.g. GSM gateway) while other servers with more resources do other stuff
  • Using physical phones in the network, for example traditional phone network to come into the server, and IP Phone to ring out.  Maybe setting up fax endpoint (and sending it out as PDF or printing it). Basically anything that is working on the threshold between physical and digital.
  • Should check out how the likes of Line, Voxer, and Viber are doing VoIP on Android, do they have any interoperability?
  • How about Twillio, can their system be a similar PBX on a much larger scale?

The funny thing is that looks like the original IP phone that started this whole adventure does not work with Asterisk. Never mind, it will be good for another project.

Categories
Admin

Switched to SPDY and now Google’s confused

Out of interest, I recently switched this site to SPDY, party because I like to try out new things, and partly because I would want to make things be better and faster. So far it’s a mixed experience, with some puzzling changes, that I cannot make heads or tails of.

The first step for the switch was bringing everything onto HTTPS, which I have done with a free SSL certificate from StartSSL. Redirected everything from the HTTP to the secure connection, with the 301 http code so I thought Google will be able to follow it well and replace the addresses in their index. Then enabled the SPDY module in Nginx, and checking the result looked like I was in business.

Some time has passed, and a scary graph started to manifest itself in Google Analytics:

Google Analytics impression count, the site has changed around May 8.
Google Analytics impression count, the site has changed around May 8.

Right after I have made the changes, my impression count on Google dropped like a brick, now being exactly 0. That’s not really the change I wanted to see. Digging more into it, though, it looks like I still have a constant stream of visitors from Google Search:

Visitor numbers from Google Search, same time interval as the impression count.
Visitor numbers from Google Search, same time interval as the impression count.

How can I have zero impressions, but still a half a dozen visitors from Search? The results in the Webmaster Tools mirror things: dropping impression count, no crawl errors, same or even better indexed count, and relatively good stats:

Google Crawler stats, with a big spike when switched over HTTPS/SPDY when needed to reindex everything
Google Crawler stats, with a big spike when switched over HTTPS/SPDY when needed to reindex everything

The crawl seemed to have gotten a bit slower (the bottom plot of the three), but more consistent.

I wonder what could be the change, does the impression count depend on the method of access (http/https)? Or did I made some braking changes? If so, then why’s the conflicting information?

Being a scientist, my main concern is not actually the raw value of any visitor count, but understanding the reactions to my actions, and consistency of the “experimental results”.  I wonder what kind of technique I could use to debug all this?

Update 2013/May/28: 

Following some recommendations from the comments, it looks like that the https:// version of my URL has to added to the Webmaster Tools separately. Now there’s a http://gergely.imreh.net and a https://gergely.imreh.net section as well. In the latter section, I can see that there are some impressions reported. Some weird things still exist: the sum of impressions from both is less than how many visitors I reportedly get from Google Search; the crawl stats is shared between the two sections (ie. the https version reports a lot of crawl stats even from the time there wasn’t https enabled), while most other data is separate for the two sections (e.g. impression, search queries, sitemaps). Still probably this is on the right path.

The impression count after adding a https version of my site's records to the Webmaster  Tools
The impression count after adding a https version of my site’s records to the Webmaster Tools

After the Webmaster Tools changes, I have just switched the Google Analytics association from one WMT property to the other. Hopefully this will freak me out less, though it will likely take some days to see the changes in the result.

Categories
Admin

Fighting forum spam

As one of the managers of Ignite Taipei, I’m trying to come up with new ways to let the community communicate, new ways to share information, advice and all. A while ago I have set up a forum at http://bbs.ignitetaipei.tw/ and I thought that will be an interesting experiment. Well, so far it is useless for communication, but turned out to be a very interesting experience from the sysadmin point of view.

I used FluxBB, because it looked simple enough, seemed to be quite fast (for low traffic volume at least), and well configurable. Except that within a very short time I run into a spam problem, so many fake users registered, and lots of algorithmically generated garbage text with a bit of advertisement here and there.

First I looked into FluxBB’s own solutions, and looks like it might not have been a great choice, because many of the spam-fighting plugins are out of date, or not supported anymore, or just a real pain to set up. The immediate practical step I could take was updating my security questions, roll my own version of “written with words, how much is 5 + 4?”, the regular low-tech captcha on FluxBB. Looks like the original answers are already in the database everywhere, so had to write my own set, which seemed to work for a while, cutting down on red-flagged registrations. But it’s not ideal, since I want to make this a dual-language forum (Ignite Taipei has both English & Chinese as official language).

Instead I turned on email confirmation. When someone registers, the password is sent to their email and have to use that to sign in. It was okay for a tiny bit, then crazy registration boom happened. I think I might be the only one real member of the board (I said that it is a failure so far for communication:) and there are 500 other spam members. Looking at their email addresses, it seems all of them have Hotmail. That kinda suggests a giant failure at Hotmail to restrict automatic registration, which is probably a problem overall. I cannot just throw out Hotmail addresses either, because it’s a popular mail provider here in Taiwan too (my first email was Hotmail too, but that was a looooong time ago, before it was Microsoft property).

So captcha don’t work, email don’t work. What to do instead? At the time I was playing around with Cloudflare, to act as an easy to use CDN. I tried it before for our Ignite Taipei blog, which is hosted on Tumblr, and that doesn’t play well with Cloudflare unfortunately. Couldn’t use it for this blog before because of my DNS provider, but now I switched, so started playing with it again.

The dashboard of the Cloudflare interface
Cloudflare stats snapshot (parts of it)

Instead of enabling Cloudlfare for the entire ignitetaipei.tw domain, just turned it on for the forum, since it’s hosted elsewhere. And that totally did it. Spam stopped that very moment, and haven’t returned since. I think what happens is that Cloudflare knows globally a lot of web/forum/email span hosts, and can challenge them or generally ignore them. Can even see where those spammers are coming from.

List of captured threats on the Cloudflare threats console
Cloudflare Threats Console

One weird (but actually not that surprising) thing is that the most active web crawler on the site (Cloudflare gives that info as well) was Baidu by far, so I guess more people knew about the site in China than elsewhere. Why’s that? Some forums that share vulnerable sites, or something like that? I barely had any Chinese content at that time, so it cannot be that. And since I turned on the threat control part, Baidu seem to have dropped quite a bit (submitted the site to Google so now that’s the busiest crawler).

All in all, Cloudflare is an interesting experiment. I can really mess up my DNS with it, and could blocked my own site for several hours, but in general it worth it. Just have to be careful. For example when testing, use their own name servers to check the information, and maybe instead if “automatic” time-to-live, set some very short time first. I usually use Google’s 8.8.8.8, and they pick up the first wrong setting really quickly, then it takes hours to pick up the correction I made just minutes after the first one.

After a bit of playing around, at least I have no spam anymore (keep fingers crossed). Now just have to get people to use the forums. :)