There’s a war out there

Since I have set up my little Virtual Private server about two months ago, I keep reading and learning more about its administration. In particular I’m trying to make it more secure, since nobody likes data lost or their things used behind their back. I know that the Internet is a tough place. Most computer users are nicely isolated behind their routers and internal networks, nevertheless I had my freshly installed WinXP being infected in less then 5 minutes when connected to the Net. (Well, since then I don’t install anything Microsoft and first thing to take care is the security, so things are much better).

Thwarting brute force attacks

One of the first thing is securing the remote login access to the machine: disabling root login for SSH is always a good idea. But since I’m interested in cleverer methods, I wanted to do something more potent and general. Found this blog post about how to limit brute-force attack with iptables, so I set out to implement it. The basic idea is that if another computer is trying to connect too many times in short succession, then it is likely an attack. Use the firewall to see how many connections are made in a specific time interval to the sensitive ports and if a threshold is passed then ban that host from connecting for a while. I like it and had to implement it.

The information on the linked page is quite detailed and very useful. Just save the current iptables rules, edit them, and then restore.

# iptables-save > myrules
.... edit them rules ....
# iptables-restore < myrules

For remote servers one thing to be extra careful about is not to block the SSH connections completely: keep the current connection open, try to make a new connection and if you can log in, then things should be fine.

The only thing I have changed compare to the other site is the log level, so i can separate them better. In the following line there was originally --log-level 7 (debug) I’m using --log-level 4 (warning):
-A ATTACKED -m limit --limit 5/min -j LOG --log-prefix "IPTABLES (Rule ATTACKED): " --log-level 4

Then update the line in /etc/syslog.conf to:
kern.warning   /var/log/warnings

Of course this might vary somewhat from Linux distro to distro: the above is for my CentOS install with syslog,

From the logs

Well, not sure if my host was particularly busy or not – I assume it wasn’t since I don’t rank high in Google so fewer attackers would find my little “home”. Still, in the last month there’s a nice little collection of IP addresses which triggered that ATTACKED rule of the firewall.

Using Python I extracted the IP addresses from the logs, run them through GeoIP Python API to get their locations and fed that into the Google Maps Static API, to get this picture:

Location of hosts that triggered my ATTACKED iptables rules
Location of hosts that triggered my ATTACKED iptables rules. Red: once, blue: 2-9 times, yellow: 10+ times

Altogether in about 1 month, I logged 110 ATTACKED triggers from 47 different hosts. Most of them tries only once, there was one that did 48 times. According to GeoIP database, it is from Varna, Bulgaria. Well, if there is one good thing that came out of this, that Varna actually looks quite good and I’d be interested to visit it. :) Talk about strange my reactions to things…

It seems Europe and China are up to no good. Not sure if American baddies are less or just targeting mostly Americans. Might investigate the regional differences some time later. Though this is just for curiosity and fun, if I was serious, then I could set up a proper honeypot.

Some technical notes on making this picture:

  • GeoIP Python API looks one of the worst documented codes I’ve ever seen. I found a tutorial that helped me to get the results I wanted: cities and locations, not just countries.
  • Static maps are quick, dirty and limited. Will try to figure out use the Google Map API for a proper zoomable, scrollable, annotated map. Could imagine making a heat-map of threats, or better colour-coding of the number of attempts from each IP/City.

Anyways, at least there’s no sign of unauthorized entry so far, since most of these attacks are not sophisticated at all. I wonder if I’d recognize if I ever was targeted by a sophisticated attack, but that’s not something to fret over. Just keep the automated backups going and it will be all fine. :D

Update:

The Python script I used to get that map can be found over here.

New Laptop or You Had Me at “No OS”

I’ve been wanting to upgrade my laptop for quite a while. It was a good ol’ Acer Travelmate 4501wlmi from 2004. I’m not sure why I have kept it for such a long time, maybe I liked torturing myself. In the end the screen was barely hanging on its hinges, the video card memory was corrupt so the screen was all funky sometimes, but what finally did it is the flaky/failing wireless.

Lenovo X201i
Artificially arranged desktop:)

I did check out before what are the acceptable alternatives for a new laptop. Then last weekend I went and got myself a new Lenovo X201i, When I first went to the store, I wasn’t sure whether I’ll get it, or which model to go for. Tried to get some information from the clerk about the available options, but with this communication gap I usually have here in Taiwan, due to my limited Chinese, wasn’t for an advantage. In the end all I did is pretty much confirmed what I have already known: the Lenovo X-series is their smallest ultraportable, they can be quite powerful, and pretty popular. When he asked me what kind of system I wanted and I told him: none, I got a good confirmation that I came to the right place. All other stores the reactions range from apological raised eyebrows to statements that “selling laptops without Windows is illegal” (true story). Here on the other hand, he just got out his “No OS deals” sheet, and I just checked out of the most powerful of them: it had everything I needed and was altogether about 20% cheaper than the other model I was considering before. He was saying that there were only 3 left, so I just galloped off the the nearest ATM, and there I had it, good times.

A few days later I went back to get a few small details sorted out: exchanged to a larger battery (6 to 9 cell), upgraded the memory (2 to 8Gb) and switched the keyboard cover to the right one. This time the limited Chinese was for my advantage. I was talking to a different person this time, who knew even less English than my previous clerk, so whenever the new one was contradicting the deals I was promised, I just had to question it and they gave me the deal, instead of going into any conversation why I couldn’t have it. It’s all fine, I wasn’t abusing this “power”, but not going to be taken advantage of that easily either. All in all, it was quite good deal, even if it would have been cheaper to order it directly from America on the internet.

Experience so far (~5 days):

  • This machine does not compete for any beauty prize, so don’t mind that the 9 cell battery does not improve on that front. It is still okay for me. The matt finish on the cover picks up every touch, so it’s going to be pretty “used” looking soon. The keyboard cover is a good idea, knowing myself, but does not improve things either.
  • It is not really fair to compare it to a computer 6 years its senior, but it’s such a breath of fresh air how snappy it is. Not the most powerful computer I’m using (hard to beat the office’s quad core) but certainly a small powerhouse on the go.
  • The size is just right. Had an EeePC before, and I thought I could get really used to it, but in the end the limitations were just too much. Still got to find a good, small, laptop-enabled backpack, but with its 12″ it shouldn’t be a big deal
  • With the 9-cell I got about 6-7 hours of light use out of it. This is before I did any real power optimization. Linux does have a lot of tricks and even things like sound card power saving can go a long way. Still has to investigate
  • Installed my usual Arch Linux, now with all encrypted filesystem (not that I’m planning to let it be stolen). It will take a while to get my old settings back again, but at least I can organize them better.
  • That ESC key is at some weird place in the corner, keep pressing F1 instead. Even if No OS version (and they saved the “Windows7” sticker) I still have the Windows button. Will try to find some appropriate role for it.
  • Haven’t had a chance to try the WiMAX or the built in camera. The first will probably stay like that, the second I should get going with Skype.
  • Keyboard lighting is ace for nighttime stuff, just like now.
  • The pointing stick does not really like the keyboard cover. It is no big deal, I’m more of a touchpad fan. That touchpad has 5 different buttons but none of them emulates a mouse wheel as far as I can tell. Want to find out what does emulate it, should be very useful. The pad itself acts up sometimes, but nothing too annoying.
  • The 320Gb hard drive is not bad at all, but I’ll look out for a good SSD – should save on power and improve on speed.
  • The screen is a bit picky of the angles it wants to be looked at from. I know the tablet version (X201t) is muc better, this one I just got to live with.
  • Built in fingerprint reader – got to get the drivers working, but it would be awesome to use it for the constant sudo goodness that is required for a well secured system.

Now I have no excuse to be very productive anywhere and everywhere.

Hacker Cup Round 1 Redux

Last night’s Hacker Cup round, which was a second try of the first round after last week’s disaster, didn’t go quite as well for me as I hoped. Probably not that much for the others either, as even if 1000 people will advance, less than that have submitted code.  Can be disappointment due to last week’s failure, I too was hesitating to take part, but also because the problems were I think at least one level up. Which is not a problem, just observation.

by jailman @ Flickr
White Keyboard and Coffee by jailman @ flickr

Got my coffee at 1am, set in to mood and started off right on time at 2am.  In the next 3 hours I finished one of the problems, and got halfway with another one, which is below par, but what can I do. Need to learn more. One thing that I’ve noticed is if one just searched the web with the right keywords, Problem 1 and 3 could be handed over on a silver plate, more or less… Which is kind of random for a programming competition.  Did they want people who can find the right solutions or maybe someone who knows the solution for all of these problems alread? The first one begs the question whether there will be  Wikipedia access in the final. The second one begs the question: who can be such a person? And if neither – why choose such problems that cannot possible be solved in the allotted time without prior knowledge or external input?

Well, I wonder what would be the right keywords for Problem 2, which I could not solve yet.

I’m also bothered, that it seems that the input file one had to download for Problem 3 does not conform to the input specs. Why? Especially when if you download the file, the system gives you 6 minutes to submit. This is rather underhanded….

I will certainly try to check out the other two sub-rounds as well, even if they are 5am next Wednesday and next Sunday local time. Not doing this for the glory but for the education anyway. :)

Notes on solutions

Warning: Spoilers ahead. Don’t read if you want to solve them yourself.

Almost all my code for this can be found in this Github repo, together with other rounds’ code.

1) Wine Tasting

This one is more mathematics than anything. Let’s have G glasses of wine and minimum C guesses to be right. For i = C to  G calculate the number of : how many ways one can have good guesses, multiplied by how many ways none of the remaining guesses are right. Sum these numbers up, and that’s your result.

I was thinking for a long time how to calculate the number of ways none of the guesses are right (i.e. the number of permutations where non of the elements are at their original place). Finally I found there’s a name for that: derangement. And it is not at all simple, so I guess I wouldn’t have figured it out myself.

In the end, what we have to return is:
$$\mathrm{result} = \sum_{i = C}^{G}{G \choose i} (G – i)! \sum_{j = 0}^{(G-i)} \frac{(-1)^{j}}{j!}$$

Now that it actually makes sence, it turned out this very expression has a name too: recontres numbers.

Update: fixed my program and now should be correct.

This can be a very large number due to the factorials, so got to use the right types. In my python code I had to use 64bit numbers, but it’s rather ad-hoc, just choose the ones not to break. I do have to revisit this again and fix the types.  (And don’t forget the modulus).

Also, I used SciPy to get the results, but only needed the factorial. If I quickly write a fast factorial subroutine then SciPy is not needed – another thing to fix in my code.

It turned out that my code has failed, so basically this round is 0 out of 3. Nevertheless, wanted to see what was wrong, so I rewrote most of the math code. After messing around for a while, it turns out that all my problems came from

which is basically the newer, py3k-style integer division code. If I excluded that, suddenly everything was fine. I really have to remember this, because I used this include in practically every math-related code I wrote lately. Besides this change, I kept the other upgrades: fast factorial and choose. My output for the given input file has md5sum of d17dfc9e9fef637f771da3d693d7920b

2) Diversity Number

This one I failed on, and seems very few people actually succeeded in general (less than 60 people in the whole round). I start to understand bits and pieces of the possible solution, but I have nothing complete yet.  I wonder if there’s such an insight into this one as in the previous problem with the derangement.

Anyway, one thing I know: if there’s a list A, sorted, with length n and index starting at 0, then the diversity number is
$$\mathrm{diversity} = \prod_{i=0}^{n-1} \left(A_{i} – i\right)$$
or in Python

Now just have to figure out, what to do, not to enumerate all the $$2^{100} > 10^{30}$$ subsequences in the worse case scenario. :-?

(I’ll update this when I’ve found the solution).

3) Turn on the Lights

This one is pretty much straight out of the game Lights Out, with the only twist that one has to turn on all the lights instead of turning off. This I found only well after the round ended but this makes the puzzle a bit of a duh for me, nevertheless it is worth doing, since there are some optimization choices that are not trivial.

The method I used to get the solution, which follows most of the steps as outlined on the above link. Let’s call the button in row i and column j as [i, j].

  1. While importing the input, turn ever “on” into “off” to get the right setting for this problem.
  2. Turn off every light down to the bottom row. This is done by pressing button at [i, j] if [i-1, j] is on. Save all the buttons that are pressed.
  3. Check the state of last row at that point and keep it as a “target”.
  4. Starting from a blank puzzle (all lights off), turn on a single line in the first row and propagate it down just like in step 2. Save the last row. Do this for all of the lights in the first row, which will create the “base vectors” of the solutions we seek.
  5. We have to find the combination of those base vectors that equals to the above “target”. This can be done (i suppose) with a Gaussian elimination kind of method, but that’s just too much to implement for this little puzzle. According to the specs, the maximum number of colums is 18, thus there are $$2^{18}=262144$$ sett different configurations of lights at max.  That is not too bad, if there’s a fast comparison method, one can do a brute force search. That’s what I did. Also, I made use of binary expression of the light settings, e.g. switching lights can be done with XOR-ing the light pattern and the switch pattern.
  6. If found a match in the previous step then do the swicthing in the first line according to that result and do the propagation again. The final result should be an all-off board. Save all the button presses again.
  7. Check the pressed buttons. Since pressing a button twice is equal to not pressing it at all, if a button is pressed N times it is equivalent to N mod 2. Sum up all those presses, and that should be less than (rows * columns).

This is probably not the quickest method due to the brute force search, but it should be good enough for a programming competition where the speed of programming is more important the the speed of the program.

I do have one big complaint that I mentioned int the intro: the test file downloaded from Facebook did not follow the input specifications. In the specs every puzzle is a single line, the rows of the puzzle separated by whitespace. In the downloaded file for every puzzle every row is a new line.

The md5sum of my solution for the downloaded input file is 1e38422048a6aa9aeb007955d8b66f46