Life Programming

Hacker learning Chinese

I guess when a hacker learns a language, it is different from the way “others” do it. I guess this because I think I’m a hacker, I’m learning a language, and it feels different. I see two main factors coming into the picture:

  1. I’m connecting things in my life, so the things I do usually need some motivation or purpose behind them, without which they are abandoned. The activities I keep up the best are the ones that connect multiple different things.
  2. I want to do things as efficiently as possible. Hack the tools, hack the process to make it better. If there are no tools, make some. If no processes, come up with ideas.

Okay, these are pretty vague when expressed like this, so let’s give the example that prompted this post: I’m learning Chinese. I have been living in Taiwan for two and a half years now, so “high time” doesn’t even start to describe it. Finally I got a tutor, and a good one at that, so twice each week I have a good session of chatting and learning. After each session we have at least 30 or 40 new words and expressions written down. Those would normally just be forgotten, so I make an effort (about another hour or so after the session) to type them into a Google Document, this very one in the picture:

Google Docs Chinese vocabulary
The Google Document that powers my learning

I enter the English expression, the Chinese original and the pronunciation in Bopomofo (which I prefer to Pinyin). This last is possible because of the Yahoo Chinese-English Dictionary (one service that is generally better than Google’s own Translate, though I frequently need to use both). The other three columns came in just recently.

At the moment I’m up to 307 expressions, and it’s just not possible to practice from a spreadsheet like that. I remembered, though, checking out a fellow StartupBus participant, Pamela Fox’s Google Spreadsheet Flash Cards some time ago. It was fun but followed a different logic than mine, so I couldn’t make complete use of it. But then: why not make a new practice system for myself? This goes back to the previous points: 1) connecting programming and languages – in both I’ll learn something new and they’ll reinforce each other’s motivation, 2) using the exact tools that I need even if I have to make them (because it’s possible to do).

Also, after having done Who Said That? (which is currently down due to the AWS fail), I was into guessing games: let’s make a vocabulary guessing practice app that uses the above spreadsheet that I have anyway. What format should it be? Well, for the very first test, to see how to interact with Google Docs and such, I just made it as a simple, console-based Python app, something like this:

Chinese learning console
The console app to practice my Chinese vocabulary

Each practice round has 30 randomized questions with 4 possible solutions each, and the pronunciation as a hint. Simple, though very ugly on the inside at the moment (here’s the repo, I still need to write a ReadMe). It works, and now I know what to look out for in terms of implementation (how to log in, how to get and update data, stuff like that).
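To make the round structure concrete, here is a minimal sketch of how such a round could be put together. It is not the code from the repo: the function name `build_round`, the row layout (English, Chinese, pronunciation) and the choice to show the English side as the prompt are all assumptions for illustration, and the spreadsheet access is left out completely.

```python
import random

def build_round(entries, num_questions=30, num_choices=4, rng=random):
    """Build one practice round from (english, chinese, pronunciation) rows.

    Hypothetical helper: the row layout and the direction of the quiz
    (English prompt, Chinese answers) are assumptions, not the real app.
    """
    questions = []
    # Pick the words to ask about, in random order and without repeats.
    for english, chinese, pronunciation in rng.sample(
            entries, min(num_questions, len(entries))):
        # Wrong answers come from the other rows of the same sheet.
        others = sorted({row[1] for row in entries if row[1] != chinese})
        wrong = rng.sample(others, min(num_choices - 1, len(others)))
        choices = wrong + [chinese]
        rng.shuffle(choices)
        questions.append({
            "prompt": english,
            "hint": pronunciation,
            "choices": choices,
            "answer": chinese,
        })
    return questions
```

Passing in `rng` (for example `random.Random(42)`) makes a round reproducible, which helps when debugging the question logic.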

The other 3 columns in the above spreadsheet are explained as well: they keep score, namely the total number of times a word/expression has been practiced, the number of good answers and the current number of good answers in a row. These provide a rough-and-ready way to diagnose and manage the learning process – until I come up with better ways. Anyway, right now I get about 50%-70% right, which is more than I expected, but there is still a lot of room for improvement.
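The bookkeeping for those three columns is simple enough to sketch. The column names (`practiced`, `correct`, `streak`) are made up for this example, and resetting the streak to zero on a wrong answer is my reading of “in a row”, not something copied from the actual app.

```python
def record_answer(row, correct):
    """Update the three score columns of one word after a question.

    `row` is a dict with hypothetical keys: practiced, correct, streak.
    """
    row["practiced"] += 1
    if correct:
        row["correct"] += 1
        row["streak"] += 1
    else:
        # A mistake breaks the run of good answers (assumed behavior).
        row["streak"] = 0
    return row

def accuracy(row):
    """Share of good answers so far, or None if never practiced."""
    if row["practiced"] == 0:
        return None
    return row["correct"] / row["practiced"]
```

After each question the updated row would be written back to the spreadsheet, which is the part that depends on the Google Docs API and is not shown here.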

A list of improvements to the program that I’m thinking about:

  • Making it into a site, so I can use my phone on the go to practice. Also, potentially others can use it as well to practice anything based on any spreadsheet with “one side – other side – hint” structure.
  • I’m pretty sure this whole thing could be done in a single Javascript powered page. I don’t know enough Javascript to pull it off yet, but I’ve seen all the components separately, and that would make a very portable and compact solution.
  • Need to figure out some easier setup of the spreadsheet if this is to be used by others later. I cannot rely on them understanding the way I was thinking. Maybe an in-app option to add more fields?
  • I remember reading somewhere that the most effective practice is to reduce the frequency of checking words that I know well. That’s where the last column comes in: as one has more right answers in a row, one can test that word/expression fewer times. If there’s a mistake, then go back to the original method and test it more until the score builds up again. This can potentially be a very complicated algorithm, so I’ve got to think of a way that scales well (i.e. it is not too bad compared to an ideal method but does not require an extensive amount of calculation). Have some ideas, but they need more polish.
  • After watching Salman Khan’s TED talk this year it struck me just how much information there is in one’s actions (they give amazing feedback to teachers on how the students study), and how much better you can understand why people did what they did if you have all that diagnostic information (ah, the temptation of Big Data:). To apply this idea of extended diagnostics I could have a logging system instead of keeping (a simple) score. From there the system could get: how much you practice, how you are doing / getting better, which words are easier or more difficult for you, which two words you mix up with one another, suggest things to practice more, suggest words from topics that you know well to extend your knowledge… And more (this was just a little brainstorming). I don’t think I’d have time to extend it like that, and ideas are a dime a dozen, but one never knows…
  • Adding more modes of learning not just multiple choice, multiplayer learning, more game mechanics (achievements, pins anyone? :)
  • If there are central datasets instead of self-provided ones, then the system can anticipate what are the difficult parts from other students’ performance before you.
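For the “practice well-known words less often” idea from the list above, one cheap possibility is weighted random selection based on the streak column. This is only a sketch of one option, not a settled design: the halving rule, the cap of 10 and the names `practice_weight` and `pick_words` are all assumptions.

```python
import random

def practice_weight(streak, base=0.5, max_streak=10):
    """Halve a word's chance of being asked for every good answer in a row.

    The cap keeps the weight from shrinking to zero for long streaks.
    """
    return base ** min(streak, max_streak)

def pick_words(rows, count, rng=random):
    """Pick up to `count` distinct rows, favoring words with short streaks."""
    pool = list(rows)  # work on a copy so the caller's list stays intact
    picked = []
    while pool and len(picked) < count:
        weights = [practice_weight(row["streak"]) for row in pool]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        picked.append(pool.pop(index))
    return picked
```

A mistake resets the streak, so the word’s weight jumps back to 1 and it shows up often again until the score builds up. The cost is roughly `count` times the number of rows per round, which is nothing for a few hundred expressions.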

Now, let’s just see how my Chinese will improve during all this hacking, since that’s the main point, isn’t it? :)

New Formosa Restaurant Signature Dishes
Some motivation for learning, loads of Taiwanese food :)

Ps: If you have any language learning tricks, let me know in the comments!


Building (for) fun

I’ve been doing things very differently since I came back from the Startup Bus. Even before that I was writing up lists of technologies I wanted to try – but that was all there was to it, lists. Now I know one can indeed create something in a short time, so there’s no excuse for not actually trying all that tech. Here’s a little case study of what I’ve been doing in the last ~two weeks.

I had an idea for a game: I love quotations, and reading a lot of Twitter I thought there could be a lot of “clevers” and “funnies” and “insightfuls” in there… I follow a lot of people and it is always interesting to see the different styles in which people talk. So let’s take a bunch of people who others might know, find their tweets that make good quotations and let people guess who each tweet belongs to.

Yeah, that’s the whole thing. And all of it was made mostly because 1) I like this kind of puzzle, 2) I wanted to learn a bit of web app programming. Skipping ahead to the result: here it is, Who Said That?:

Website screenshot
playing with Who Said That. Here, I missed.

To summarize (mostly for my own education), here’s what I learned in the one week of development and in the one week since going live:

Behind the scenes

First I had to choose what technology it should all be based on. The two main contenders were Ruby on Rails and Django. I was reading a lot of pros and cons of both. I know that almost everyone I met on the Bus was doing Rails, and I’ve seen it myself too. On the other hand, my go-to language to do things is Python. So they were on more-or-less equal footing. In the end I went with Django, because it is completely new to me and I can have a little “niche” experience (not because not many people use Django, just not many who I know) that might come in handy later. Also, there will be lots of other opportunities to try Rails in other projects. :)

I also wanted something hosted. Since Rails is out, Heroku is out too (that’s what everyone seems to be using, for a good reason, though I also didn’t want to spend money on it if I don’t have to). I also considered Google AppEngine, which is not strictly Django but at least Python. I have used it before (and it is great, at least on the small scale), so I wanted to see if there’s any other hosting company out there. Fortunately there’s a startup that seems to be just the right thing for me: DotCloud. They are still in beta, but I somehow got an invite by talking to them on Twitter. They host Ruby, Python, Rails, Django, databases (MySQL, PostgreSQL, Redis), and more. One push deploy, roll-back, quite good docs, … and I love startups so why not help them too by testing their platform. :) So there it is: Django on DotCloud.

I also wanted to try a few different techs, like “can I use Redis for something in this project?” But that’s just the wrong way around. Instead I should always ask: I want to do this or that, so what’s the right tech for it? I’m using PostgreSQL and Django at the moment, though I think MongoDB would be a good fit too, but DotCloud does not have Mongo yet.

Getting interesting data

First I thought it would be quite easy, because people seemed to like the Twitter API. It was anything but… Now I have learned to be much more careful about what I say: an API can be good for a lot of things, but getting exactly what I want can often mean a lot of wrangling, trial and error, and tweaking my own thought process to think in their developers’ way.

What did I want? “Quote-quality tweets”, that is: not a reply, not a retweet, not a link. Popularity (by some measure to be determined) is optional, but would be nice.
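Those criteria can be written down as a small text filter. This is a rough heuristic on the tweet text alone and not how the site actually does it: the function name `is_quote_quality` is made up, and the checks (leading `@` for replies, an `RT @` marker for retweets, `http://` or `https://` for links) are assumptions about how such tweets usually look.

```python
def is_quote_quality(text):
    """Return True if a tweet's text looks like a standalone quote.

    Heuristic only: works on the plain text, not on any API metadata.
    """
    stripped = text.strip()
    if not stripped:
        return False
    # A reply starts by addressing another user.
    if stripped.startswith("@"):
        return False
    lowered = stripped.lower()
    # Old-style retweets carry an "RT @user" marker, at the start or mid-text.
    if lowered.startswith("rt @") or " rt @" in lowered:
        return False
    # Anything carrying a link is more of a pointer than a quote.
    if "http://" in lowered or "https://" in lowered:
        return False
    return True
```

Popularity is left out on purpose, since the measure for it was never decided.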

How did it go this time:

  1. “I want interesting Tweets from all kinds of folks, from everywhere!” (If you already know web development and my naivete starts to make you snigger at this point, I know, I know…) That looked like a perfect job for statuses/public_timeline. But that only pulls a couple of Tweets in.
  2. Then maybe the Streaming API? Well, I cannot have access to all the tweets (and I probably couldn’t handle that traffic either). At my level (“Spritzer”) I can only have access to a random 1% of tweets. So I wrote a little app that was listening to the Stream and filtering tweets that looked like a good quote. I left that running the whole night. How many good tweets did it collect? I think it was about 3 (yeah, single digit). How many of those were from people who anyone else would be interested in? 0. Well, this failed again.
  3. [There was some more trial and error that I don’t remember very well…]
  4. Then I had a spark: keep it simple. Instead of fishing in the big pond, let’s choose the interesting people first, and import what they are saying. Well, duh! So it can be even simpler: set up a Twitter user for the sole purpose of administration. Whoever that user follows will be in my database. I can also use Twitter lists to organize them and get some categories out (e.g. someone can be in Celebrity, Actors and Comedy at the same time). This also lends itself to easy administration within the Twitter interface, so I don’t have to roll my own.

So with this, I just had to figure out who to follow. At this point I mostly imported people who I know and who I think write well or write interesting things.

  • WeFollow is a good resource for categories and popular people
  • Twitter’s own Who to follow
  • Using the Similar people link when I added someone new

So far I have about 70-80 people in the database, giving more than 4000 useful quotes. Some people are better than others. I was surprised how many of them just post loads of links but almost no personal content (and these are the things I like to learn in a project like this).

In the end I had something simple that worked and that I enjoyed using. Now let’s make it ready to open up.

Domain Name

This part I didn’t really have to do but wanted to: putting it on its own domain. I mean the likes of “” is okay, but I wanted to be able to make short links to the game and that format doesn’t really leave much space, e.g. if I make a Tweet with a link…

I’ve been looking for a name for the better part of two days. After lots of searching and looking for inspiration, I learned a couple of things. One idea was “Who Said It?”, but Italy’s .it registry does not let people without EU addresses register. GoDaddy gets around this restriction for an extra $19.99/year (they vouch for you or something), but that was just too much for me and I didn’t want to register with them anyway. I do have an EU permanent residence at the moment, but just didn’t want to risk it. There were a couple of other domains that are restrictive in this sense, and in the end I went with Austria’s .at, for “”. It’s okay, not too expensive (the prices vary a lot between countries; before, I thought there was a more-or-less common price). One thing I didn’t like about it is that now I’m registered as living in “Taiwan, Province of China”. Say WHAT? (Okay, let’s put politics aside, that’s for some other time.)

Making it look like something

Well, from the design of the page one thing is easy to tell: I’m not a designer. I know I should make something simple and clean (because that’s what I like to use as well), but I’m not entirely sure how to go about doing it. I was looking for CSS speech bubbles for a long time. It seems that most of the designs use the :before and :after pseudo-elements, which is cool, but has its own problems: e.g. they cannot be manipulated from Javascript – if I wanted to make a speech bubble pointing 4 ways and then delete 3 of the pointers once the player guessed, then 1) I don’t know a way to use 4 :before/:after elements, 2) I cannot alter them… Well, I’ll look into this later.

In the end I’m happier with this opacity change, and hiding/unhiding of elements. I’m using jQuery to achieve it, and also for the AJAX calls to get the correct answer. It works well enough until I find a good designer who’s willing to improve on the page for not much in return. Or we’ll see…

My favorite part of this design is the relative robustness: I tried it even in the Kindle 3’s browser, and everything worked very well except for the results’ display by setting .innerHTML… Guess I should look around if there’s another way to do that. I bet there is.

Showing it off

Well, this hasn’t worked too well so far. I try to put shameless plugs everywhere, but I got altogether about 35 pageviews in the last 10 days… and I think half of that must be me. :) Hurrah for Google Analytics. Things tried/to try:

  • I like the “tweet this puzzle” button, but that got altogether 1 click. Guess people are spammed with linked content already, so it does not stand out.
  • Set up a Twitter account for the page. Not even spam-bots noticed it. The hidden management bot account, on the other hand, got a few “fans” already… :)
  • Set up a blog on Posterous. Got to write better content; so far I have 1 lone click-through. The good thing is that when I post something there, it is shown in their new post directories, so that blog already got many more casual “views” than this blog here. Room to improve.
  • Submitting it later to tech sites? Maybe, when I have finished implementing a few things that I want. Or when I give up implementing them for the moment.
  • ???


Since launch, I keep filing bugs, feature requests and refactor requests in the GitHub issue page of the site (well, just so I don’t forget them). I did fix a couple of things but didn’t make as many new features as I thought I would. It might very well be because I’m more a “hacker” (i.e. making something work quickly) than a developer (making something work well). I need some attitude change in this respect. Even if this is a “just for fun” project, I do have many ideas to improve it beyond the current stage. It would be too bad to let it die out of laziness.


Keep improving. See whether there’s something the main domain could be used for. See if I can pivot it to be something more interesting to people. See where I went wrong with my assumptions about what makes good game mechanics (since I had 0 experience in that field before :). Also, there are other ideas popping up in my head all the time, so I’ve got to think about what is better: getting some of the other things done or improving this.

If anyone has any comments, feel free to share them; I take this as an experiment and want to learn as much as possible from it. If you are making things for fun as well, tell me in the comments, I’d love to see! :)